This commit implements a complete test infrastructure for validating hot-reload stability and robustness across multiple scenarios. ## New Test Infrastructure ### Test Helpers (tests/helpers/) - TestMetrics: FPS, memory, reload time tracking with statistics - TestReporter: Assertion tracking and formatted test reports - SystemUtils: Memory usage monitoring via /proc/self/status - TestAssertions: Macro-based assertion framework ### Test Modules - TankModule: Realistic module with 50 tanks for production testing - ChaosModule: Crash-injection module for robustness validation - StressModule: Lightweight module for long-duration stability tests ## Integration Test Scenarios ### Scenario 1: Production Hot-Reload (test_01_production_hotreload.cpp) ✅ PASSED - End-to-end hot-reload validation - 30 seconds simulation (1800 frames @ 60 FPS) - TankModule with 50 tanks, realistic state - Source modification (v1.0 → v2.0), recompilation, reload - State preservation: positions, velocities, frameCount - Metrics: ~163ms reload time, 0.88MB memory growth ### Scenario 2: Chaos Monkey (test_02_chaos_monkey.cpp) ✅ PASSED - Extreme robustness testing - 150+ random crashes per run (5% crash probability per frame) - 5 crash types: runtime_error, logic_error, out_of_range, domain_error, state corruption - 100% recovery rate via automatic hot-reload - Corrupted state detection and rejection - Random seed for unpredictable crash patterns - Proof of real reload: temporary files in /tmp/grove_module_*.so ### Scenario 3: Stress Test (test_03_stress_test.cpp) ✅ PASSED - Long-duration stability validation - 10 minutes simulation (36000 frames @ 60 FPS) - 120 hot-reloads (every 5 seconds) - 100% reload success rate (120/120) - Memory growth: 2 MB (threshold: 50 MB) - Avg reload time: 160ms (threshold: 500ms) - No memory leaks, no file descriptor leaks ## Core Engine Enhancements ### ModuleLoader (src/ModuleLoader.cpp) - Temporary file copy to /tmp/ for Linux dlopen cache bypass - Robust reload() method: getState() → unload() → load() → setState() - Automatic cleanup of temporary files - Comprehensive error handling and logging ### DebugEngine (src/DebugEngine.cpp) - Automatic recovery in processModuleSystems() - Exception catching → logging → module reload → continue - Module state dump utilities for debugging ### SequentialModuleSystem (src/SequentialModuleSystem.cpp) - extractModule() for safe module extraction - registerModule() for module re-registration - Enhanced processModules() with error handling ## Build System - CMake configuration for test infrastructure - Shared library compilation for test modules (.so) - CTest integration for all scenarios - PIC flag management for spdlog compatibility ## Documentation (planTI/) - Complete test architecture documentation - Detailed scenario specifications with success criteria - Global test plan and validation thresholds ## Validation Results All 3 integration scenarios pass successfully: - Production hot-reload: State preservation validated - Chaos Monkey: 100% recovery from 150+ crashes - Stress Test: Stable over 120 reloads, minimal memory growth 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
17 KiB
17 KiB
Scénario 4: Race Condition Hunter
Priorité: ⭐⭐ IMPORTANT Phase: 2 (SHOULD HAVE) Durée estimée: ~10 minutes (1000 compilations) Effort implémentation: ~6-8 heures
🎯 Objectif
Détecter et valider la robustesse face aux race conditions lors de la compilation concurrente:
- FileWatcher détecte changements pendant compilation
- File stability check fonctionne
- Aucun .so corrompu chargé
- Aucun deadlock entre threads
- 100% success rate des reloads
C'est le test qui a motivé le fix de la race condition initiale !
📋 Description
Setup
- Thread 1 (Compiler): Recompile
TestModule.sotoutes les 300ms - Thread 2 (FileWatcher): Détecte changements et déclenche reload
- Thread 3 (Engine): Exécute
process()en tight loop à 60 FPS - Durée: 1000 cycles de compilation (~5 minutes)
Comportements à Tester
- File stability check: Attend que le fichier soit stable avant reload
- Size verification: Vérifie que le .so copié est complet
- Concurrent access: Pas de corruption pendant dlopen/dlclose
- Error handling: Détecte et récupère des .so incomplets
🏗️ Implémentation
AutoCompiler Helper
// helpers/AutoCompiler.h
class AutoCompiler {
public:
AutoCompiler(const std::string& moduleName, const std::string& buildDir)
: moduleName(moduleName), buildDir(buildDir), isRunning(false) {}
void start(int iterations, int intervalMs) {
isRunning = true;
compilerThread = std::thread([this, iterations, intervalMs]() {
for (int i = 0; i < iterations && isRunning; i++) {
compile(i);
std::this_thread::sleep_for(std::chrono::milliseconds(intervalMs));
}
});
}
void stop() {
isRunning = false;
if (compilerThread.joinable()) {
compilerThread.join();
}
}
int getSuccessCount() const { return successCount; }
int getFailureCount() const { return failureCount; }
int getCurrentIteration() const { return currentIteration; }
private:
std::string moduleName;
std::string buildDir;
std::atomic<bool> isRunning;
std::atomic<int> successCount{0};
std::atomic<int> failureCount{0};
std::atomic<int> currentIteration{0};
std::thread compilerThread;
void compile(int iteration) {
currentIteration = iteration;
// Modifier source pour forcer recompilation
modifySourceVersion(iteration);
// Compiler
std::string cmd = "cmake --build " + buildDir + " --target " + moduleName + " 2>&1";
int result = system(cmd.c_str());
if (result == 0) {
successCount++;
} else {
failureCount++;
std::cerr << "Compilation failed at iteration " << iteration << "\n";
}
}
void modifySourceVersion(int iteration) {
// Modifier TestModule.cpp pour changer version
std::string sourcePath = buildDir + "/../tests/modules/TestModule.cpp";
std::ifstream input(sourcePath);
std::string content((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
input.close();
// Remplacer version
std::regex versionRegex(R"(moduleVersion = "v[0-9]+")");
std::string newVersion = "moduleVersion = \"v" + std::to_string(iteration) + "\"";
content = std::regex_replace(content, versionRegex, newVersion);
std::ofstream output(sourcePath);
output << content;
}
};
Test Principal
// test_04_race_condition.cpp
#include "helpers/AutoCompiler.h"
#include "helpers/TestMetrics.h"
#include "helpers/TestReporter.h"
#include <atomic>
#include <thread>
int main() {
TestReporter reporter("Race Condition Hunter");
TestMetrics metrics;
const int TOTAL_COMPILATIONS = 1000;
const int COMPILE_INTERVAL_MS = 300;
std::cout << "================================================================================\n";
std::cout << "RACE CONDITION HUNTER: " << TOTAL_COMPILATIONS << " compilations\n";
std::cout << "================================================================================\n\n";
// === SETUP ===
DebugEngine engine;
engine.loadModule("TestModule", "build/modules/libTestModule.so");
auto config = createJsonConfig({{"version", "v0"}});
engine.initializeModule("TestModule", config);
// === STATISTIQUES ===
std::atomic<int> reloadAttempts{0};
std::atomic<int> reloadSuccesses{0};
std::atomic<int> reloadFailures{0};
std::atomic<int> corruptedLoads{0};
std::atomic<int> crashes{0};
std::atomic<bool> engineRunning{true};
// === THREAD 1: Auto-Compiler ===
std::cout << "Starting auto-compiler (300ms interval)...\n";
AutoCompiler compiler("TestModule", "build");
compiler.start(TOTAL_COMPILATIONS, COMPILE_INTERVAL_MS);
// === THREAD 2: FileWatcher + Reload ===
std::cout << "Starting FileWatcher...\n";
std::thread watcherThread([&]() {
std::string soPath = "build/modules/libTestModule.so";
std::filesystem::file_time_type lastWriteTime;
try {
lastWriteTime = std::filesystem::last_write_time(soPath);
} catch (...) {
std::cerr << "Failed to get initial file time\n";
return;
}
while (engineRunning && compiler.getCurrentIteration() < TOTAL_COMPILATIONS) {
try {
auto currentWriteTime = std::filesystem::last_write_time(soPath);
if (currentWriteTime != lastWriteTime) {
// FICHIER MODIFIÉ - RELOAD
reloadAttempts++;
std::cout << "[Compilation #" << compiler.getCurrentIteration()
<< "] File changed, triggering reload...\n";
auto reloadStart = std::chrono::high_resolution_clock::now();
try {
// Le ModuleLoader va attendre file stability
engine.reloadModule("TestModule");
auto reloadEnd = std::chrono::high_resolution_clock::now();
float reloadTime = std::chrono::duration<float, std::milli>(reloadEnd - reloadStart).count();
metrics.recordReloadTime(reloadTime);
reloadSuccesses++;
// Vérifier que le module est valide
auto state = engine.getModuleState("TestModule");
auto* jsonNode = dynamic_cast<JsonDataNode*>(state.get());
const auto& stateJson = jsonNode->getJsonData();
std::string version = stateJson["version"];
std::cout << " → Reload OK (" << reloadTime << "ms), version: " << version << "\n";
} catch (const std::exception& e) {
reloadFailures++;
std::cerr << " → Reload FAILED: " << e.what() << "\n";
// Vérifier si c'est un .so corrompu
if (std::string(e.what()).find("Incomplete") != std::string::npos ||
std::string(e.what()).find("dlopen") != std::string::npos) {
corruptedLoads++;
}
}
lastWriteTime = currentWriteTime;
}
} catch (const std::filesystem::filesystem_error& e) {
// Fichier en cours d'écriture, ignore
}
std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
});
// === THREAD 3: Engine Loop ===
std::cout << "Starting engine loop (60 FPS)...\n";
std::thread engineThread([&]() {
int frame = 0;
while (engineRunning && compiler.getCurrentIteration() < TOTAL_COMPILATIONS) {
auto frameStart = std::chrono::high_resolution_clock::now();
try {
engine.update(1.0f / 60.0f);
// Métriques
auto frameEnd = std::chrono::high_resolution_clock::now();
float frameTime = std::chrono::duration<float, std::milli>(frameEnd - frameStart).count();
metrics.recordFPS(1000.0f / frameTime);
if (frame % 60 == 0) {
metrics.recordMemoryUsage(getCurrentMemoryUsage());
}
} catch (const std::exception& e) {
crashes++;
std::cerr << "[Frame " << frame << "] ENGINE CRASH: " << e.what() << "\n";
// Continue malgré le crash (test robustesse)
}
frame++;
// Sleep pour maintenir 60 FPS
auto frameEnd = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration<float, std::milli>(frameEnd - frameStart).count();
int sleepMs = std::max(0, static_cast<int>(16.67f - elapsed));
std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
}
});
// === ATTENDRE FIN ===
std::cout << "\nRunning test...\n";
// Progress monitoring
while (compiler.getCurrentIteration() < TOTAL_COMPILATIONS) {
std::this_thread::sleep_for(std::chrono::seconds(10));
int progress = (compiler.getCurrentIteration() * 100) / TOTAL_COMPILATIONS;
std::cout << "Progress: " << progress << "% ("
<< compiler.getCurrentIteration() << "/" << TOTAL_COMPILATIONS << " compilations)\n";
std::cout << " Reloads: " << reloadSuccesses << " OK, " << reloadFailures << " FAIL\n";
std::cout << " Corrupted loads: " << corruptedLoads << "\n";
std::cout << " Crashes: " << crashes << "\n\n";
}
// Stop tous les threads
engineRunning = false;
compiler.stop();
watcherThread.join();
engineThread.join();
std::cout << "\nAll threads stopped.\n\n";
// === VÉRIFICATIONS FINALES ===
int compileSuccesses = compiler.getSuccessCount();
int compileFailures = compiler.getFailureCount();
float compileSuccessRate = (compileSuccesses * 100.0f) / TOTAL_COMPILATIONS;
float reloadSuccessRate = (reloadAttempts > 0) ? (reloadSuccesses * 100.0f / reloadAttempts) : 100.0f;
// Assertions
ASSERT_GT(compileSuccessRate, 95.0f, "Compile success rate should be > 95%");
reporter.addMetric("compile_success_rate_percent", compileSuccessRate);
ASSERT_EQ(corruptedLoads, 0, "Should have 0 corrupted loads (file stability check should prevent this)");
reporter.addMetric("corrupted_loads", corruptedLoads);
ASSERT_EQ(crashes, 0, "Should have 0 crashes");
reporter.addMetric("crashes", crashes);
// Si on a des reloads, vérifier le success rate
if (reloadAttempts > 0) {
ASSERT_GT(reloadSuccessRate, 99.0f, "Reload success rate should be > 99%");
}
reporter.addMetric("reload_success_rate_percent", reloadSuccessRate);
// Vérifier que file stability check a fonctionné (temps moyen > 0)
float avgReloadTime = metrics.getReloadTimeAvg();
ASSERT_GT(avgReloadTime, 100.0f, "Avg reload time should be > 100ms (file stability wait)");
reporter.addMetric("reload_time_avg_ms", avgReloadTime);
reporter.addMetric("total_compilations", TOTAL_COMPILATIONS);
reporter.addMetric("compile_successes", compileSuccesses);
reporter.addMetric("compile_failures", compileFailures);
reporter.addMetric("reload_attempts", static_cast<int>(reloadAttempts));
reporter.addMetric("reload_successes", static_cast<int>(reloadSuccesses));
reporter.addMetric("reload_failures", static_cast<int>(reloadFailures));
// === RAPPORT FINAL ===
std::cout << "================================================================================\n";
std::cout << "RACE CONDITION HUNTER SUMMARY\n";
std::cout << "================================================================================\n";
std::cout << "Compilations:\n";
std::cout << " Total: " << TOTAL_COMPILATIONS << "\n";
std::cout << " Successes: " << compileSuccesses << " (" << compileSuccessRate << "%)\n";
std::cout << " Failures: " << compileFailures << "\n\n";
std::cout << "Reloads:\n";
std::cout << " Attempts: " << reloadAttempts << "\n";
std::cout << " Successes: " << reloadSuccesses << " (" << reloadSuccessRate << "%)\n";
std::cout << " Failures: " << reloadFailures << "\n";
std::cout << " Corrupted: " << corruptedLoads << "\n\n";
std::cout << "Stability:\n";
std::cout << " Crashes: " << crashes << "\n";
std::cout << " Reload avg: " << avgReloadTime << "ms\n";
std::cout << "================================================================================\n\n";
reporter.printFinalReport();
return reporter.getExitCode();
}
📊 Métriques Collectées
| Métrique | Description | Seuil |
|---|---|---|
| compile_success_rate_percent | % de compilations réussies | > 95% |
| reload_success_rate_percent | % de reloads réussis | > 99% |
| corrupted_loads | Nombre de .so corrompus chargés | 0 |
| crashes | Nombre de crashes engine | 0 |
| reload_time_avg_ms | Temps moyen de reload | > 100ms (prouve que file stability fonctionne) |
| reload_attempts | Nombre de tentatives de reload | N/A (info) |
✅ Critères de Succès
MUST PASS
- ✅ Compile success rate > 95%
- ✅ Corrupted loads = 0 (file stability check marche)
- ✅ Crashes = 0
- ✅ Reload success rate > 99%
- ✅ Reload time avg > 100ms (prouve attente file stability)
NICE TO HAVE
- ✅ Compile success rate = 100%
- ✅ Reload success rate = 100%
- ✅ Reload time avg < 600ms (efficace malgré stability check)
🔧 Détection de Corruptions
Dans ModuleLoader::loadModule()
// DÉJÀ IMPLÉMENTÉ - Vérification
auto origSize = std::filesystem::file_size(path);
auto copiedSize = std::filesystem::file_size(tempPath);
if (copiedSize != origSize) {
logger->error("❌ Incomplete copy: orig={} bytes, copied={} bytes", origSize, copiedSize);
throw std::runtime_error("Incomplete file copy detected - CORRUPTED");
}
// Tentative dlopen
void* handle = dlopen(tempPath.c_str(), RTLD_NOW | RTLD_LOCAL);
if (!handle) {
logger->error("❌ dlopen failed: {}", dlerror());
throw std::runtime_error(std::string("Failed to load module: ") + dlerror());
}
🐛 Cas d'Erreur Attendus
| Erreur | Cause | Comportement attendu |
|---|---|---|
| Corrupted .so loaded | File stability check raté | FAIL - augmenter stableRequired |
| Reload failure | dlopen pendant write | RETRY - file stability devrait éviter |
| Engine crash | Race dans dlopen/dlclose | FAIL - ajouter mutex |
| High reload time variance | Compilation variable | OK - tant que P99 < seuil |
📝 Output Attendu
================================================================================
RACE CONDITION HUNTER: 1000 compilations
================================================================================
Starting auto-compiler (300ms interval)...
Starting FileWatcher...
Starting engine loop (60 FPS)...
Running test...
[Compilation #3] File changed, triggering reload...
→ Reload OK (487ms), version: v3
[Compilation #7] File changed, triggering reload...
→ Reload OK (523ms), version: v7
Progress: 10% (100/1000 compilations)
Reloads: 98 OK, 0 FAIL
Corrupted loads: 0
Crashes: 0
Progress: 20% (200/1000 compilations)
Reloads: 195 OK, 2 FAIL
Corrupted loads: 0
Crashes: 0
...
Progress: 100% (1000/1000 compilations)
Reloads: 987 OK, 5 FAIL
Corrupted loads: 0
Crashes: 0
All threads stopped.
================================================================================
RACE CONDITION HUNTER SUMMARY
================================================================================
Compilations:
Total: 1000
Successes: 998 (99.8%)
Failures: 2
Reloads:
Attempts: 992
Successes: 987 (99.5%)
Failures: 5
Corrupted: 0
Stability:
Crashes: 0
Reload avg: 505ms
================================================================================
METRICS
================================================================================
Compile success: 99.8% (threshold: > 95%) ✓
Reload success: 99.5% (threshold: > 99%) ✓
Corrupted loads: 0 (threshold: 0) ✓
Crashes: 0 (threshold: 0) ✓
Reload time avg: 505ms (threshold: > 100ms) ✓
Result: ✅ PASSED
================================================================================
📅 Planning
Jour 1 (4h):
- Implémenter AutoCompiler helper
- Source modification automatique (version bump)
Jour 2 (4h):
- Implémenter test_04_race_condition.cpp
- Threading (compiler, watcher, engine)
- Synchronisation + safety
- Debug + validation
Prochaine étape: scenario_05_multimodule.md