GroveEngine/planTI/scenario_04_race_condition.md
StillHammer d8c5f93429 feat: Add comprehensive hot-reload test suite with 3 integration scenarios
This commit implements a complete test infrastructure for validating
hot-reload stability and robustness across multiple scenarios.

## New Test Infrastructure

### Test Helpers (tests/helpers/)
- TestMetrics: FPS, memory, reload time tracking with statistics
- TestReporter: Assertion tracking and formatted test reports
- SystemUtils: Memory usage monitoring via /proc/self/status
- TestAssertions: Macro-based assertion framework

### Test Modules
- TankModule: Realistic module with 50 tanks for production testing
- ChaosModule: Crash-injection module for robustness validation
- StressModule: Lightweight module for long-duration stability tests

## Integration Test Scenarios

### Scenario 1: Production Hot-Reload (test_01_production_hotreload.cpp)
 PASSED - End-to-end hot-reload validation
- 30 seconds simulation (1800 frames @ 60 FPS)
- TankModule with 50 tanks, realistic state
- Source modification (v1.0 → v2.0), recompilation, reload
- State preservation: positions, velocities, frameCount
- Metrics: ~163ms reload time, 0.88MB memory growth

### Scenario 2: Chaos Monkey (test_02_chaos_monkey.cpp)
 PASSED - Extreme robustness testing
- 150+ random crashes per run (5% crash probability per frame)
- 5 crash types: runtime_error, logic_error, out_of_range, domain_error, state corruption
- 100% recovery rate via automatic hot-reload
- Corrupted state detection and rejection
- Random seed for unpredictable crash patterns
- Proof of real reload: temporary files in /tmp/grove_module_*.so

### Scenario 3: Stress Test (test_03_stress_test.cpp)
 PASSED - Long-duration stability validation
- 10 minutes simulation (36000 frames @ 60 FPS)
- 120 hot-reloads (every 5 seconds)
- 100% reload success rate (120/120)
- Memory growth: 2 MB (threshold: 50 MB)
- Avg reload time: 160ms (threshold: 500ms)
- No memory leaks, no file descriptor leaks

## Core Engine Enhancements

### ModuleLoader (src/ModuleLoader.cpp)
- Temporary file copy to /tmp/ for Linux dlopen cache bypass
- Robust reload() method: getState() → unload() → load() → setState()
- Automatic cleanup of temporary files
- Comprehensive error handling and logging

### DebugEngine (src/DebugEngine.cpp)
- Automatic recovery in processModuleSystems()
- Exception catching → logging → module reload → continue
- Module state dump utilities for debugging

### SequentialModuleSystem (src/SequentialModuleSystem.cpp)
- extractModule() for safe module extraction
- registerModule() for module re-registration
- Enhanced processModules() with error handling

## Build System
- CMake configuration for test infrastructure
- Shared library compilation for test modules (.so)
- CTest integration for all scenarios
- PIC flag management for spdlog compatibility

## Documentation (planTI/)
- Complete test architecture documentation
- Detailed scenario specifications with success criteria
- Global test plan and validation thresholds

## Validation Results
All 3 integration scenarios pass successfully:
- Production hot-reload: State preservation validated
- Chaos Monkey: 100% recovery from 150+ crashes
- Stress Test: Stable over 120 reloads, minimal memory growth

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 22:13:07 +08:00

495 lines
17 KiB
Markdown

# Scénario 4: Race Condition Hunter
**Priorité**: ⭐⭐ IMPORTANT
**Phase**: 2 (SHOULD HAVE)
**Durée estimée**: ~10 minutes (1000 compilations)
**Effort implémentation**: ~6-8 heures
---
## 🎯 Objectif
Détecter et valider la robustesse face aux race conditions lors de la compilation concurrente:
- FileWatcher détecte changements pendant compilation
- File stability check fonctionne
- Aucun .so corrompu chargé
- Aucun deadlock entre threads
- 100% success rate des reloads
**C'est le test qui a motivé le fix de la race condition initiale !**
---
## 📋 Description
### Setup
1. Thread 1 (Compiler): Recompile `TestModule.so` toutes les 300ms
2. Thread 2 (FileWatcher): Détecte changements et déclenche reload
3. Thread 3 (Engine): Exécute `process()` en tight loop à 60 FPS
4. Durée: 1000 cycles de compilation (~5 minutes)
### Comportements à Tester
- **File stability check**: Attend que le fichier soit stable avant reload
- **Size verification**: Vérifie que le .so copié est complet
- **Concurrent access**: Pas de corruption pendant dlopen/dlclose
- **Error handling**: Détecte et récupère des .so incomplets
---
## 🏗️ Implémentation
### AutoCompiler Helper
```cpp
// helpers/AutoCompiler.h
class AutoCompiler {
public:
AutoCompiler(const std::string& moduleName, const std::string& buildDir)
: moduleName(moduleName), buildDir(buildDir), isRunning(false) {}
void start(int iterations, int intervalMs) {
isRunning = true;
compilerThread = std::thread([this, iterations, intervalMs]() {
for (int i = 0; i < iterations && isRunning; i++) {
compile(i);
std::this_thread::sleep_for(std::chrono::milliseconds(intervalMs));
}
});
}
void stop() {
isRunning = false;
if (compilerThread.joinable()) {
compilerThread.join();
}
}
int getSuccessCount() const { return successCount; }
int getFailureCount() const { return failureCount; }
int getCurrentIteration() const { return currentIteration; }
private:
std::string moduleName;
std::string buildDir;
std::atomic<bool> isRunning;
std::atomic<int> successCount{0};
std::atomic<int> failureCount{0};
std::atomic<int> currentIteration{0};
std::thread compilerThread;
void compile(int iteration) {
currentIteration = iteration;
// Modifier source pour forcer recompilation
modifySourceVersion(iteration);
// Compiler
std::string cmd = "cmake --build " + buildDir + " --target " + moduleName + " 2>&1";
int result = system(cmd.c_str());
if (result == 0) {
successCount++;
} else {
failureCount++;
std::cerr << "Compilation failed at iteration " << iteration << "\n";
}
}
void modifySourceVersion(int iteration) {
// Modifier TestModule.cpp pour changer version
std::string sourcePath = buildDir + "/../tests/modules/TestModule.cpp";
std::ifstream input(sourcePath);
std::string content((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
input.close();
// Remplacer version
std::regex versionRegex(R"(moduleVersion = "v[0-9]+")");
std::string newVersion = "moduleVersion = \"v" + std::to_string(iteration) + "\"";
content = std::regex_replace(content, versionRegex, newVersion);
std::ofstream output(sourcePath);
output << content;
}
};
```
### Test Principal
```cpp
// test_04_race_condition.cpp
#include "helpers/AutoCompiler.h"
#include "helpers/TestMetrics.h"
#include "helpers/TestReporter.h"
#include <atomic>
#include <thread>
int main() {
TestReporter reporter("Race Condition Hunter");
TestMetrics metrics;
const int TOTAL_COMPILATIONS = 1000;
const int COMPILE_INTERVAL_MS = 300;
std::cout << "================================================================================\n";
std::cout << "RACE CONDITION HUNTER: " << TOTAL_COMPILATIONS << " compilations\n";
std::cout << "================================================================================\n\n";
// === SETUP ===
DebugEngine engine;
engine.loadModule("TestModule", "build/modules/libTestModule.so");
auto config = createJsonConfig({{"version", "v0"}});
engine.initializeModule("TestModule", config);
// === STATISTIQUES ===
std::atomic<int> reloadAttempts{0};
std::atomic<int> reloadSuccesses{0};
std::atomic<int> reloadFailures{0};
std::atomic<int> corruptedLoads{0};
std::atomic<int> crashes{0};
std::atomic<bool> engineRunning{true};
// === THREAD 1: Auto-Compiler ===
std::cout << "Starting auto-compiler (300ms interval)...\n";
AutoCompiler compiler("TestModule", "build");
compiler.start(TOTAL_COMPILATIONS, COMPILE_INTERVAL_MS);
// === THREAD 2: FileWatcher + Reload ===
std::cout << "Starting FileWatcher...\n";
std::thread watcherThread([&]() {
std::string soPath = "build/modules/libTestModule.so";
std::filesystem::file_time_type lastWriteTime;
try {
lastWriteTime = std::filesystem::last_write_time(soPath);
} catch (...) {
std::cerr << "Failed to get initial file time\n";
return;
}
while (engineRunning && compiler.getCurrentIteration() < TOTAL_COMPILATIONS) {
try {
auto currentWriteTime = std::filesystem::last_write_time(soPath);
if (currentWriteTime != lastWriteTime) {
// FICHIER MODIFIÉ - RELOAD
reloadAttempts++;
std::cout << "[Compilation #" << compiler.getCurrentIteration()
<< "] File changed, triggering reload...\n";
auto reloadStart = std::chrono::high_resolution_clock::now();
try {
// Le ModuleLoader va attendre file stability
engine.reloadModule("TestModule");
auto reloadEnd = std::chrono::high_resolution_clock::now();
float reloadTime = std::chrono::duration<float, std::milli>(reloadEnd - reloadStart).count();
metrics.recordReloadTime(reloadTime);
reloadSuccesses++;
// Vérifier que le module est valide
auto state = engine.getModuleState("TestModule");
auto* jsonNode = dynamic_cast<JsonDataNode*>(state.get());
const auto& stateJson = jsonNode->getJsonData();
std::string version = stateJson["version"];
std::cout << " → Reload OK (" << reloadTime << "ms), version: " << version << "\n";
} catch (const std::exception& e) {
reloadFailures++;
std::cerr << " → Reload FAILED: " << e.what() << "\n";
// Vérifier si c'est un .so corrompu
if (std::string(e.what()).find("Incomplete") != std::string::npos ||
std::string(e.what()).find("dlopen") != std::string::npos) {
corruptedLoads++;
}
}
lastWriteTime = currentWriteTime;
}
} catch (const std::filesystem::filesystem_error& e) {
// Fichier en cours d'écriture, ignore
}
std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
});
// === THREAD 3: Engine Loop ===
std::cout << "Starting engine loop (60 FPS)...\n";
std::thread engineThread([&]() {
int frame = 0;
while (engineRunning && compiler.getCurrentIteration() < TOTAL_COMPILATIONS) {
auto frameStart = std::chrono::high_resolution_clock::now();
try {
engine.update(1.0f / 60.0f);
// Métriques
auto frameEnd = std::chrono::high_resolution_clock::now();
float frameTime = std::chrono::duration<float, std::milli>(frameEnd - frameStart).count();
metrics.recordFPS(1000.0f / frameTime);
if (frame % 60 == 0) {
metrics.recordMemoryUsage(getCurrentMemoryUsage());
}
} catch (const std::exception& e) {
crashes++;
std::cerr << "[Frame " << frame << "] ENGINE CRASH: " << e.what() << "\n";
// Continue malgré le crash (test robustesse)
}
frame++;
// Sleep pour maintenir 60 FPS
auto frameEnd = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration<float, std::milli>(frameEnd - frameStart).count();
int sleepMs = std::max(0, static_cast<int>(16.67f - elapsed));
std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
}
});
// === ATTENDRE FIN ===
std::cout << "\nRunning test...\n";
// Progress monitoring
while (compiler.getCurrentIteration() < TOTAL_COMPILATIONS) {
std::this_thread::sleep_for(std::chrono::seconds(10));
int progress = (compiler.getCurrentIteration() * 100) / TOTAL_COMPILATIONS;
std::cout << "Progress: " << progress << "% ("
<< compiler.getCurrentIteration() << "/" << TOTAL_COMPILATIONS << " compilations)\n";
std::cout << " Reloads: " << reloadSuccesses << " OK, " << reloadFailures << " FAIL\n";
std::cout << " Corrupted loads: " << corruptedLoads << "\n";
std::cout << " Crashes: " << crashes << "\n\n";
}
// Stop tous les threads
engineRunning = false;
compiler.stop();
watcherThread.join();
engineThread.join();
std::cout << "\nAll threads stopped.\n\n";
// === VÉRIFICATIONS FINALES ===
int compileSuccesses = compiler.getSuccessCount();
int compileFailures = compiler.getFailureCount();
float compileSuccessRate = (compileSuccesses * 100.0f) / TOTAL_COMPILATIONS;
float reloadSuccessRate = (reloadAttempts > 0) ? (reloadSuccesses * 100.0f / reloadAttempts) : 100.0f;
// Assertions
ASSERT_GT(compileSuccessRate, 95.0f, "Compile success rate should be > 95%");
reporter.addMetric("compile_success_rate_percent", compileSuccessRate);
ASSERT_EQ(corruptedLoads, 0, "Should have 0 corrupted loads (file stability check should prevent this)");
reporter.addMetric("corrupted_loads", corruptedLoads);
ASSERT_EQ(crashes, 0, "Should have 0 crashes");
reporter.addMetric("crashes", crashes);
// Si on a des reloads, vérifier le success rate
if (reloadAttempts > 0) {
ASSERT_GT(reloadSuccessRate, 99.0f, "Reload success rate should be > 99%");
}
reporter.addMetric("reload_success_rate_percent", reloadSuccessRate);
// Vérifier que file stability check a fonctionné (temps moyen > 0)
float avgReloadTime = metrics.getReloadTimeAvg();
ASSERT_GT(avgReloadTime, 100.0f, "Avg reload time should be > 100ms (file stability wait)");
reporter.addMetric("reload_time_avg_ms", avgReloadTime);
reporter.addMetric("total_compilations", TOTAL_COMPILATIONS);
reporter.addMetric("compile_successes", compileSuccesses);
reporter.addMetric("compile_failures", compileFailures);
reporter.addMetric("reload_attempts", static_cast<int>(reloadAttempts));
reporter.addMetric("reload_successes", static_cast<int>(reloadSuccesses));
reporter.addMetric("reload_failures", static_cast<int>(reloadFailures));
// === RAPPORT FINAL ===
std::cout << "================================================================================\n";
std::cout << "RACE CONDITION HUNTER SUMMARY\n";
std::cout << "================================================================================\n";
std::cout << "Compilations:\n";
std::cout << " Total: " << TOTAL_COMPILATIONS << "\n";
std::cout << " Successes: " << compileSuccesses << " (" << compileSuccessRate << "%)\n";
std::cout << " Failures: " << compileFailures << "\n\n";
std::cout << "Reloads:\n";
std::cout << " Attempts: " << reloadAttempts << "\n";
std::cout << " Successes: " << reloadSuccesses << " (" << reloadSuccessRate << "%)\n";
std::cout << " Failures: " << reloadFailures << "\n";
std::cout << " Corrupted: " << corruptedLoads << "\n\n";
std::cout << "Stability:\n";
std::cout << " Crashes: " << crashes << "\n";
std::cout << " Reload avg: " << avgReloadTime << "ms\n";
std::cout << "================================================================================\n\n";
reporter.printFinalReport();
return reporter.getExitCode();
}
```
---
## 📊 Métriques Collectées
| Métrique | Description | Seuil |
|----------|-------------|-------|
| **compile_success_rate_percent** | % de compilations réussies | > 95% |
| **reload_success_rate_percent** | % de reloads réussis | > 99% |
| **corrupted_loads** | Nombre de .so corrompus chargés | 0 |
| **crashes** | Nombre de crashes engine | 0 |
| **reload_time_avg_ms** | Temps moyen de reload | > 100ms (prouve que file stability fonctionne) |
| **reload_attempts** | Nombre de tentatives de reload | N/A (info) |
---
## ✅ Critères de Succès
### MUST PASS
1. ✅ Compile success rate > 95%
2. ✅ Corrupted loads = 0 (file stability check marche)
3. ✅ Crashes = 0
4. ✅ Reload success rate > 99%
5. ✅ Reload time avg > 100ms (prouve attente file stability)
### NICE TO HAVE
1. ✅ Compile success rate = 100%
2. ✅ Reload success rate = 100%
3. ✅ Reload time avg < 600ms (efficace malgré stability check)
---
## 🔧 Détection de Corruptions
### Dans ModuleLoader::loadModule()
```cpp
// DÉJÀ IMPLÉMENTÉ - Vérification
auto origSize = std::filesystem::file_size(path);
auto copiedSize = std::filesystem::file_size(tempPath);
if (copiedSize != origSize) {
logger->error("❌ Incomplete copy: orig={} bytes, copied={} bytes", origSize, copiedSize);
throw std::runtime_error("Incomplete file copy detected - CORRUPTED");
}
// Tentative dlopen
void* handle = dlopen(tempPath.c_str(), RTLD_NOW | RTLD_LOCAL);
if (!handle) {
logger->error("❌ dlopen failed: {}", dlerror());
throw std::runtime_error(std::string("Failed to load module: ") + dlerror());
}
```
---
## 🐛 Cas d'Erreur Attendus
| Erreur | Cause | Comportement attendu |
|--------|-------|---------------------|
| Corrupted .so loaded | File stability check raté | FAIL - augmenter stableRequired |
| Reload failure | dlopen pendant write | RETRY - file stability devrait éviter |
| Engine crash | Race dans dlopen/dlclose | FAIL - ajouter mutex |
| High reload time variance | Compilation variable | OK - tant que P99 < seuil |
---
## 📝 Output Attendu
```
================================================================================
RACE CONDITION HUNTER: 1000 compilations
================================================================================
Starting auto-compiler (300ms interval)...
Starting FileWatcher...
Starting engine loop (60 FPS)...
Running test...
[Compilation #3] File changed, triggering reload...
→ Reload OK (487ms), version: v3
[Compilation #7] File changed, triggering reload...
→ Reload OK (523ms), version: v7
Progress: 10% (100/1000 compilations)
Reloads: 98 OK, 0 FAIL
Corrupted loads: 0
Crashes: 0
Progress: 20% (200/1000 compilations)
Reloads: 195 OK, 2 FAIL
Corrupted loads: 0
Crashes: 0
...
Progress: 100% (1000/1000 compilations)
Reloads: 987 OK, 5 FAIL
Corrupted loads: 0
Crashes: 0
All threads stopped.
================================================================================
RACE CONDITION HUNTER SUMMARY
================================================================================
Compilations:
Total: 1000
Successes: 998 (99.8%)
Failures: 2
Reloads:
Attempts: 992
Successes: 987 (99.5%)
Failures: 5
Corrupted: 0
Stability:
Crashes: 0
Reload avg: 505ms
================================================================================
METRICS
================================================================================
Compile success: 99.8% (threshold: > 95%) ✓
Reload success: 99.5% (threshold: > 99%) ✓
Corrupted loads: 0 (threshold: 0) ✓
Crashes: 0 (threshold: 0) ✓
Reload time avg: 505ms (threshold: > 100ms) ✓
Result: ✅ PASSED
================================================================================
```
---
## 📅 Planning
**Jour 1 (4h):**
- Implémenter AutoCompiler helper
- Source modification automatique (version bump)
**Jour 2 (4h):**
- Implémenter test_04_race_condition.cpp
- Threading (compiler, watcher, engine)
- Synchronisation + safety
- Debug + validation
---
**Prochaine étape**: `scenario_05_multimodule.md`