StillHammer d8c5f93429 feat: Add comprehensive hot-reload test suite with 3 integration scenarios

This commit implements a complete test infrastructure for validating
hot-reload stability and robustness across multiple scenarios.

## New Test Infrastructure

### Test Helpers (tests/helpers/)
- TestMetrics: FPS, memory, reload time tracking with statistics
- TestReporter: Assertion tracking and formatted test reports
- SystemUtils: Memory usage monitoring via /proc/self/status
- TestAssertions: Macro-based assertion framework

### Test Modules
- TankModule: Realistic module with 50 tanks for production testing
- ChaosModule: Crash-injection module for robustness validation
- StressModule: Lightweight module for long-duration stability tests

## Integration Test Scenarios

### Scenario 1: Production Hot-Reload (test_01_production_hotreload.cpp)
✅ PASSED - End-to-end hot-reload validation
- 30 seconds simulation (1800 frames @ 60 FPS)
- TankModule with 50 tanks, realistic state
- Source modification (v1.0 → v2.0), recompilation, reload
- State preservation: positions, velocities, frameCount
- Metrics: ~163ms reload time, 0.88MB memory growth

### Scenario 2: Chaos Monkey (test_02_chaos_monkey.cpp)
✅ PASSED - Extreme robustness testing
- 150+ random crashes per run (5% crash probability per frame)
- 5 crash types: runtime_error, logic_error, out_of_range, domain_error, state corruption
- 100% recovery rate via automatic hot-reload
- Corrupted state detection and rejection
- Random seed for unpredictable crash patterns
- Proof of real reload: temporary files in /tmp/grove_module_*.so

### Scenario 3: Stress Test (test_03_stress_test.cpp)
✅ PASSED - Long-duration stability validation
- 10 minutes simulation (36000 frames @ 60 FPS)
- 120 hot-reloads (every 5 seconds)
- 100% reload success rate (120/120)
- Memory growth: 2 MB (threshold: 50 MB)
- Avg reload time: 160ms (threshold: 500ms)
- No memory leaks, no file descriptor leaks

## Core Engine Enhancements

### ModuleLoader (src/ModuleLoader.cpp)
- Temporary file copy to /tmp/ for Linux dlopen cache bypass
- Robust reload() method: getState() → unload() → load() → setState()
- Automatic cleanup of temporary files
- Comprehensive error handling and logging

### DebugEngine (src/DebugEngine.cpp)
- Automatic recovery in processModuleSystems()
- Exception catching → logging → module reload → continue
- Module state dump utilities for debugging

### SequentialModuleSystem (src/SequentialModuleSystem.cpp)
- extractModule() for safe module extraction
- registerModule() for module re-registration
- Enhanced processModules() with error handling

## Build System
- CMake configuration for test infrastructure
- Shared library compilation for test modules (.so)
- CTest integration for all scenarios
- PIC flag management for spdlog compatibility

## Documentation (planTI/)
- Complete test architecture documentation
- Detailed scenario specifications with success criteria
- Global test plan and validation thresholds

## Validation Results
All 3 integration scenarios pass successfully:
- Production hot-reload: State preservation validated
- Chaos Monkey: 100% recovery from 150+ crashes
- Stress Test: Stable over 120 reloads, minimal memory growth

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-13 22:13:07 +08:00


Success Thresholds - Pass/Fail Criteria

This document centralizes the success thresholds for every test scenario.

🎯 Threshold Philosophy

Criticality Levels

MUST PASS ✅: Mandatory criteria. If any one of them fails → the test FAILS
SHOULD PASS ⚠️: Recommended criteria. On failure → WARNING in the logs
NICE TO HAVE 💡: Optimal criteria. On failure → INFO in the logs
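The mapping from level to consequence can be written down as a small sketch. The names `failureSeverity` and `failsTest` are hypothetical and do not come from the test helpers; only the three levels and their log severities are taken from the list above.

```cpp
#include <string>

// The three criticality levels described above.
enum class AssertionLevel { MUST_PASS, SHOULD_PASS, NICE_TO_HAVE };

// Log severity emitted when a criterion at this level fails (hypothetical helper).
inline std::string failureSeverity(AssertionLevel level) {
    switch (level) {
        case AssertionLevel::MUST_PASS:    return "FAIL";
        case AssertionLevel::SHOULD_PASS:  return "WARNING";
        case AssertionLevel::NICE_TO_HAVE: return "INFO";
    }
    return "UNKNOWN"; // not reached for valid enum values
}

// Only a failed MUST PASS criterion fails the test (hypothetical helper).
inline bool failsTest(AssertionLevel level, bool passed) {
    return level == AssertionLevel::MUST_PASS && !passed;
}
```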

Rationale

The thresholds are defined according to:

Production readiness: ability to run in production 24/7
User experience: impact on smoothness (60 FPS = 16.67ms/frame)
Resource constraints: memory, CPU, file descriptors
Industry standards: acceptable reload time, uptime
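To make the frame-budget figure concrete, here is a minimal sketch of the arithmetic. The function names are illustrative. It shows why a 1000ms reload corresponds to 60 frames at 60 FPS.

```cpp
// Frame budget in milliseconds at a given target frame rate.
constexpr double frameBudgetMs(double fps) { return 1000.0 / fps; }

// Number of frames a pause of `pauseMs` spans at `fps`, rounded up.
constexpr int framesSpanned(double pauseMs, double fps) {
    return static_cast<int>(pauseMs * fps / 1000.0) +
           ((pauseMs * fps / 1000.0 > static_cast<int>(pauseMs * fps / 1000.0)) ? 1 : 0);
}
```

At 60 FPS the budget is about 16.67ms, so a reload at the 1000ms limit spans 60 frames, while one at 300ms spans 18.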

📊 Scenario 1: Production Hot-Reload

MUST PASS ✅

Metric	Threshold	Justification
`reload_time_ms`	< 1000ms	A reload > 1s is a visible freeze for the user
`memory_growth_mb`	< 5MB	Significant memory growth = probable leak
`fps_min`	> 30	< 30 FPS = unplayable game
`tank_count_preserved`	50/50 (100%)	Losing entities = critical bug
`positions_preserved`	100%	Incorrect positions = gameplay desync
`no_crashes`	true	A crash is unacceptable

SHOULD PASS ⚠️

Metric	Threshold	Justification
`reload_time_ms`	< 500ms	Faster reload = better UX
`fps_min`	> 50	> 50 FPS = very smooth experience

NICE TO HAVE 💡

Metric	Threshold	Justification
`memory_growth_mb`	< 1MB	Minimal memory growth = near-perfect system
`reload_time_ms`	< 300ms	Imperceptible reload
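A metric such as `reload_time_ms` appears in all three tables with progressively stricter limits. The sketch below finds the strictest tier a measured value satisfies. `UpperBounds`, `classify` and `kReloadTimeMs` are hypothetical names; only the 1000/500/300ms limits come from the tables above.

```cpp
#include <string>

// Limits for a "lower is better" metric; a value meets a tier when strictly below its limit.
struct UpperBounds {
    double mustPass;
    double shouldPass;
    double niceToHave;
};

// Returns the strictest tier the value satisfies, or "FAIL" if it misses MUST PASS.
inline std::string classify(double value, const UpperBounds& bounds) {
    if (value >= bounds.mustPass)   return "FAIL";
    if (value >= bounds.shouldPass) return "MUST PASS only";
    if (value >= bounds.niceToHave) return "SHOULD PASS";
    return "NICE TO HAVE";
}

// Scenario 1 reload-time limits in milliseconds, taken from the tables above.
static const UpperBounds kReloadTimeMs = {1000.0, 500.0, 300.0};
```

For example, `classify(750.0, kReloadTimeMs)` yields `"MUST PASS only"`: the test passes but a warning is logged.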

📊 Scenario 2: Chaos Monkey

MUST PASS ✅

Metric	Threshold	Justification
`engine_alive`	true	Dead engine = total test failure
`no_deadlocks`	true	Deadlock = system hung
`recovery_rate_percent`	> 95%	Recovery < 95% = fragile system
`memory_growth_mb`	< 10MB	5 min * 2MB/min = 10MB, acceptable

SHOULD PASS ⚠️

Metric	Threshold	Justification
`recovery_rate_percent`	= 100%	Perfect recovery = optimal robustness
`memory_growth_mb`	< 5MB	Nearly stable even under chaos

NICE TO HAVE 💡

Metric	Threshold	Justification
`reload_time_avg_ms`	< 500ms	Fast reload even during chaos
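The `recovery_rate_percent` metric can be derived from two counters, as in this minimal sketch. The function names are hypothetical, and treating a run with zero injected crashes as 100% recovery is an assumption, not something the document specifies.

```cpp
// Percentage of injected crashes the engine recovered from.
// Assumption: with no crashes there is nothing to recover from, so report 100%.
inline double recoveryRatePercent(int crashes, int recoveries) {
    if (crashes <= 0) return 100.0;
    return 100.0 * recoveries / crashes;
}

// MUST PASS check from the table above: strictly greater than 95%.
inline bool recoveryMustPass(int crashes, int recoveries) {
    return recoveryRatePercent(crashes, recoveries) > 95.0;
}
```

With 150 crashes, at most 7 may go unrecovered: 143/150 is about 95.3% and passes, while 142/150 is about 94.7% and fails.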

📊 Scenario 3: Stress Test (10 minutes)

MUST PASS ✅

Metric	Threshold	Justification
`memory_growth_mb`	< 20MB	10 min * 2MB/min = 20MB, acceptable
`fd_leak`	= 0	FD leak = system crash after N hours
`fps_min`	> 30	Minimum acceptable for gameplay
`reload_time_p99_ms`	< 1000ms	P99 > 1s = visible degradation
`cpu_stddev_percent`	< 10%	Stable CPU usage = no busy loop
`no_crashes`	true	Crash = fail

SHOULD PASS ⚠️

Metric	Threshold	Justification
`memory_growth_mb`	< 10MB	Very stable
`reload_time_p99_ms`	< 750ms	Excellent

NICE TO HAVE 💡

Metric	Threshold	Justification
`memory_growth_mb`	< 5MB	Near-perfect
`fps_min`	> 50	Very smooth
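Two of these metrics are statistics over collected samples: `reload_time_p99_ms` and `cpu_stddev_percent`. The sketch below shows one way to compute them, assuming a nearest-rank percentile and a population standard deviation. The document does not say which definitions TestMetrics uses, so both choices are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
inline double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
    rank = std::min(std::max<std::size_t>(rank, 1), samples.size());
    return samples[rank - 1];
}

// Population standard deviation of the samples.
inline double stddev(const std::vector<double>& samples) {
    assert(!samples.empty());
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();
    double sumSquares = 0.0;
    for (double s : samples) sumSquares += (s - mean) * (s - mean);
    return std::sqrt(sumSquares / samples.size());
}
```

With the 120 reloads of the 10-minute run, the nearest-rank P99 is the 119th smallest reload time, so a single outlier does not affect it but two do.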

📊 Scenario 3: Stress Test (1 hour - Nightly)

MUST PASS ✅

Metric	Threshold	Justification
`memory_growth_mb`	< 100MB	1h → ~1.7MB/min = acceptable
`fd_leak`	= 0	Critical
`fps_min`	> 30	Minimum acceptable
`reload_time_p99_ms`	< 1000ms	No degradation over time
`cpu_stddev_percent`	< 10%	Stability
`no_crashes`	true	Critical

SHOULD PASS ⚠️

Metric	Threshold	Justification
`memory_growth_mb`	< 50MB	Very good
`reload_time_p99_ms`	< 750ms	Excellent
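The `fd_leak` metric used in both stress-test variants can be measured by counting open descriptors before and after the run. The sketch below is Linux-only and reads `/proc/self/fd`. `countOpenFds` is a hypothetical helper; the document does not describe how the test suite measures descriptors.

```cpp
#include <dirent.h>
#include <cstring>

// Counts file descriptors currently open in this process (Linux only).
// Returns -1 if /proc/self/fd cannot be read.
inline int countOpenFds() {
    DIR* dir = opendir("/proc/self/fd");
    if (!dir) return -1;
    int count = 0;
    while (dirent* entry = readdir(dir)) {
        if (std::strcmp(entry->d_name, ".") != 0 && std::strcmp(entry->d_name, "..") != 0) {
            ++count;
        }
    }
    closedir(dir);
    return count - 1; // exclude the descriptor that opendir() itself held
}
```

A test would call `countOpenFds()` once before the first reload and once after the last, and report the difference as `fd_leak`.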

📊 Scenario 4: Race Condition Hunter

MUST PASS ✅

Metric	Threshold	Justification
`compile_success_rate_percent`	> 95%	A few compilation failures are OK (disk I/O, etc.)
`reload_success_rate_percent`	> 99%	Nearly all reloads must succeed
`corrupted_loads`	= 0	Corrupted .so = the file stability check failed
`crashes`	= 0	Unhandled race condition = crash
`reload_time_avg_ms`	> 100ms	Proves the file stability check works (it waits ~500ms)

SHOULD PASS ⚠️

Metric	Threshold	Justification
`compile_success_rate_percent`	= 100%	Compilations always succeed = stable environment
`reload_success_rate_percent`	= 100%	Perfect

NICE TO HAVE 💡

Metric	Threshold	Justification
`reload_time_avg_ms`	< 600ms	Fast despite the file stability check

📊 Scenario 5: Multi-Module Orchestration

MUST PASS ✅

Metric	Threshold	Justification
`map_unaffected`	true	Isolation is critical
`tank_unaffected`	true	Isolation is critical
`production_reloaded`	true	The reload must work
`reload_time_ms`	< 1000ms	Standard
`no_crashes`	true	Critical
`execution_order_preserved`	true	Order is critical for dependencies

SHOULD PASS ⚠️

Metric	Threshold	Justification
`reload_time_ms`	< 500ms	Good

NICE TO HAVE 💡

Metric	Threshold	Justification
`zero_fps_impact`	true	FPS identical before/during/after the reload
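The isolation and ordering criteria can be checked by comparing snapshots taken immediately before and after the reload, with no frame processed in between. The sketch below assumes each module can serialize its state to a string. `ModuleSnapshot`, `IsolationResult` and `checkIsolation` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical snapshot of one module, listed in execution order.
struct ModuleSnapshot {
    std::string name;
    std::string serializedState;
};

struct IsolationResult {
    bool orderPreserved;   // same modules in the same order
    bool othersUnaffected; // every module except the reloaded one kept its state
};

inline IsolationResult checkIsolation(const std::vector<ModuleSnapshot>& before,
                                      const std::vector<ModuleSnapshot>& after,
                                      const std::string& reloadedModule) {
    const bool sameCount = before.size() == after.size();
    IsolationResult result = {sameCount, sameCount};
    for (std::size_t i = 0; i < before.size() && i < after.size(); ++i) {
        if (before[i].name != after[i].name) {
            result.orderPreserved = false;
        } else if (before[i].name != reloadedModule &&
                   before[i].serializedState != after[i].serializedState) {
            result.othersUnaffected = false;
        }
    }
    return result;
}
```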

🔧 Implementation in the Tests

Verification Pattern

// In each test
TestReporter reporter("Scenario Name");

// MUST PASS assertions
ASSERT_LT(reloadTime, 1000.0f, "Reload time MUST be < 1000ms");
reporter.addAssertion("reload_time_ok", reloadTime < 1000.0f);

// SHOULD PASS (warning only)
if (reloadTime >= 500.0f) {
    std::cout << "⚠️  WARNING: Reload time should be < 500ms (got " << reloadTime << "ms)\n";
}

// NICE TO HAVE (info only)
if (reloadTime < 300.0f) {
    std::cout << "💡 EXCELLENT: Reload time < 300ms\n";
}

// The exit code depends on MUST PASS assertions only
return reporter.getExitCode(); // 0 if every MUST PASS holds, 1 otherwise

TestReporter Extension

// Add to TestReporter
enum class AssertionLevel {
    MUST_PASS,
    SHOULD_PASS,
    NICE_TO_HAVE
};

void addAssertion(const std::string& name, bool passed, AssertionLevel level);

int getExitCode() const {
    // Fail if at least one MUST_PASS assertion failed.
    // Assumes `assertions` stores {name, passed, level} entries (C++17 structured bindings).
    for (const auto& [name, passed, level] : assertions) {
        if (level == AssertionLevel::MUST_PASS && !passed) {
            return 1;
        }
    }
    return 0;
}
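The extension above only declares `addAssertion` and does not show how assertions are stored. A self-contained sketch of the idea follows. The class name `LeveledReporter`, the `Entry` struct and the `passRate` helper are hypothetical and are not part of the existing TestReporter.

```cpp
#include <string>
#include <vector>

enum class AssertionLevel { MUST_PASS, SHOULD_PASS, NICE_TO_HAVE };

// Hypothetical stand-in for the extended TestReporter.
class LeveledReporter {
    struct Entry {
        std::string name;
        bool passed;
        AssertionLevel level;
    };
    std::vector<Entry> assertions_;

public:
    // Defaulting to MUST_PASS keeps existing two-argument calls working.
    void addAssertion(const std::string& name, bool passed,
                      AssertionLevel level = AssertionLevel::MUST_PASS) {
        assertions_.push_back({name, passed, level});
    }

    // 0 when every MUST_PASS assertion passed, 1 otherwise.
    int getExitCode() const {
        for (const Entry& entry : assertions_) {
            if (entry.level == AssertionLevel::MUST_PASS && !entry.passed) return 1;
        }
        return 0;
    }

    // Fraction (0 to 1) of assertions at `level` that passed; 1.0 when there are none.
    double passRate(AssertionLevel level) const {
        int total = 0, passed = 0;
        for (const Entry& entry : assertions_) {
            if (entry.level != level) continue;
            ++total;
            if (entry.passed) ++passed;
        }
        return total == 0 ? 1.0 : static_cast<double>(passed) / total;
    }
};
```

Giving `level` a default value is a design choice made here so that calls like `reporter.addAssertion("reload_time_ok", ok)` from the verification pattern would compile unchanged.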

📋 Summary Table - MUST PASS

Scenario	Critical Metrics	Values
Production Hot-Reload	reload_time, memory_growth, fps_min, state_preservation	< 1s, < 5MB, > 30, 100%
Chaos Monkey	engine_alive, recovery_rate, memory_growth	true, > 95%, < 10MB
Stress Test (10min)	memory_growth, fd_leak, fps_min, reload_p99	< 20MB, 0, > 30, < 1s
Stress Test (1h)	memory_growth, fd_leak, fps_min, reload_p99	< 100MB, 0, > 30, < 1s
Race Condition	corrupted_loads, crashes, reload_success	0, 0, > 99%
Multi-Module	isolation, reload_ok, execution_order	100%, true, preserved

🎯 Global Validation

For the test suite to PASS:

✅ ALL Phase 1 scenarios (1-2-3) must PASS their MUST PASS criteria
✅ At least 80% of Phase 2 scenarios (4-5) must PASS their MUST PASS criteria (with only two scenarios, this means both)

To declare the system "Production Ready":

✅ Every scenario passes its MUST PASS criteria
✅ At least 70% of the SHOULD PASS criteria are met
✅ No crash in any scenario
✅ The 1-hour stress test (nightly) PASSES
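These global rules can be expressed as two predicates over per-scenario results. The sketch below is one possible reading: `ScenarioResult` and both function names are hypothetical, and counting the 70% SHOULD PASS requirement across all scenarios combined (not per scenario) is an assumption.

```cpp
#include <vector>

// Hypothetical per-scenario summary; field names are illustrative.
struct ScenarioResult {
    int phase;           // 1 for scenarios 1-3, 2 for scenarios 4-5
    bool mustPassOk;     // every MUST PASS criterion met
    int shouldPassMet;   // SHOULD PASS criteria met
    int shouldPassTotal; // SHOULD PASS criteria defined
    bool crashed;
};

// Suite rule: all Phase 1 scenarios pass, and at least 80% of Phase 2 scenarios pass.
inline bool suitePasses(const std::vector<ScenarioResult>& results) {
    int phase2Total = 0, phase2Ok = 0;
    for (const ScenarioResult& r : results) {
        if (r.phase == 1 && !r.mustPassOk) return false;
        if (r.phase == 2) {
            ++phase2Total;
            if (r.mustPassOk) ++phase2Ok;
        }
    }
    return phase2Ok * 100 >= phase2Total * 80; // integer comparison avoids rounding
}

// "Production Ready" rule: all MUST PASS met, no crash, >= 70% of SHOULD PASS, nightly run passed.
inline bool productionReady(const std::vector<ScenarioResult>& results, bool nightlyStressPassed) {
    int met = 0, total = 0;
    for (const ScenarioResult& r : results) {
        if (!r.mustPassOk || r.crashed) return false;
        met += r.shouldPassMet;
        total += r.shouldPassTotal;
    }
    return nightlyStressPassed && met * 100 >= total * 70;
}
```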

📝 Threshold Revision

The thresholds may be adjusted after analyzing the initial results if:

Different (slower) hardware justifies more permissive thresholds
Optimizations allow stricter thresholds
New features change the constraints

Revision process:

Document the justification in this file
Update the corresponding scenarios
Re-run all tests with the new thresholds
Commit the changes with a clear message

Last updated: 2025-11-13
Threshold version: 1.0
