Commit Graph

7 Commits

Author SHA1 Message Date
98acb32c4c fix: Resolve deadlock in IntraIOManager + cleanup SEGFAULTs
- Fix critical deadlock in IntraIOManager using std::scoped_lock for
  multi-mutex acquisition (CrossSystemIntegration: 1901s → 4s)
- Add std::shared_mutex for read-heavy operations (TopicTree, IntraIOManager)
- Fix SEGFAULT in SequentialModuleSystem destructor (logger guard)
- Fix SEGFAULT in ModuleLoader (don't auto-unload when modules still alive)
- Fix iterator invalidation in DependencyTestEngine destructor
- Add TSan/Helgrind integration for deadlock detection
- Add coding guidelines for synchronization patterns

All 23 tests now pass (100%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 11:36:33 +08:00
572e133f4e docs: Consolidate all plans into docs/plans/ directory
- Create new docs/plans/ directory with organized structure
- Add comprehensive PLAN_deadlock_detection_prevention.md (15h plan)
  - ThreadSanitizer integration (2h)
  - Helgrind validation (3h)
  - std::scoped_lock refactoring (4h)
  - std::shared_mutex optimization (6h)
- Migrate 16 plans from planTI/ to docs/plans/
  - Rename all files to PLAN_*.md convention
  - Update README.md with index and statuses
- Remove old planTI/ directory
- Add run_all_tests.sh script for test automation

Plans now include:
- 1 active development plan (deadlock prevention)
- 3 test architecture plans
- 13 integration test scenario plans

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 19:32:33 +08:00
ddbed30ed7 feat: Add Scenario 11 IO System test & fix IntraIO routing architecture
Implémentation complète du scénario 11 (IO System Stress Test) avec correction majeure de l'architecture de routing IntraIO.

## Nouveaux Modules de Test (Scenario 11)
- ProducerModule: Publie messages pour tests IO
- ConsumerModule: Consomme et valide messages reçus
- BroadcastModule: Test multi-subscriber broadcasting
- BatchModule: Test low-frequency batching
- IOStressModule: Tests de charge concurrents

## Test d'Intégration
- test_11_io_system.cpp: 6 tests validant:
  * Basic Publish-Subscribe
  * Pattern Matching avec wildcards
  * Multi-Module Routing (1-to-many)
  * Low-Frequency Subscriptions (batching)
  * Backpressure & Queue Overflow
  * Thread Safety (concurrent pub/pull)

## Fix Architecture Critique: IntraIO Routing
**Problème**: IntraIO::publish() et subscribe() n'utilisaient PAS IntraIOManager pour router entre modules.

**Solution**: Utilisation de JSON comme format de transport intermédiaire
- IntraIO::publish() → extrait JSON → IntraIOManager::routeMessage()
- IntraIO::subscribe() → enregistre au IntraIOManager::registerSubscription()
- IntraIOManager::routeMessage() → copie JSON pour chaque subscriber → deliverMessage()

**Bénéfices**:
-  Routing centralisé fonctionnel
-  Support 1-to-many (copie JSON au lieu de move unique_ptr)
-  Pas besoin d'implémenter IDataNode::clone()
-  Compatible futur NetworkIO (JSON sérialisable)

## Modules Scenario 13 (Cross-System)
- ConfigWatcherModule, PlayerModule, EconomyModule, MetricsModule
- test_13_cross_system.cpp (stub)

## Documentation
- CLAUDE_NEXT_SESSION.md: Instructions détaillées pour build/test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 11:43:08 +08:00
aa322d5214 fix: Correct hot-reload version validation in race condition test
Fixed critical bug where moduleVersion was being overwritten during
setConfiguration(), preventing proper hot-reload validation.

## Problem
- TestModule::setConfiguration() called configNode.getString("version")
- This overwrote the compiled moduleVersion (v2, v3, etc.) back to "v1"
- All reloads appeared successful but versions never actually changed
- Test validated thread safety but NOT actual hot-reload functionality

## Solution
- Removed moduleVersion overwrite from setConfiguration()
- moduleVersion now preserved as global compiled into .so
- Added clear comments explaining this is a compile-time value
- Simplified test configuration (no longer passes version param)

## Test Results (After Fix)
 15/15 compilations (100%)
 29/29 reloads (100%)
 Versions actually change: v1 → v2 → v5 → v14 → v15
 0 corruptions
 0 crashes
 330ms avg reload time (file stability check working)
 Test now validates REAL hot-reload, not just thread safety

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 13:21:57 +08:00
484b9ab5d4 feat: Add Scenario 4 - Race Condition Hunter test suite
Add comprehensive concurrent compilation and hot-reload testing infrastructure
to validate thread safety and file stability during race conditions.

## New Components

### AutoCompiler Helper (tests/helpers/AutoCompiler.{h,cpp})
- Automatically modifies source files to bump version numbers
- Compiles modules repeatedly on separate thread (15 iterations @ 1s interval)
- Tracks compilation success/failure rates with atomic counters
- Thread-safe compilation statistics

### Race Condition Test (tests/integration/test_04_race_condition.cpp)
- **3 concurrent threads:**
  - Compiler: Recompiles TestModule.so every 1 second
  - FileWatcher: Detects .so changes and triggers hot-reload with mutex protection
  - Engine: Runs at 60 FPS with try_lock to skip frames during reload
- Validates module integrity (health status, version, configuration)
- Tracks metrics: compilation rate, reload success, corrupted loads, crashes
- 90-second timeout with progress monitoring

### TestModule Enhancements (tests/modules/TestModule.cpp)
- Added global moduleVersion variable for AutoCompiler modification
- Version bumping support for reload validation

## Test Results (Initial Implementation)

```
Duration: 88s
Compilations:  15/15 (100%) 
Reloads:       ~30 (100% success) 
Corrupted:     0 
Crashes:       0 
File Stability: 328ms avg (proves >100ms wait) 
```

## Known Issue (To Fix in Next Commit)
- Module versions not actually changing during reload
- setConfiguration() overwrites compiled version
- Reload mechanism validated but version bumping needs fix

## Files Modified
- tests/CMakeLists.txt: Add AutoCompiler to helpers, add test_04
- tests/modules/TestModule.cpp: Add version bumping support
- .gitignore: Add build/ and logs/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 10:55:44 +08:00
d8c5f93429 feat: Add comprehensive hot-reload test suite with 3 integration scenarios
This commit implements a complete test infrastructure for validating
hot-reload stability and robustness across multiple scenarios.

## New Test Infrastructure

### Test Helpers (tests/helpers/)
- TestMetrics: FPS, memory, reload time tracking with statistics
- TestReporter: Assertion tracking and formatted test reports
- SystemUtils: Memory usage monitoring via /proc/self/status
- TestAssertions: Macro-based assertion framework

### Test Modules
- TankModule: Realistic module with 50 tanks for production testing
- ChaosModule: Crash-injection module for robustness validation
- StressModule: Lightweight module for long-duration stability tests

## Integration Test Scenarios

### Scenario 1: Production Hot-Reload (test_01_production_hotreload.cpp)
 PASSED - End-to-end hot-reload validation
- 30 seconds simulation (1800 frames @ 60 FPS)
- TankModule with 50 tanks, realistic state
- Source modification (v1.0 → v2.0), recompilation, reload
- State preservation: positions, velocities, frameCount
- Metrics: ~163ms reload time, 0.88MB memory growth

### Scenario 2: Chaos Monkey (test_02_chaos_monkey.cpp)
 PASSED - Extreme robustness testing
- 150+ random crashes per run (5% crash probability per frame)
- 5 crash types: runtime_error, logic_error, out_of_range, domain_error, state corruption
- 100% recovery rate via automatic hot-reload
- Corrupted state detection and rejection
- Random seed for unpredictable crash patterns
- Proof of real reload: temporary files in /tmp/grove_module_*.so

### Scenario 3: Stress Test (test_03_stress_test.cpp)
 PASSED - Long-duration stability validation
- 10 minutes simulation (36000 frames @ 60 FPS)
- 120 hot-reloads (every 5 seconds)
- 100% reload success rate (120/120)
- Memory growth: 2 MB (threshold: 50 MB)
- Avg reload time: 160ms (threshold: 500ms)
- No memory leaks, no file descriptor leaks

## Core Engine Enhancements

### ModuleLoader (src/ModuleLoader.cpp)
- Temporary file copy to /tmp/ for Linux dlopen cache bypass
- Robust reload() method: getState() → unload() → load() → setState()
- Automatic cleanup of temporary files
- Comprehensive error handling and logging

### DebugEngine (src/DebugEngine.cpp)
- Automatic recovery in processModuleSystems()
- Exception catching → logging → module reload → continue
- Module state dump utilities for debugging

### SequentialModuleSystem (src/SequentialModuleSystem.cpp)
- extractModule() for safe module extraction
- registerModule() for module re-registration
- Enhanced processModules() with error handling

## Build System
- CMake configuration for test infrastructure
- Shared library compilation for test modules (.so)
- CTest integration for all scenarios
- PIC flag management for spdlog compatibility

## Documentation (planTI/)
- Complete test architecture documentation
- Detailed scenario specifications with success criteria
- Global test plan and validation thresholds

## Validation Results
All 3 integration scenarios pass successfully:
- Production hot-reload: State preservation validated
- Chaos Monkey: 100% recovery from 150+ crashes
- Stress Test: Stable over 120 reloads, minimal memory growth

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 22:13:07 +08:00
4659c17340 feat: Complete migration from json to IDataNode API
Migrated all implementations to use the new IDataNode abstraction layer:

Core Changes:
- Added spdlog dependency via FetchContent for comprehensive logging
- Enabled POSITION_INDEPENDENT_CODE for grove_impl (required for .so modules)
- Updated all factory createFromConfig() methods to accept IDataNode instead of json
- Replaced json parameters with std::unique_ptr<IDataNode> throughout

Migrated Files (8 core implementations):
- IntraIO: Complete rewrite with IDataNode API and move semantics
- IntraIOManager: Updated message routing with unique_ptr delivery
- SequentialModuleSystem: Migrated to IDataNode input/task handling
- IOFactory: Changed config parsing to use IDataNode getters
- ModuleFactory: Updated all config methods
- EngineFactory: Updated all config methods
- ModuleSystemFactory: Updated all config methods
- DebugEngine: Migrated debug output to IDataNode

Testing Infrastructure:
- Added hot-reload test (TestModule.so + test_hotreload executable)
- Validated 0.012ms hot-reload performance
- State preservation across module reloads working correctly

Technical Details:
- Used JsonDataNode/JsonDataTree as IDataNode backend (nlohmann::json)
- Changed all json::operator[] to getString()/getInt()/getBool()
- Implemented move semantics for unique_ptr<IDataNode> message passing
- Note: IDataNode::clone() not implemented yet (IntraIOManager delivers to first match only)

All files now compile successfully with 100% IDataNode API compliance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-30 07:17:06 +08:00