master
2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| c727873046 |
perf(ThreadedModuleSystem): Atomic barrier + fair benchmark - 1.7x to 6.8x speedup
Critical performance fixes for ThreadedModuleSystem achieving 69-88% parallel efficiency.

## Performance Results (Fair Benchmark)

- 2 modules: 1.72x speedup (86% efficiency)
- 4 modules: 3.16x speedup (79% efficiency)
- 8 modules: 5.51x speedup (69% efficiency)
- 4 heavy: 3.52x speedup (88% efficiency)
- 8 heavy: 6.76x speedup (85% efficiency)

## Bug #1: Atomic Barrier Optimization (10-15% gain)

**Before:** 16 sequential lock operations per frame (8 workers × 2 phases)
- Phase 1: Lock each worker mutex to signal work
- Phase 2: Lock each worker mutex to wait for completion

**After:** 0 locks in hot path using atomic counters
- Generation-based frame synchronization (atomic counter)
- Spin-wait with atomic completion counter
- memory_order_release/acquire for correct visibility

**Changes:**
- include/grove/ThreadedModuleSystem.h:
  - Added std::atomic<size_t> currentFrameGeneration
  - Added std::atomic<int> workersCompleted
  - Added sharedDeltaTime, sharedFrameCount (main thread writes only)
  - Removed per-worker flags (shouldProcess, processingComplete, etc.)
- src/ThreadedModuleSystem.cpp:
  - processModules(): Atomic generation increment + spin-wait
  - workerThreadLoop(): Wait on generation counter, no locks during processing

## Bug #2: Logger Mutex Contention (40-50% gain)

**Problem:** All threads serialized on global logger mutex even with logging disabled
- spdlog's multi-threaded sinks use internal mutexes
- Every logger->trace/warn() call acquired mutex for level check

**Fix:** Commented all logging calls in hot paths
- src/ThreadedModuleSystem.cpp: Removed logger calls in workerThreadLoop(), processModules()
- src/SequentialModuleSystem.cpp: Removed logger calls in processModules() (fair comparison)

## Bug #3: Benchmark Invalidity Fix

**Problem:** SequentialModuleSystem only keeps 1 module (replaces on register)
- Sequential: 1 module × 100k iterations
- Threaded: 8 modules × 100k iterations (8× more work!)
- Comparison was completely unfair

**Fix:** Adjusted workload to be equal
- Sequential: 1 module × (N × iterations)
- Threaded: N modules × iterations
- Total work now identical

**Added:**
- tests/benchmarks/benchmark_threaded_vs_sequential_cpu.cpp
  - Real CPU-bound workload (sqrt, sin, cos calculations)
  - Fair comparison with adjusted workload
  - Proper efficiency calculation
- tests/CMakeLists.txt: Added benchmark target

## Technical Details

**Memory Ordering:**
- memory_order_release when writing flags (main thread signals workers)
- memory_order_acquire when reading flags (workers see shared data)
- Ensures proper synchronization without locks

**Generation Counter:**
- Prevents double-processing of frames
- Workers track lastProcessedGeneration
- Only process when currentGeneration > lastProcessed

## Impact

ThreadedModuleSystem now achieves near-linear scaling for CPU-bound workloads. Ready for production use with 2-8 modules.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> |
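The generation-counter barrier described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code from src/ThreadedModuleSystem.cpp: the `FrameBarrier` class, its method names, the `stop_` flag, and the `runDemo` helper are all hypothetical, and only the member names `currentFrameGeneration`, `workersCompleted`, and `sharedDeltaTime` are borrowed from the message. It assumes the main thread is the only writer of the shared frame data and that spinning with `std::this_thread::yield()` is acceptable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the barrier; not the project's actual class.
class FrameBarrier {
public:
    explicit FrameBarrier(int workerCount) : workerCount_(workerCount) {}

    // Main thread: publish frame data, bump the generation, spin until all workers finish.
    void runFrame(float deltaTime) {
        sharedDeltaTime_ = deltaTime;  // plain write, published by the release below
        workersCompleted_.store(0, std::memory_order_relaxed);
        currentFrameGeneration_.fetch_add(1, std::memory_order_release);
        while (workersCompleted_.load(std::memory_order_acquire) < workerCount_) {
            std::this_thread::yield();
        }
    }

    // Worker: spin until a generation newer than lastSeen appears; false means shutdown.
    bool waitForFrame(std::size_t& lastSeen) {
        for (;;) {
            if (stop_.load(std::memory_order_acquire)) return false;
            std::size_t gen = currentFrameGeneration_.load(std::memory_order_acquire);
            if (gen > lastSeen) {  // only process each generation once
                lastSeen = gen;
                return true;
            }
            std::this_thread::yield();
        }
    }

    float deltaTime() const { return sharedDeltaTime_; }
    void markDone() { workersCompleted_.fetch_add(1, std::memory_order_release); }
    void shutdown() { stop_.store(true, std::memory_order_release); }

private:
    const int workerCount_;
    std::atomic<std::size_t> currentFrameGeneration_{0};
    std::atomic<int> workersCompleted_{0};
    std::atomic<bool> stop_{false};
    float sharedDeltaTime_ = 0.0f;
};

// Runs `frames` frames on `workers` threads; each worker counts the frames it processed.
long runDemo(int workers, int frames) {
    FrameBarrier barrier(workers);
    std::vector<long> counts(workers, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&barrier, &counts, i] {
            std::size_t lastSeen = 0;
            while (barrier.waitForFrame(lastSeen)) {
                // Stand-in for module processing; reads the published delta time.
                if (barrier.deltaTime() > 0.0f) counts[i] += 1;
                barrier.markDone();
            }
        });
    }
    for (int f = 0; f < frames; ++f) barrier.runFrame(0.016f);
    barrier.shutdown();
    for (auto& t : threads) t.join();
    long total = 0;
    for (long c : counts) total += c;
    return total;
}
```

Calling `runDemo(4, 100)` should report 400 processed frames in total, since every worker handles every generation exactly once. The release increment on the generation counter is what makes the plain `sharedDeltaTime_` write visible to workers, and the acquire load on the completion counter is what lets the main thread safely overwrite it for the next frame.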
|||
| aefd7921b2 |
fix: Critical race conditions in ThreadedModuleSystem and logger
Fixed two critical race conditions that prevented multi-threaded module execution:

## Bug #1: ThreadedModuleSystem::registerModule() race condition

**Symptom:** Deadlock on first processModules() call
**Root Cause:** Worker thread started before being added to workers vector
**Fix:** Add worker to vector BEFORE spawning thread (src/ThreadedModuleSystem.cpp:102-108)

Before:
- Create worker → Start thread → Add to vector (RACE!)
- Thread accesses workers[index] before push_back completes

After:
- Create worker → Add to vector → Start thread (SAFE)
- Thread guaranteed to find worker in vector

## Bug #2: stillhammer::createLogger() race condition

**Symptom:** Deadlock when multiple threads create loggers simultaneously
**Root Cause:** Check-then-register pattern without mutex protection
**Fix:** Added static mutex around spdlog::get() + register_logger() (external/StillHammer/logger/src/Logger.cpp:94-96)

Before:
- Thread 1: check → create → register
- Thread 2: check → create → register (RACE on spdlog registry!)

After:
- Mutex protects entire check-then-register critical section

## Validation & Testing

Added comprehensive test suite:
- test_threaded_module_system.cpp (6 unit tests)
- test_threaded_stress.cpp (5 stress tests: 50 modules × 1000 frames)
- test_logger_threadsafe.cpp (concurrent logger creation)
- benchmark_threaded_vs_sequential.cpp (performance comparison)
- docs/THREADED_MODULE_SYSTEM_VALIDATION.md (full validation report)

All tests passing (100%):
- ThreadedModuleSystem: ✅ 0.15s
- ThreadedStress: ✅ 7.64s
- LoggerThreadSafe: ✅ 0.13s

## Impact

ThreadedModuleSystem now PRODUCTION READY:
- Thread-safe module registration
- Stable parallel execution (validated with 50,000+ operations)
- Hot-reload working (100 cycles tested)
- Logger thread-safe for concurrent module initialization

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> |
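The check-then-register fix from Bug #2 of this commit can be illustrated with a small sketch. It deliberately does not use spdlog: `FakeLogger`, `registry()`, `getOrCreateLogger()`, and `distinctLoggersCreated()` are hypothetical names, and the `std::map` stands in for the logger registry that the message says is reached through `spdlog::get()` and `register_logger()`. The only point shown is that one static mutex covers the lookup and the registration together.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical logger type used only for this illustration.
struct FakeLogger {
    std::string name;
};

// Hypothetical registry standing in for the real one; not thread-safe by itself.
std::map<std::string, std::shared_ptr<FakeLogger>>& registry() {
    static std::map<std::string, std::shared_ptr<FakeLogger>> instance;
    return instance;
}

// The lookup and the registration happen under the same lock, so two threads
// can never both decide "not found" and register the same name twice.
std::shared_ptr<FakeLogger> getOrCreateLogger(const std::string& name) {
    static std::mutex creationMutex;
    std::lock_guard<std::mutex> lock(creationMutex);

    auto it = registry().find(name);
    if (it != registry().end()) {
        return it->second;  // already registered by another caller
    }
    auto logger = std::make_shared<FakeLogger>(FakeLogger{name});
    registry().emplace(name, logger);
    return logger;
}

// Many threads request the same name; returns how many distinct instances they received.
std::size_t distinctLoggersCreated(int threadCount) {
    std::vector<std::shared_ptr<FakeLogger>> results(threadCount);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&results, i] {
            results[i] = getOrCreateLogger("module");
        });
    }
    for (auto& t : threads) t.join();

    std::set<FakeLogger*> unique;
    for (const auto& r : results) unique.insert(r.get());
    return unique.size();
}
```

With the mutex in place, `distinctLoggersCreated(8)` yields a single shared instance. Locking only the registration step, or only the lookup, would leave the window between them open, which is the race the message describes.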