GroveEngine/include/grove
StillHammer c727873046 perf(ThreadedModuleSystem): Atomic barrier + fair benchmark - 1.7x to 6.8x speedup
Critical performance fixes for ThreadedModuleSystem achieving 69-88% parallel efficiency.

## Performance Results (Fair Benchmark)

- 2 modules:  1.72x speedup (86% efficiency)
- 4 modules:  3.16x speedup (79% efficiency)
- 8 modules:  5.51x speedup (69% efficiency)
- 4 heavy:    3.52x speedup (88% efficiency)
- 8 heavy:    6.76x speedup (85% efficiency)
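
The efficiency figures above are consistent with speedup divided by module count (for example 1.72 / 2 = 0.86). A minimal sketch of that arithmetic follows; the function names are illustrative and are not taken from the benchmark source.

```cpp
#include <cassert>
#include <cstddef>

// Speedup: sequential wall time divided by threaded wall time,
// measured for the same total amount of work.
inline double speedupFrom(double sequentialMs, double threadedMs) {
    assert(threadedMs > 0.0);
    return sequentialMs / threadedMs;
}

// Parallel efficiency: speedup divided by the number of modules
// (one worker per module is assumed). 1.0 would be perfect linear scaling.
inline double parallelEfficiency(double speedup, std::size_t modules) {
    assert(modules > 0);
    return speedup / static_cast<double>(modules);
}
```

With the numbers listed above, `parallelEfficiency(5.51, 8)` is about 0.69, which matches the 8-module row.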

## Bug #1: Atomic Barrier Optimization (10-15% gain)

**Before:** 16 sequential lock operations per frame (8 workers × 2 phases)
- Phase 1: Lock each worker mutex to signal work
- Phase 2: Lock each worker mutex to wait for completion

**After:** 0 locks in hot path using atomic counters
- Generation-based frame synchronization (atomic counter)
- Spin-wait with atomic completion counter
- memory_order_release/acquire for correct visibility

**Changes:**
- include/grove/ThreadedModuleSystem.h:
  - Added std::atomic<size_t> currentFrameGeneration
  - Added std::atomic<int> workersCompleted
  - Added sharedDeltaTime, sharedFrameCount (main thread writes only)
  - Removed per-worker flags (shouldProcess, processingComplete, etc.)
- src/ThreadedModuleSystem.cpp:
  - processModules(): Atomic generation increment + spin-wait
  - workerThreadLoop(): Wait on generation counter, no locks during processing
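
The barrier described above can be sketched roughly as follows. This is not the code from src/ThreadedModuleSystem.cpp: the class `FrameBarrier`, its members, and the `work` callback are hypothetical stand-ins, and the waits use `std::this_thread::yield()` rather than a pure spin so the sketch behaves on machines with few cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the frame barrier; names are illustrative only.
class FrameBarrier {
public:
    FrameBarrier(std::size_t workerCount,
                 std::function<void(std::size_t, float)> work)
        : work_(std::move(work)) {
        assert(workerCount > 0);
        for (std::size_t i = 0; i < workerCount; ++i)
            workers_.emplace_back([this, i] { workerLoop(i); });
    }

    ~FrameBarrier() {
        stop_.store(true, std::memory_order_release);
        for (auto& t : workers_) t.join();
    }

    // Main thread: publish frame data, bump the generation, wait for all workers.
    void processFrame(float deltaTime) {
        sharedDeltaTime_ = deltaTime;  // plain write, published by the release below
        workersCompleted_.store(0, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);  // signal a new frame
        const int expected = static_cast<int>(workers_.size());
        while (workersCompleted_.load(std::memory_order_acquire) != expected)
            std::this_thread::yield();  // no mutex on this path
    }

private:
    void workerLoop(std::size_t index) {
        std::size_t lastProcessed = 0;
        for (;;) {
            const std::size_t current = generation_.load(std::memory_order_acquire);
            if (current == lastProcessed) {
                if (stop_.load(std::memory_order_acquire)) return;
                std::this_thread::yield();
                continue;
            }
            lastProcessed = current;         // each generation is handled once
            work_(index, sharedDeltaTime_);  // visible thanks to the acquire load
            workersCompleted_.fetch_add(1, std::memory_order_release);
        }
    }

    std::function<void(std::size_t, float)> work_;
    std::vector<std::thread> workers_;
    std::atomic<std::size_t> generation_{0};
    std::atomic<int> workersCompleted_{0};
    std::atomic<bool> stop_{false};
    float sharedDeltaTime_ = 0.0f;  // written by the main thread only
};
```

Resetting the completion counter before the release increment is safe in this sketch because no worker touches the counter until it has observed the new generation.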

## Bug #2: Logger Mutex Contention (40-50% gain)

**Problem:** All threads serialized on global logger mutex even with logging disabled
- spdlog's multi-threaded sinks use internal mutexes
- Every logger->trace/warn() call acquired mutex for level check

**Fix:** Commented out all logging calls in hot paths
- src/ThreadedModuleSystem.cpp: Removed logger calls in workerThreadLoop(), processModules()
- src/SequentialModuleSystem.cpp: Removed logger calls in processModules() (fair comparison)
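
The commit removes the hot-path calls outright. An alternative that keeps them, shown here only as a sketch and not as what this commit does, is to check an atomic level before touching any mutex or building the message. `GuardedLogger` and `Level` are hypothetical and use only the standard library; they are not spdlog APIs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical logger, for illustration only (not spdlog).
enum class Level : int { Trace = 0, Warn = 1, Off = 2 };

class GuardedLogger {
public:
    void setLevel(Level level) {
        level_.store(static_cast<int>(level), std::memory_order_relaxed);
    }

    bool shouldLog(Level level) const {
        return static_cast<int>(level) >= level_.load(std::memory_order_relaxed);
    }

    // makeMessage is only invoked, and the mutex only taken, when the
    // level is enabled; a disabled call costs one relaxed atomic load.
    template <typename MakeMessage>
    void log(Level level, MakeMessage&& makeMessage) {
        assert(level != Level::Off);
        if (!shouldLog(level)) return;
        std::lock_guard<std::mutex> lock(mutex_);
        ++lockAcquisitions_;
        lines_.push_back(makeMessage());
    }

    // Number of times log() actually took the mutex (for inspection).
    std::size_t lockAcquisitions() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lockAcquisitions_;
    }

private:
    std::atomic<int> level_{static_cast<int>(Level::Trace)};
    mutable std::mutex mutex_;
    std::size_t lockAcquisitions_ = 0;
    std::vector<std::string> lines_;
};
```

Whether this would recover the same gain as removing the calls is untested here; it only illustrates keeping disabled log calls off the mutex.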

## Bug #3: Benchmark Invalidity Fix

**Problem:** SequentialModuleSystem keeps only 1 module (each register call replaces the previous module)
- Sequential: 1 module × 100k iterations
- Threaded: 8 modules × 100k iterations (8× more work!)
- Comparison was completely unfair

**Fix:** Adjusted workload to be equal
- Sequential: 1 module × (N × iterations)
- Threaded: N modules × iterations
- Total work now identical
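
The adjustment can be checked with a few lines of arithmetic. The sketch below is illustrative: `Workload`, the helper functions, and the exact kernel are assumptions, not the contents of the benchmark file; only the sqrt/sin/cos mix and the N × iterations split come from this commit message.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical description of one benchmark configuration.
struct Workload {
    std::size_t modules;
    std::size_t iterationsPerModule;
};

// Sequential side: a single module does all N * iterations.
inline Workload sequentialWorkload(std::size_t n, std::size_t iterations) {
    assert(n > 0);
    return {1, n * iterations};
}

// Threaded side: N modules each do `iterations`.
inline Workload threadedWorkload(std::size_t n, std::size_t iterations) {
    assert(n > 0);
    return {n, iterations};
}

inline std::size_t totalIterations(const Workload& w) {
    return w.modules * w.iterationsPerModule;
}

// CPU-bound kernel in the spirit of the benchmark (sqrt, sin, cos).
inline double cpuBoundWork(std::size_t iterations) {
    double acc = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const double x = static_cast<double>(i) + 1.0;
        acc += std::sqrt(x) * std::sin(x) * std::cos(x);
    }
    return acc;
}
```

For N = 8 and 100k iterations both sides come to 800,000 kernel iterations, so the measured times are comparable.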

**Added:**
- tests/benchmarks/benchmark_threaded_vs_sequential_cpu.cpp
  - Real CPU-bound workload (sqrt, sin, cos calculations)
  - Fair comparison with adjusted workload
  - Proper efficiency calculation
- tests/CMakeLists.txt: Added benchmark target

## Technical Details

**Memory Ordering:**
- memory_order_release on atomic writes (main thread signals workers)
- memory_order_acquire on atomic reads (workers then see the shared data written before the signal)
- Ensures proper synchronization without locks

**Generation Counter:**
- Prevents double-processing of frames
- Workers track lastProcessedGeneration
- Only process when currentFrameGeneration > lastProcessedGeneration
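
In isolation, the generation check amounts to something like the following. `WorkerFrameState` and `tryBeginFrame` are hypothetical names; only `currentFrameGeneration` and `lastProcessedGeneration` appear in this commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical per-worker state, for illustration only.
struct WorkerFrameState {
    std::size_t lastProcessedGeneration = 0;

    // Returns true exactly once for each new generation the worker observes.
    bool tryBeginFrame(const std::atomic<std::size_t>& currentFrameGeneration) {
        const std::size_t current =
            currentFrameGeneration.load(std::memory_order_acquire);
        assert(current >= lastProcessedGeneration);
        if (current > lastProcessedGeneration) {
            lastProcessedGeneration = current;  // claim this frame
            return true;
        }
        return false;  // same frame as before: do not process again
    }
};
```

A second call within the same frame returns false, which is what prevents double-processing.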

## Impact

ThreadedModuleSystem now achieves 69-88% parallel efficiency (1.7x to 6.8x speedup) for CPU-bound workloads.
Ready for production use with 2-8 modules.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-19 15:49:10 +07:00

| Name | Last commit | Date |
| --- | --- | --- |
| platform | fix: Resolve bgfx Frame 1 crash on Windows DLL + MinGW GCC 15 compatibility | 2025-12-30 11:03:06 +07:00 |
| ASerializable.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| DataTreeFactory.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| DebugEngine.h | feat: Add comprehensive hot-reload test suite with 3 integration scenarios | 2025-11-13 22:13:07 +08:00 |
| EngineFactory.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ICoordinationModule.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| IDataNode.h | feat: Complete UIModule Phase 7 - ScrollPanel & Tooltips | 2025-11-29 07:13:13 +08:00 |
| IDataTree.h | feat: Add read-only API for concurrent DataNode access & restore test_13 cross-system tests | 2025-11-20 14:02:06 +08:00 |
| IDataValue.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| IEngine.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| IIO.h | feat(IIO)!: BREAKING CHANGE - Callback-based message dispatch | 2026-01-19 14:19:27 +07:00 |
| ImGuiUI.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| IModule.h | feat: Add integration tests 8-10 & fix CTest configuration | 2025-11-19 07:34:15 +08:00 |
| IModuleSystem.h | feat: Add comprehensive hot-reload test suite with 3 integration scenarios | 2025-11-13 22:13:07 +08:00 |
| IntraIO.h | feat(IIO)!: BREAKING CHANGE - Callback-based message dispatch | 2026-01-19 14:19:27 +07:00 |
| IntraIOManager.h | fix: Resolve deadlock in IntraIOManager + cleanup SEGFAULTs | 2025-11-23 11:36:33 +08:00 |
| IOFactory.h | feat: Complete migration from json to IDataNode API | 2025-10-30 07:17:06 +08:00 |
| IRegion.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ISerializable.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ITaskScheduler.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| IUI_Enums.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| IUI.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| JsonDataNode.h | feat: Complete UIModule Phase 7 - ScrollPanel & Tooltips | 2025-11-29 07:13:13 +08:00 |
| JsonDataTree.h | feat: Add read-only API for concurrent DataNode access & restore test_13 cross-system tests | 2025-11-20 14:02:06 +08:00 |
| JsonDataValue.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ModuleFactory.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ModuleLoader.h | feat: Windows portage + Phase 4 SceneCollector integration | 2025-11-27 09:48:14 +08:00 |
| ModuleSystemFactory.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| RandomGenerator.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| Resource.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ResourceRegistry.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| SequentialModuleSystem.h | feat: Add comprehensive hot-reload test suite with 3 integration scenarios | 2025-11-13 22:13:07 +08:00 |
| SerializationRegistry.h | feat: Implement complete IDataNode/IDataTree system with JSON backend | 2025-10-28 15:36:25 +08:00 |
| ThreadedModuleSystem.h | perf(ThreadedModuleSystem): Atomic barrier + fair benchmark - 1.7x to 6.8x speedup | 2026-01-19 15:49:10 +07:00 |