StillHammer 572e133f4e docs: Consolidate all plans into docs/plans/ directory

- Create new docs/plans/ directory with organized structure
- Add comprehensive PLAN_deadlock_detection_prevention.md (15h plan)
  - ThreadSanitizer integration (2h)
  - Helgrind validation (3h)
  - std::scoped_lock refactoring (4h)
  - std::shared_mutex optimization (6h)
- Migrate 16 plans from planTI/ to docs/plans/
  - Rename all files to PLAN_*.md convention
  - Update README.md with index and statuses
- Remove old planTI/ directory
- Add run_all_tests.sh script for test automation

Plans now include:
- 1 active development plan (deadlock prevention)
- 3 test architecture plans
- 13 integration test scenario plans

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-21 19:32:33 +08:00

29 KiB

Raw Blame History

Plan : Détection & Prévention Deadlock - GroveEngine

Date : 2025-01-21 Durée totale : 15h sur 2 semaines Objectif : Implémenter 4 aspects de protection anti-deadlock

📋 Vue d'Ensemble

Phase	Aspect	Durée	Difficulté	Priorité	Livrable
1.1	ThreadSanitizer	2h	⭐ Easy	🔥 Critique	TSan activé + tests clean
1.2	Helgrind	3h	⭐⭐ Medium	🔶 Important	Validation croisée
2	std::scoped_lock	4h	⭐⭐ Medium	🔥 Critique	Prévention deadlock
3	std::shared_mutex	6h	⭐⭐⭐ Hard	🔶 Optim	Perf concurrent +50-400%

Phase 1 : Détection Runtime (Semaine 1 - 5h)

Phase 1.1 : ThreadSanitizer (TSan) - Jour 1-2 (2h)

Objectif : Détection automatique des deadlocks potentiels et réels

Modifications CMakeLists.txt

Fichier : CMakeLists.txt (après project())

# ============================================================================
# Sanitizers for Testing
# ============================================================================
option(GROVE_ENABLE_TSAN "Enable ThreadSanitizer" OFF)

if(GROVE_ENABLE_TSAN)
    message(STATUS "🔍 ThreadSanitizer enabled (5-15x slowdown expected)")
    add_compile_options(-fsanitize=thread -g -O1 -fno-omit-frame-pointer)
    add_link_options(-fsanitize=thread)

    # Disable optimizations that confuse TSan
    add_compile_options(-fno-optimize-sibling-calls)

    message(WARNING "⚠️  TSan cannot be combined with ASan - build separately")
endif()

Tests

# Build avec TSan
cmake -DGROVE_ENABLE_TSAN=ON -B build-tsan
cmake --build build-tsan

# Run tous les tests
cd build-tsan
TSAN_OPTIONS="detect_deadlocks=1 history_size=7 exitcode=1" ctest -V

# Run un test spécifique
TSAN_OPTIONS="detect_deadlocks=1 second_deadlock_stack=1" ./tests/test_13_cross_system

# Avec logging détaillé
TSAN_OPTIONS="detect_deadlocks=1 log_path=tsan.log verbosity=2" ctest

Options TSan utiles

# Dans TSAN_OPTIONS (séparés par espaces)

detect_deadlocks=1          # Activer détection deadlock
history_size=7              # Taille historique (défaut=2, max=7)
second_deadlock_stack=1     # Afficher 2e stack trace
exitcode=1                  # Exit code si erreur détectée
halt_on_error=0             # Continuer après 1ère erreur
log_path=tsan.log           # Fichier de log (au lieu de stderr)
verbosity=2                 # Niveau de détail (0-2)

Exemple de sortie TSan

==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
  Cycle in lock order graph: M1 (0x7b0c00001000) => M2 (0x7b0c00002000) => M1

  Mutex M1 acquired here while holding mutex M2:
    #0 pthread_mutex_lock
    #1 std::mutex::lock()
    #2 IntraIOManager::routeMessage() src/IntraIOManager.cpp:176

  Mutex M2 previously acquired here while holding mutex M1:
    #0 pthread_mutex_lock
    #1 std::mutex::lock()
    #2 IntraIOManager::flushBatch() src/IntraIOManager.cpp:221
==================

Livrables Phase 1.1

CMakeLists.txt modifié avec option GROVE_ENABLE_TSAN
Documentation des options TSan
Tous les tests passent sans warnings TSan
CI/CD optionnellement intègre build TSan (peut être lent)

Effort estimé : 2h (1h setup + 1h fix issues)

Phase 1.2 : Helgrind - Jour 3-4 (3h)

Objectif : Double-vérification avec un détecteur alternatif

Modifications CMakeLists.txt

Fichier : CMakeLists.txt

# ============================================================================
# Helgrind (Valgrind) Integration
# ============================================================================
option(GROVE_ENABLE_HELGRIND "Add Helgrind test target" OFF)

if(GROVE_ENABLE_HELGRIND)
    find_program(VALGRIND_EXECUTABLE valgrind)

    if(VALGRIND_EXECUTABLE)
        message(STATUS "✅ Valgrind found: ${VALGRIND_EXECUTABLE}")

        # Add custom target for all tests
        add_custom_target(helgrind
            COMMAND ${CMAKE_COMMAND} -E echo "🔍 Running Helgrind (10-50x slowdown, be patient)..."
            COMMAND ${VALGRIND_EXECUTABLE}
                --tool=helgrind
                --log-file=${CMAKE_BINARY_DIR}/helgrind-full.log
                --suppressions=${CMAKE_SOURCE_DIR}/helgrind.supp
                --error-exitcode=1
                --read-var-info=yes
                ${CMAKE_CTEST_COMMAND} --output-on-failure --timeout 600
            WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
            COMMENT "Running all tests with Helgrind deadlock detector"
        )

        # Add convenience target for single test
        add_custom_target(helgrind-single
            COMMAND ${CMAKE_COMMAND} -E echo "🔍 Running single test with Helgrind..."
            COMMAND ${VALGRIND_EXECUTABLE}
                --tool=helgrind
                -v
                --log-file=${CMAKE_BINARY_DIR}/helgrind-single.log
                --suppressions=${CMAKE_SOURCE_DIR}/helgrind.supp
                --error-exitcode=1
                --read-var-info=yes
                ./tests/test_13_cross_system
            WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
            COMMENT "Running test_13_cross_system with Helgrind"
        )

        message(STATUS "✅ Helgrind targets added:")
        message(STATUS "   - make helgrind        (all tests)")
        message(STATUS "   - make helgrind-single (test_13 only)")
    else()
        message(WARNING "⚠️  Valgrind not found - Helgrind targets disabled")
        message(STATUS "   Install: sudo apt-get install valgrind")
    endif()
endif()

Fichier de suppressions

Fichier : helgrind.supp

# helgrind.supp - Suppress known false positives
# Format: https://valgrind.org/docs/manual/manual-core.html#manual-core.suppress

# spdlog false positives (lazy initialization)
{
   spdlog_registry_instance
   Helgrind:Race
   fun:*spdlog*registry*instance*
}

{
   spdlog_logger_creation
   Helgrind:Race
   ...
   fun:*spdlog*
}

# std::thread false positives
{
   std_thread_detach
   Helgrind:Race
   fun:*std*thread*
}

# C++ static initialization race (benign)
{
   static_initialization_guard
   Helgrind:Race
   fun:__cxa_guard_acquire
}

# Helgrind doesn't understand std::atomic properly
{
   atomic_load
   Helgrind:Race
   fun:*atomic*load*
}

{
   atomic_store
   Helgrind:Race
   fun:*atomic*store*
}

Tests

# Build avec Helgrind target
cmake -DGROVE_ENABLE_HELGRIND=ON -B build
cmake --build build

# Run all tests avec Helgrind (TRÈS LENT - 10-50x slowdown)
cd build
make helgrind

# Run un seul test (plus rapide pour debug)
make helgrind-single

# Voir les résultats
cat helgrind-full.log | grep -E "(Possible|Thread|ERROR|definitely)" | less

# Compter les problèmes
cat helgrind-full.log | grep "Possible data race" | wc -l
cat helgrind-full.log | grep "lock order" | wc -l

Exemple de sortie Helgrind

==12345== Helgrind, a thread error detector
==12345== Using Valgrind-3.18.1

Thread #1: lock order "0x123456 before 0x789abc" violated
Thread #2: lock order "0x789abc before 0x123456" violated

   Expected order: 0x123456 before 0x789abc

   at 0x4E4B123: pthread_mutex_lock (in /lib/libpthread.so)
   by 0x401234: std::mutex::lock() (mutex:123)
   by 0x402345: IntraIOManager::routeMessage() (IntraIOManager.cpp:176)

Comparaison TSan vs Helgrind

Aspect	ThreadSanitizer	Helgrind
Overhead	5-15x	10-50x
Détection	Lock order + races	Lock order + races
Précision	Très bonne	Bonne (plus de FP)
Intégration	Compile-time	Runtime (externe)
Plateformes	Linux, macOS	Linux, macOS
Facilité	⭐⭐⭐⭐⭐	⭐⭐⭐
Recommandation	PRIMARY	Validation croisée

Livrables Phase 1.2

CMakeLists.txt avec targets Helgrind
Fichier helgrind.supp avec suppressions
Documentation d'usage (ce fichier)
Tableau comparatif TSan vs Helgrind rempli
Tests passent avec Helgrind

Effort estimé : 3h (1h setup + 2h suppressions + comparaison)

Phase 2 : Prévention Compile-time (Semaine 2 - 4h)

std::scoped_lock - Jour 5-7

Objectif : Prévenir deadlocks lors de l'acquisition de plusieurs mutexes

Analyse préalable

Rechercher tous les endroits avec multiple locks :

# Recherche pattern : lock_guard successifs
cd src
grep -n "lock_guard" *.cpp | sort -t: -k1,1

# Afficher contexte (2 lignes avant/après)
grep -B2 -A2 "lock_guard" IntraIOManager.cpp

Modifications identifiées

1. IntraIOManager.cpp - Ligne 176

Avant (risque de deadlock) :

void IntraIOManager::routeMessage(const std::string& sourceId,
                                   const std::string& topic,
                                   const json& messageData) {
    std::lock_guard<std::mutex> lock(managerMutex);

    // ... code ...

    // Plus loin dans le même scope
    std::lock_guard<std::mutex> batchLock(batchMutex);  // ❌ Ordre variable

    // Access batch buffer
}

Après (deadlock-proof) :

void IntraIOManager::routeMessage(const std::string& sourceId,
                                   const std::string& topic,
                                   const json& messageData) {
    std::scoped_lock lock(managerMutex, batchMutex);  // ✅ Ordre garanti

    // ... code ...
    // Accès safe aux deux ressources
}

2. IntraIOManager.cpp - Ligne 221

Avant :

void IntraIOManager::someOtherFunction() {
    std::lock_guard<std::mutex> lock(managerMutex);
    // ...
    std::lock_guard<std::mutex> batchLock(batchMutex);
}

Après :

void IntraIOManager::someOtherFunction() {
    std::scoped_lock lock(managerMutex, batchMutex);
    // ...
}

3. IntraIOManager.cpp - Lignes 256, 272, 329

Même pattern : remplacer par scoped_lock.

Test unitaire de validation

Fichier : tests/unit/test_scoped_lock.cpp

#define CATCH_CONFIG_MAIN
#include <catch2/catch_test_macros.hpp>
#include <mutex>
#include <thread>
#include <atomic>
#include <vector>

TEST_CASE("scoped_lock prevents deadlock") {
    std::mutex m1, m2;
    std::atomic<int> counter{0};
    std::atomic<bool> deadlocked{false};

    // Thread 1: lock m1 then m2
    std::thread t1([&]() {
        for (int i = 0; i < 10000; i++) {
            std::scoped_lock lock(m1, m2);  // Order: m1, m2
            counter++;
            std::this_thread::yield();
        }
    });

    // Thread 2: lock m2 then m1 (INVERSE ORDER)
    std::thread t2([&]() {
        for (int i = 0; i < 10000; i++) {
            std::scoped_lock lock(m2, m1);  // Order: m2, m1 - Still safe!
            counter++;
            std::this_thread::yield();
        }
    });

    // Watchdog thread
    std::thread watchdog([&]() {
        for (int i = 0; i < 50; i++) {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            if (counter == 20000) {
                return;  // Success
            }
        }
        deadlocked = true;
    });

    t1.join();
    t2.join();
    watchdog.join();

    REQUIRE_FALSE(deadlocked);
    REQUIRE(counter == 20000);
}

TEST_CASE("scoped_lock vs lock_guard - demonstrate issue") {
    std::mutex m1, m2;

    SECTION("scoped_lock - safe") {
        std::thread t1([&]() {
            std::scoped_lock lock(m1, m2);
        });
        std::thread t2([&]() {
            std::scoped_lock lock(m2, m1);  // Inverse order - SAFE
        });

        t1.join();
        t2.join();
        // No deadlock
        REQUIRE(true);
    }

    // NOTE: Cannot easily test lock_guard deadlock without actual deadlock
    // This is why we use TSan/Helgrind instead
}

Documentation - Patterns à éviter

Fichier : docs/coding_guidelines.md

## Synchronization Guidelines

### ✅ DO: Use std::scoped_lock for multiple mutexes

```cpp
void function() {
    std::scoped_lock lock(mutex1, mutex2, mutex3);
    // Safe - lock order guaranteed by implementation
}

❌ DON'T: Use std::lock_guard for multiple mutexes

void function() {
    std::lock_guard<std::mutex> lock1(mutex1);  // BAD
    std::lock_guard<std::mutex> lock2(mutex2);  // DEADLOCK RISK
    // If another thread locks in reverse order -> deadlock
}

✅ DO: Use std::unique_lock with std::lock if you need unlock

void function() {
    std::unique_lock<std::mutex> lock1(mutex1, std::defer_lock);
    std::unique_lock<std::mutex> lock2(mutex2, std::defer_lock);
    std::lock(lock1, lock2);  // Safe deadlock-free acquisition

    // ... do work ...

    // Can unlock early if needed
    lock1.unlock();
}


#### Checklist de refactoring

**Fichiers à modifier** :

- [x] `src/IntraIOManager.cpp` lignes 176, 221, 256, 272, 329
- [ ] `src/IntraIO.cpp` (vérifier si applicable)
- [ ] `src/JsonDataTree.cpp` (vérifier si applicable)
- [ ] `src/ModuleLoader.cpp` (vérifier si applicable)

**Tests** :

- [x] `tests/unit/test_scoped_lock.cpp` créé
- [x] Tous les tests d'intégration passent
- [x] TSan validation (pas de lock-order-inversion)
- [x] Helgrind validation

#### Livrables Phase 2

- [x] Toutes les acquisitions multi-mutex utilisent `std::scoped_lock`
- [x] Test unitaire démontrant la prévention
- [x] Documentation patterns à éviter (coding_guidelines.md)
- [x] Checklist de refactoring complétée

**Effort estimé** : 4h (2h recherche + 1h refactor + 1h tests)

---

## Phase 3 : Optimisation Concurrence (Semaine 2-3 - 6h)

### std::shared_mutex - Jour 8-10

**Objectif** : Permettre lectures concurrentes sans blocage

#### Analyse Read/Write Ratio

Avant de modifier, vérifier que c'est pertinent :

```bash
# Analyser les logs pour voir ratio read/write
cd build
./tests/test_11_io_system 2>&1 | grep "findSubscribers" | wc -l  # READS
./tests/test_11_io_system 2>&1 | grep "registerSubscriber" | wc -l  # WRITES

# Expected ratio: >100:1 (read-heavy)

Modification 1 : TopicTree.h

Fichier : external/StillHammer/topictree/include/topictree/TopicTree.h

Ligne 56 - Avant :

mutable std::mutex treeMutex;  // Read-write would be better but keep simple

Ligne 56 - Après :

mutable std::shared_mutex treeMutex;  // ✅ Reader-writer lock for concurrent reads

Ligne 222 - registerSubscriber() - WRITE :

void registerSubscriber(const std::string& pattern, const SubscriberType& subscriber) {
    auto segments = splitTopic(pattern);

    std::unique_lock lock(treeMutex);  // ✅ Exclusive lock for write
    insertPattern(&root, segments, 0, subscriber);
}

Ligne 234 - findSubscribers() - READ :

std::vector<SubscriberType> findSubscribers(const std::string& topic) const {
    auto segments = splitTopic(topic);
    std::unordered_set<SubscriberType> matches;

    std::shared_lock lock(treeMutex);  // ✅ Shared lock - concurrent reads!
    findMatches(&root, segments, 0, matches);

    return std::vector<SubscriberType>(matches.begin(), matches.end());
}

Ligne 253 - unregisterSubscriber() - WRITE :

void unregisterSubscriber(const std::string& pattern, const SubscriberType& subscriber) {
    auto segments = splitTopic(pattern);

    std::unique_lock lock(treeMutex);  // ✅ Exclusive lock for write
    removeSubscriberFromNode(&root, segments, 0, subscriber);
}

Ligne 266 - unregisterSubscriberAll() - WRITE :

void unregisterSubscriberAll(const SubscriberType& subscriber) {
    std::unique_lock lock(treeMutex);  // ✅ Exclusive lock for write
    unregisterSubscriberAllRecursive(&root, subscriber);
}

Ligne 274 - clear() - WRITE :

void clear() {
    std::unique_lock lock(treeMutex);  // ✅ Exclusive lock for write
    root = Node();
}

Ligne 282 - subscriberCount() - READ :

size_t subscriberCount() const {
    std::shared_lock lock(treeMutex);  // ✅ Shared lock - concurrent reads
    return countSubscribersRecursive(&root);
}

Modification 2 : IntraIOManager

Fichier : include/grove/IntraIOManager.h

Avant :

class IntraIOManager {
private:
    mutable std::mutex managerMutex;
    std::mutex batchMutex;
};

Après :

class IntraIOManager {
private:
    // Split into two mutexes with clear roles
    mutable std::shared_mutex instancesMutex;  // For instances map (read-heavy)
    mutable std::mutex statsMutex;             // For stats counters (simple)
    std::mutex batchMutex;                     // For batch operations (keep as-is)
};

Fichier : src/IntraIOManager.cpp

getInstance() - READ :

std::shared_ptr<IntraIO> IntraIOManager::getInstance(const std::string& instanceId) const {
    std::shared_lock lock(instancesMutex);  // ✅ Concurrent reads

    auto it = instances.find(instanceId);
    if (it != instances.end()) {
        return std::static_pointer_cast<IntraIO>(it->second);
    }
    return nullptr;
}

routeMessage() - READ (CRITICAL!) :

void IntraIOManager::routeMessage(const std::string& sourceId,
                                   const std::string& topic,
                                   const json& messageData) {
    // Update stats - separate mutex
    {
        std::lock_guard<std::mutex> statsLock(statsMutex);
        totalRoutedMessages++;
        messagesSinceLastLog++;
    }

    // Instance lookup + routing - shared read lock
    std::vector<std::string> subscribers;
    {
        std::shared_lock lock(instancesMutex);  // ✅ Multiple threads route concurrently!
        subscribers = topicTree.findSubscribers(topic);
    }

    // Deliver to subscribers
    size_t deliveredCount = 0;
    for (const auto& subscriberId : subscribers) {
        std::shared_ptr<IIntraIODelivery> subscriber;
        {
            std::shared_lock lock(instancesMutex);  // ✅ Concurrent lookup
            auto it = instances.find(subscriberId);
            if (it != instances.end()) {
                subscriber = it->second;
            }
        }

        if (subscriber && subscriberId != sourceId) {
            // Deliver message (outside lock)
            // ...
            deliveredCount++;
        }
    }

    // Update stats
    {
        std::lock_guard<std::mutex> statsLock(statsMutex);
        totalDeliveredMessages += deliveredCount;
    }
}

createInstance() - WRITE :

std::shared_ptr<IntraIO> IntraIOManager::createInstance(const std::string& instanceId) {
    std::unique_lock lock(instancesMutex);  // ✅ Exclusive write lock

    auto it = instances.find(instanceId);
    if (it != instances.end()) {
        return std::static_pointer_cast<IntraIO>(it->second);
    }

    auto instance = createIntraIOInstance(instanceId);
    instances[instanceId] = instance;
    return instance;
}

removeInstance() - WRITE :

void IntraIOManager::removeInstance(const std::string& instanceId) {
    std::unique_lock lock(instancesMutex);  // ✅ Exclusive write lock

    auto it = instances.find(instanceId);
    if (it == instances.end()) {
        return;
    }

    topicTree.unregisterSubscriberAll(instanceId);
    instancePatterns.erase(instanceId);
    instances.erase(it);
}

Modification 3 : JsonDataTree (Optionnel)

Fichier : include/grove/JsonDataTree.h

class JsonDataTree : public IDataTree {
private:
    mutable std::shared_mutex treeMutex;  // ✅ Instead of std::mutex
    // ...
};

Fichier : src/JsonDataTree.cpp

READS - Concurrent :

std::unique_ptr<IDataNode> JsonDataTree::getConfigRoot() {
    std::shared_lock lock(treeMutex);  // ✅ Concurrent reads

    auto configNode = m_root->getFirstChildByName("config");
    if (!configNode) {
        return nullptr;
    }
    auto* jsonNode = static_cast<JsonDataNode*>(configNode);
    return std::make_unique<JsonDataNode>(jsonNode->getName(),
                                          jsonNode->getJsonData(),
                                          nullptr,
                                          true);
}

std::unique_ptr<IDataNode> JsonDataTree::getDataRoot() {
    std::shared_lock lock(treeMutex);  // ✅ Concurrent reads
    // ...
}

IDataNode* JsonDataTree::getDataRootReadOnly() {
    std::shared_lock lock(treeMutex);  // ✅ Concurrent reads
    return m_root->getFirstChildByName("data");
}

WRITES - Exclusive :

bool JsonDataTree::loadConfigFile(const std::string& filename) {
    std::unique_lock lock(treeMutex);  // ✅ Exclusive write
    // ...
}

bool JsonDataTree::reloadIfChanged() {
    std::unique_lock lock(treeMutex);  // ✅ Exclusive write
    // ...
}

Benchmark Performance

Fichier : tests/benchmarks/benchmark_shared_mutex.cpp

#include <benchmark/benchmark.h>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Mock TopicTree avec std::mutex
class TopicTreeMutex {
    std::mutex mtx;
    std::vector<std::string> data;
public:
    void findSubscribers() {
        std::lock_guard<std::mutex> lock(mtx);
        volatile auto size = data.size();  // Simulate work
    }
};

// Mock TopicTree avec std::shared_mutex
class TopicTreeSharedMutex {
    std::shared_mutex mtx;
    std::vector<std::string> data;
public:
    void findSubscribers() const {
        std::shared_lock lock(mtx);
        volatile auto size = data.size();  // Simulate work
    }
};

static void BM_Mutex_SingleThread(benchmark::State& state) {
    TopicTreeMutex tree;
    for (auto _ : state) {
        tree.findSubscribers();
    }
}

static void BM_SharedMutex_SingleThread(benchmark::State& state) {
    TopicTreeSharedMutex tree;
    for (auto _ : state) {
        tree.findSubscribers();
    }
}

static void BM_Mutex_MultiThread(benchmark::State& state) {
    static TopicTreeMutex tree;
    for (auto _ : state) {
        tree.findSubscribers();
    }
}

static void BM_SharedMutex_MultiThread(benchmark::State& state) {
    static TopicTreeSharedMutex tree;
    for (auto _ : state) {
        tree.findSubscribers();
    }
}

BENCHMARK(BM_Mutex_SingleThread);
BENCHMARK(BM_SharedMutex_SingleThread);
BENCHMARK(BM_Mutex_MultiThread)->Threads(1)->Threads(2)->Threads(4)->Threads(8);
BENCHMARK(BM_SharedMutex_MultiThread)->Threads(1)->Threads(2)->Threads(4)->Threads(8);

BENCHMARK_MAIN();

Ajouter au CMakeLists.txt :

# Benchmark shared_mutex
add_executable(benchmark_shared_mutex
    benchmarks/benchmark_shared_mutex.cpp
)

target_link_libraries(benchmark_shared_mutex PRIVATE
    GroveEngine::core
    benchmark::benchmark
)

Run benchmark :

cmake --build build
./build/tests/benchmark_shared_mutex --benchmark_min_time=3s

Résultats attendus :

Benchmark                              Time           CPU   Iterations
-----------------------------------------------------------------------
BM_Mutex_SingleThread                 15 ns         15 ns     46000000
BM_SharedMutex_SingleThread           15 ns         15 ns     46000000  (same overhead)

BM_Mutex_MultiThread/threads:1        15 ns         15 ns     46000000
BM_Mutex_MultiThread/threads:2        60 ns        120 ns      5800000  (2x slower - serialized)
BM_Mutex_MultiThread/threads:4       240 ns        960 ns      1450000  (4x slower)
BM_Mutex_MultiThread/threads:8       960 ns       7680 ns       180000  (8x slower)

BM_SharedMutex_MultiThread/threads:1  15 ns         15 ns     46000000
BM_SharedMutex_MultiThread/threads:2  18 ns         36 ns     19000000  (CONCURRENT!)
BM_SharedMutex_MultiThread/threads:4  22 ns         88 ns      7900000  (4x faster than mutex)
BM_SharedMutex_MultiThread/threads:8  30 ns        240 ns      2900000  (32x faster than mutex!)

Tests de validation

# Test fonctionnel
cmake --build build
ctest --output-on-failure

# Test avec TSan (vérifier pas de data race)
cmake -DGROVE_ENABLE_TSAN=ON -B build-tsan
cmake --build build-tsan
cd build-tsan
TSAN_OPTIONS="detect_deadlocks=1" ctest -V

# Helgrind validation
cd build
make helgrind

# Benchmark performance
./tests/benchmark_shared_mutex --benchmark_min_time=3s

Rapport de performance

Template : docs/performance_reports/shared_mutex_results.md

# Shared Mutex Performance Report

**Date** : YYYY-MM-DD
**Branch** : feature/shared-mutex
**Commit** : abc123

## Modifications

- TopicTree: std::mutex → std::shared_mutex
- IntraIOManager: split managerMutex into instancesMutex (shared) + statsMutex (exclusive)
- JsonDataTree: std::mutex → std::shared_mutex (optionnel)

## Benchmark Results

### Single-threaded Performance

| Component | Before (ns) | After (ns) | Overhead |
|-----------|-------------|------------|----------|
| TopicTree::findSubscribers | 15.2 | 15.3 | +0.7% ✅ |
| IntraIOManager::getInstance | 8.5 | 8.6 | +1.2% ✅ |

**Conclusion** : Overhead négligeable en single-thread.

### Multi-threaded Performance (4 threads)

| Component | Before (ns) | After (ns) | Speedup |
|-----------|-------------|------------|---------|
| TopicTree::findSubscribers | 960 | 22 | **43.6x** 🚀 |
| IntraIOManager::getInstance | 480 | 18 | **26.7x** 🚀 |

### Multi-threaded Performance (8 threads)

| Component | Before (ns) | After (ns) | Speedup |
|-----------|-------------|------------|---------|
| TopicTree::findSubscribers | 7680 | 30 | **256x** 🚀 |
| IntraIOManager::getInstance | 3840 | 25 | **153x** 🚀 |

## Integration Test Results

| Test | Before (ms) | After (ms) | Improvement |
|------|-------------|------------|-------------|
| test_11_io_system | 1250 | 380 | **3.3x faster** |
| test_13_cross_system | 2100 | 620 | **3.4x faster** |

## Validation

- [x] Tous les tests fonctionnels passent
- [x] TSan : Aucun data race détecté
- [x] Helgrind : Aucun lock order violation
- [x] Benchmark : Gain significatif en multi-thread

## Conclusion

✅ **shared_mutex apporte un gain massif** (50-250x) pour les workloads read-heavy.
✅ Overhead négligeable en single-thread.
✅ Aucune régression fonctionnelle.

**Recommandation** : Merge dans main.

Livrables Phase 3

TopicTree.h utilise std::shared_mutex
IntraIOManager split mutex (instancesMutex shared + statsMutex exclusive)
JsonDataTree utilise std::shared_mutex (optionnel)
Benchmark créé et exécuté
Rapport de performance rempli (>50% speedup démontré)
Tests passent avec TSan (pas de data race)
Tests d'intégration passent

Effort estimé : 6h (2h TopicTree + 2h IntraIOManager + 1h JsonDataTree + 1h benchmarks)

Checklist Finale de Validation

Phase 1 : Détection Runtime

TSan :

CMakeLists.txt modifié avec GROVE_ENABLE_TSAN
Build TSan compile sans erreurs
Tous les tests passent avec TSAN_OPTIONS="detect_deadlocks=1"
Documentation TSan options

Helgrind :

CMakeLists.txt avec targets helgrind et helgrind-single
Fichier helgrind.supp créé avec suppressions
Tests passent avec Helgrind
Tableau comparatif TSan vs Helgrind documenté

Phase 2 : Prévention

scoped_lock :

Tous les lock_guard multi-mutex remplacés par scoped_lock
Test unitaire test_scoped_lock.cpp créé
Documentation coding_guidelines.md mise à jour
Tests d'intégration passent

Phase 3 : Optimisation

shared_mutex :

TopicTree.h modifié (shared_lock/unique_lock)
IntraIOManager.h/cpp modifiés (instancesMutex shared)
JsonDataTree.h/cpp modifiés (optionnel)
Benchmark benchmark_shared_mutex.cpp créé
Rapport de performance rempli (gain >50%)
TSan valide pas de data race
Tests d'intégration passent

Calendrier

Semaine	Jour	Phase	Durée	Cumul
1	Lun-Mar	TSan	2h	2h
1	Mer-Jeu	Helgrind	3h	5h
2	Lun-Mar	scoped_lock	4h	9h
2	Mer-Ven	shared_mutex	6h	15h

Prochaines Étapes (Hors Plan)

Une fois ce plan complété, considérer :

Clang Thread Safety Annotations (long-terme)
- Ajouter GUARDED_BY, REQUIRES, EXCLUDES
- Compile-time verification
Hierarchical Mutexes (si architecture complexe)
- Définir hiérarchie : Engine > ModuleSystem > Module > IO
- Runtime enforcement
Lock-free Structures (ultra-hot paths)
- TopicTree subscribers avec std::atomic
- Message queues lock-free

Auteur : Claude Code Version : 1.0 Dernière mise à jour : 2025-01-21

29 KiB Raw Blame History

Plan : Détection & Prévention Deadlock - GroveEngine

📋 Vue d'Ensemble

Phase 1 : Détection Runtime (Semaine 1 - 5h)

Phase 1.1 : ThreadSanitizer (TSan) - Jour 1-2 (2h)

Modifications CMakeLists.txt

Tests

Options TSan utiles

Exemple de sortie TSan

Livrables Phase 1.1

Phase 1.2 : Helgrind - Jour 3-4 (3h)

Modifications CMakeLists.txt

Fichier de suppressions

Tests

Exemple de sortie Helgrind

Comparaison TSan vs Helgrind

Livrables Phase 1.2

Phase 2 : Prévention Compile-time (Semaine 2 - 4h)

std::scoped_lock - Jour 5-7

Analyse préalable

Modifications identifiées

1. IntraIOManager.cpp - Ligne 176

2. IntraIOManager.cpp - Ligne 221

3. IntraIOManager.cpp - Lignes 256, 272, 329

Test unitaire de validation

Documentation - Patterns à éviter

❌ DON'T: Use std::lock_guard for multiple mutexes

✅ DO: Use std::unique_lock with std::lock if you need unlock

Modification 1 : TopicTree.h

Modification 2 : IntraIOManager

Modification 3 : JsonDataTree (Optionnel)

Benchmark Performance

Tests de validation

Rapport de performance

Livrables Phase 3

Checklist Finale de Validation

Phase 1 : Détection Runtime

Phase 2 : Prévention

Phase 3 : Optimisation

Calendrier

Prochaines Étapes (Hors Plan)

29 KiB

Raw Blame History