StillHammer 3915424d75 feat(wip): Phase 7.1 STT Service Layer - Complete architecture (does not compile)

Phase 7 STT architecture implemented but blocked by macro conflicts
between GroveEngine (JsonDataNode.h) and spdlog/fmt.

## New Content

### Interfaces & Services
- ISTTService.hpp: STT service interface (passive/active modes, callbacks)
- STTService.{hpp,cpp}: STT service implementation using the factory pattern
- VoskSTTEngine.{hpp,cpp}: Local Vosk STT engine (~50MB model)

### Factory Pattern
- STTEngineFactory: Multi-engine support (Vosk, Whisper API, auto-select)
- Automatic fallback Vosk -> Whisper API

### Configuration
- config/voice.json: Phase 7 config (passive_mode, active_mode, whisper_api)
- Support for local Vosk models + cloud fallback

### Integration
- VoiceService: New configureSTT(json) method for Phase 7
- main.cpp: Loads the STT config from voice.json
- CMakeLists.txt: Added files + optional Vosk dependency

## Compilation Problem

**Blocked by macro conflicts**:
- JsonDataNode.h (GroveEngine) defines macros that pollute 'logger' and 'queue'
- Causes errors in VoiceService.cpp and STTService.cpp
- See plans/PHASE7_COMPILATION_ISSUE.md for the full diagnosis

## Implemented Features

✅ Complete STT architecture (service layer + engines)
✅ Local Vosk support (French models)
✅ Factory pattern with auto-selection
✅ Phase 7 JSON configuration
✅ Transcription/keyword callbacks
❌ Does not compile (macro conflicts)

## Next Steps

1. Resolve the macro conflicts (fix GroveEngine or isolate via namespace)
2. Phase 7.2: PocketSphinxEngine (keyword spotting "Celuna")
3. STT integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-29 09:01:26 +08:00


Phase 7 - Modular STT Implementation

Created: 2025-11-29
Objective: Complete STT architecture with multi-engine support (Vosk, PocketSphinx, Whisper)
Assistant name: Celuna (formerly AISSIA)

Overview

Objectives

Modular architecture: ISTTEngine interface with 4 implementations
STT service: ISTTService layer that abstracts the business logic
Dual mode: Passive (keyword spotting) + Active (full transcription)
Optimized cost: Local by default, Whisper API as an optional fallback

Target Architecture

┌─────────────────────────────────────────────────────────┐
│                    VoiceService                          │
│  - Manages TTS (EspeakTTSEngine)                        │
│  - Manages STT via ISTTService                          │
│  - Pub/sub IIO (voice:speak, voice:listen, etc.)        │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                    ISTTService                           │
│  - STT service interface                                │
│  - Manages passive/active mode                          │
│  - Switches engines per config                          │
│  - Automatic fallback                                   │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                  STTEngineFactory                        │
│  - create(type, config) → unique_ptr<ISTTEngine>        │
└─────────────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┬──────────────┐
          ▼                ▼                ▼              ▼
   ┌──────────┐   ┌──────────────┐  ┌─────────────┐  ┌──────────────┐
   │  Vosk    │   │ PocketSphinx │  │ WhisperCpp  │  │ WhisperAPI   │
   │  Engine  │   │   Engine     │  │   Engine    │  │   Engine     │
   └──────────┘   └──────────────┘  └─────────────┘  └──────────────┘
   Local          Local (keywords)   Local (precise)  Remote (paid)
   50MB model     Light ~10MB        75-142MB          OpenAI API

Phase 7.1 - Service Layer (ISTTService)

Objective

Create a service layer that abstracts the complexity of the STT engines and handles:

Passive/active mode
Engine switching
Automatic fallback
Error handling

Files to create

1. `src/services/ISTTService.hpp`

STT service interface

#pragma once

#include <string>
#include <vector>
#include <functional>
#include <memory>

namespace aissia {

enum class STTMode {
    PASSIVE,  // Keyword spotting (low resource usage)
    ACTIVE    // Full transcription
};

enum class STTEngineType {
    VOSK,
    POCKETSPHINX,
    WHISPER_CPP,
    WHISPER_API,
    AUTO  // The factory chooses
};

/**
 * @brief Callback for transcription results
 */
using TranscriptionCallback = std::function<void(const std::string& text, STTMode mode)>;

/**
 * @brief Callback for keyword detection
 */
using KeywordCallback = std::function<void(const std::string& keyword)>;

/**
 * @brief STT service interface
 */
class ISTTService {
public:
    virtual ~ISTTService() = default;

    /**
     * @brief Starts the STT service
     */
    virtual bool start() = 0;

    /**
     * @brief Stops the STT service
     */
    virtual void stop() = 0;

    /**
     * @brief Changes the STT mode
     */
    virtual void setMode(STTMode mode) = 0;

    /**
     * @brief Returns the current mode
     */
    virtual STTMode getMode() const = 0;

    /**
     * @brief Transcribes an audio file
     */
    virtual std::string transcribeFile(const std::string& filePath) = 0;

    /**
     * @brief Transcribes PCM audio data
     */
    virtual std::string transcribe(const std::vector<float>& audioData) = 0;

    /**
     * @brief Starts streaming (real-time) listening
     */
    virtual void startListening(TranscriptionCallback onTranscription,
                                KeywordCallback onKeyword) = 0;

    /**
     * @brief Stops streaming listening
     */
    virtual void stopListening() = 0;

    /**
     * @brief Sets the language
     */
    virtual void setLanguage(const std::string& language) = 0;

    /**
     * @brief Checks whether the service is available
     */
    virtual bool isAvailable() const = 0;

    /**
     * @brief Returns the name of the current engine
     */
    virtual std::string getCurrentEngine() const = 0;
};

} // namespace aissia

Estimate: 50 lines
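The `STTEngineType` enum above and the string identifiers used later by the factory and `voice.json` ("vosk", "pocketsphinx", "whisper-cpp", "whisper-api", "auto") need to be mapped to each other somewhere. The following is a minimal sketch of such a mapping; the `sketch` namespace and the helper names `parseEngineType` and `toString` are hypothetical and not part of the plan.

```cpp
#include <optional>
#include <string_view>

namespace sketch {

// Mirror of the enum declared in ISTTService.hpp.
enum class STTEngineType { VOSK, POCKETSPHINX, WHISPER_CPP, WHISPER_API, AUTO };

// Maps a config string such as "whisper-api" to the enum.
// Returns std::nullopt for unknown names so the caller decides how to react.
inline std::optional<STTEngineType> parseEngineType(std::string_view name) {
    if (name == "vosk") return STTEngineType::VOSK;
    if (name == "pocketsphinx") return STTEngineType::POCKETSPHINX;
    if (name == "whisper-cpp") return STTEngineType::WHISPER_CPP;
    if (name == "whisper-api") return STTEngineType::WHISPER_API;
    if (name == "auto") return STTEngineType::AUTO;
    return std::nullopt;
}

// Inverse mapping, useful for logging and for getCurrentEngine().
inline std::string_view toString(STTEngineType type) {
    switch (type) {
        case STTEngineType::VOSK: return "vosk";
        case STTEngineType::POCKETSPHINX: return "pocketsphinx";
        case STTEngineType::WHISPER_CPP: return "whisper-cpp";
        case STTEngineType::WHISPER_API: return "whisper-api";
        case STTEngineType::AUTO: return "auto";
    }
    return "unknown";
}

}  // namespace sketch
```

Returning `std::optional` rather than silently defaulting to `AUTO` keeps a typo in the configuration visible to the caller.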

2. `src/services/STTService.hpp` + `.cpp`

STT service implementation

Features:

Manages 2 engines: 1 for passive mode (PocketSphinx), 1 for active mode (Vosk/Whisper)
Automatic passive → active switch on keyword
Active → passive timeout (30s without speech)
Fallback to the Whisper API if the local engine fails
Microphone listening thread (via PortAudio or ALSA)

Pseudo-code:

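// Sketch only: the remaining ISTTService overrides are omitted.
class STTService : public ISTTService {
private:
    nlohmann::json m_config;                      // "stt" section of voice.json

    std::unique_ptr<ISTTEngine> m_passiveEngine;  // PocketSphinx
    std::unique_ptr<ISTTEngine> m_activeEngine;   // Vosk/Whisper
    std::unique_ptr<ISTTEngine> m_fallbackEngine; // WhisperAPI

    // Atomic because both the listen thread and callers change the mode
    std::atomic<STTMode> m_currentMode{STTMode::PASSIVE};

    std::thread m_listenThread;
    std::atomic<bool> m_listening{false};

    TranscriptionCallback m_onTranscription;
    KeywordCallback m_onKeyword;

    std::chrono::steady_clock::time_point m_lastActivity;

public:
    explicit STTService(nlohmann::json config) : m_config(std::move(config)) {}

    ~STTService() override { stopListening(); }

    bool start() override {
        // Load engines from config; each engine reads its own sub-section
        const auto empty = nlohmann::json::object();
        m_passiveEngine = STTEngineFactory::create("pocketsphinx", m_config.value("passive_mode", empty));
        m_activeEngine = STTEngineFactory::create("vosk", m_config.value("active_mode", empty));
        m_fallbackEngine = STTEngineFactory::create("whisper-api", m_config.value("whisper_api", empty));

        return m_passiveEngine && m_activeEngine;
    }

    void startListening(TranscriptionCallback onTranscription,
                       KeywordCallback onKeyword) override {
        if (m_listening.exchange(true)) return;  // Already listening

        m_onTranscription = std::move(onTranscription);
        m_onKeyword = std::move(onKeyword);

        m_listenThread = std::thread([this]() {
            listenLoop();
        });
    }

    void stopListening() override {
        m_listening = false;
        // Joining avoids std::terminate when a running thread is destroyed
        if (m_listenThread.joinable()) m_listenThread.join();
    }

private:
    void listenLoop() {
        // Open the microphone (PortAudio)
        // Loop while m_listening is true:
        //   - If PASSIVE: use m_passiveEngine (keywords only)
        //     - If a keyword is detected → setMode(ACTIVE) + callback
        //   - If ACTIVE: use m_activeEngine (full transcription)
        //     - Transcribe in real time
        //     - On 30s timeout → setMode(PASSIVE)
    }
};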

Estimate: 300 lines (service + microphone thread)
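The passive → active switch on keyword and the 30s fallback to passive form a small state machine that can be kept separate from the audio thread, which also makes it testable without a microphone. The following is a sketch under that assumption; `ModeController` and its method names are hypothetical and not part of the plan. The clock value is passed in by the caller instead of being read internally.

```cpp
#include <chrono>

namespace sketch {

enum class STTMode { PASSIVE, ACTIVE };

// Tracks the current mode and the time of the last speech activity.
class ModeController {
public:
    using Clock = std::chrono::steady_clock;

    explicit ModeController(std::chrono::seconds timeout) : m_timeout(timeout) {}

    // A detected keyword wakes the service up and opens the inactivity window.
    void onKeyword(Clock::time_point now) {
        m_mode = STTMode::ACTIVE;
        m_lastActivity = now;
    }

    // Recognized speech in active mode extends the inactivity window.
    void onSpeech(Clock::time_point now) {
        if (m_mode == STTMode::ACTIVE) m_lastActivity = now;
    }

    // Called once per loop iteration; returns to passive after the timeout.
    STTMode tick(Clock::time_point now) {
        if (m_mode == STTMode::ACTIVE && now - m_lastActivity >= m_timeout) {
            m_mode = STTMode::PASSIVE;
        }
        return m_mode;
    }

    STTMode mode() const { return m_mode; }

private:
    std::chrono::seconds m_timeout;
    STTMode m_mode = STTMode::PASSIVE;
    Clock::time_point m_lastActivity{};
};

}  // namespace sketch
```

In `listenLoop()`, the service would call `tick(std::chrono::steady_clock::now())` on each iteration and pick the engine according to the returned mode.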

Phase 7.2 - STT Engines

Files to modify/create

1. `src/shared/audio/ISTTEngine.hpp` ✅ Exists

Changes: None (the interface is already adequate)

2. `src/shared/audio/WhisperAPIEngine.hpp` ✅ Exists

Changes: None (already implemented, will be used as the fallback)

3. `src/shared/audio/VoskSTTEngine.hpp` 🆕 To create

Vosk Speech Recognition

Dependencies:

vosk library (C++ bindings)
French model: vosk-model-small-fr-0.22 (~50MB)

Installation:

# Linux
sudo apt install libvosk-dev

# Download the FR model
wget https://alphacephei.com/vosk/models/vosk-model-small-fr-0.22.zip
unzip vosk-model-small-fr-0.22.zip -d models/

Implementation:

#pragma once

#include "ISTTEngine.hpp"
#include <vosk_api.h>
#include <nlohmann/json.hpp>
#include <spdlog/spdlog.h>
#include <spdlog/sinks/stdout_color_sinks.h>  // spdlog::stdout_color_mt

#include <algorithm>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

namespace aissia {

class VoskSTTEngine : public ISTTEngine {
public:
    explicit VoskSTTEngine(const std::string& modelPath) {
        m_logger = spdlog::get("VoskSTT");
        if (!m_logger) {
            m_logger = spdlog::stdout_color_mt("VoskSTT");
        }

        // Load Vosk model
        m_model = vosk_model_new(modelPath.c_str());
        if (!m_model) {
            m_logger->error("Failed to load Vosk model: {}", modelPath);
            return;
        }

        // Create recognizer (16kHz, mono)
        m_recognizer = vosk_recognizer_new(m_model, 16000.0f);
        if (!m_recognizer) {
            m_logger->error("Failed to create Vosk recognizer");
            return;
        }
        m_available = true;

        m_logger->info("Vosk STT initialized: {}", modelPath);
    }

    // The class owns raw C handles, so copying is disabled
    VoskSTTEngine(const VoskSTTEngine&) = delete;
    VoskSTTEngine& operator=(const VoskSTTEngine&) = delete;

    ~VoskSTTEngine() override {
        if (m_recognizer) vosk_recognizer_free(m_recognizer);
        if (m_model) vosk_model_free(m_model);
    }

    std::string transcribe(const std::vector<float>& audioData) override {
        if (!m_available || audioData.empty()) return "";

        // Convert float to int16, clamping so out-of-range samples cannot overflow
        std::vector<int16_t> samples(audioData.size());
        for (size_t i = 0; i < audioData.size(); ++i) {
            const float clamped = std::clamp(audioData[i], -1.0f, 1.0f);
            samples[i] = static_cast<int16_t>(clamped * 32767.0f);
        }

        // Feed audio to recognizer (length is in bytes)
        vosk_recognizer_accept_waveform(m_recognizer,
            reinterpret_cast<const char*>(samples.data()),
            static_cast<int>(samples.size() * sizeof(int16_t)));

        // Get final result
        const char* result = vosk_recognizer_final_result(m_recognizer);

        // Parse JSON result: {"text": "transcription"}
        std::string text = parseVoskResult(result);

        m_logger->debug("Transcribed: {}", text);
        return text;
    }

    std::string transcribeFile(const std::string& filePath) override {
        // Load WAV file, convert to PCM, call transcribe()
        // (Implementation omitted for brevity)
        m_logger->warn("transcribeFile is not implemented yet: {}", filePath);
        return "";
    }

    void setLanguage(const std::string& /*language*/) override {
        // Vosk model is language-specific, can't change at runtime
    }

    bool isAvailable() const override { return m_available; }
    std::string getEngineName() const override { return "vosk"; }

private:
    VoskModel* m_model = nullptr;
    VoskRecognizer* m_recognizer = nullptr;
    bool m_available = false;
    std::shared_ptr<spdlog::logger> m_logger;

    static std::string parseVoskResult(const char* json) {
        // Parse JSON: {"text": "bonjour"} → "bonjour"
        if (!json) return "";
        const auto parsed = nlohmann::json::parse(json, nullptr, /*allow_exceptions=*/false);
        if (!parsed.is_object()) return "";  // Also covers parse failures
        return parsed.value("text", std::string{});
    }
};

} // namespace aissia

Estimate: 200 lines
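Both the Vosk and PocketSphinx engines convert normalized float samples to 16-bit PCM before feeding the recognizer. Since casting an out-of-range float to `int16_t` is undefined behavior, the conversion is worth isolating in a shared helper. A minimal sketch, where the `sketch` namespace and the name `toPcm16` are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

namespace sketch {

// Converts normalized float samples (nominally in [-1, 1]) to signed 16-bit PCM.
// Out-of-range values are clamped and NaN is treated as silence.
inline std::vector<std::int16_t> toPcm16(const std::vector<float>& samples) {
    std::vector<std::int16_t> out;
    out.reserve(samples.size());
    for (float s : samples) {
        if (std::isnan(s)) s = 0.0f;
        const float clamped = std::clamp(s, -1.0f, 1.0f);
        out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}

}  // namespace sketch
```

Scaling by 32767 keeps the output symmetric around zero, at the cost of never producing -32768.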

4. `src/shared/audio/PocketSphinxEngine.hpp` 🆕 To create

PocketSphinx Keyword Spotting

Dependencies:

pocketsphinx library
Acoustic model (phonetic)

Installation:

sudo apt install pocketsphinx pocketsphinx-en-us

Keyword configuration:

# keywords.txt
celuna /1e-40/
hey celuna /1e-50/

Implementation:

#pragma once

#include "ISTTEngine.hpp"
#include <pocketsphinx.h>
#include <spdlog/spdlog.h>
#include <spdlog/sinks/stdout_color_sinks.h>  // spdlog::stdout_color_mt

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <memory>
#include <string>
#include <vector>

namespace aissia {

class PocketSphinxEngine : public ISTTEngine {
public:
    explicit PocketSphinxEngine(const std::vector<std::string>& keywords,
                                 const std::string& modelPath) {
        m_logger = spdlog::get("PocketSphinx");
        if (!m_logger) {
            m_logger = spdlog::stdout_color_mt("PocketSphinx");
        }

        // Create keyword file
        if (!createKeywordFile(keywords)) {
            m_logger->error("Failed to write keyword file: {}", kKeywordFile);
            return;
        }

        // Initialize PocketSphinx
        m_config = ps_config_init(nullptr);
        ps_config_set_str(m_config, "hmm", modelPath.c_str());
        ps_config_set_str(m_config, "kws", kKeywordFile);
        ps_config_set_float(m_config, "kws_threshold", 1e-40);

        m_decoder = ps_init(m_config);
        m_available = (m_decoder != nullptr);

        if (m_available) {
            m_logger->info("PocketSphinx initialized for keyword spotting");
        } else {
            m_logger->error("Failed to initialize PocketSphinx with model: {}", modelPath);
        }
    }

    // The class owns raw C handles, so copying is disabled
    PocketSphinxEngine(const PocketSphinxEngine&) = delete;
    PocketSphinxEngine& operator=(const PocketSphinxEngine&) = delete;

    ~PocketSphinxEngine() override {
        if (m_decoder) ps_free(m_decoder);
        // The decoder keeps its own reference; release ours as well
        if (m_config) ps_config_free(m_config);
    }

    std::string transcribe(const std::vector<float>& audioData) override {
        if (!m_available || audioData.empty()) return "";

        // Convert to int16, clamping so out-of-range samples cannot overflow
        std::vector<int16_t> samples(audioData.size());
        for (size_t i = 0; i < audioData.size(); ++i) {
            const float clamped = std::clamp(audioData[i], -1.0f, 1.0f);
            samples[i] = static_cast<int16_t>(clamped * 32767.0f);
        }

        // Process audio
        ps_start_utt(m_decoder);
        ps_process_raw(m_decoder, samples.data(), samples.size(), FALSE, FALSE);
        ps_end_utt(m_decoder);

        // Get keyword (if detected)
        const char* hyp = ps_get_hyp(m_decoder, nullptr);
        std::string keyword = (hyp ? hyp : "");

        if (!keyword.empty()) {
            m_logger->info("Keyword detected: {}", keyword);
        }

        return keyword;
    }

    std::string transcribeFile(const std::string& /*filePath*/) override {
        // Not used for keyword spotting (streaming only)
        return "";
    }

    void setLanguage(const std::string& /*language*/) override {}
    bool isAvailable() const override { return m_available; }
    std::string getEngineName() const override { return "pocketsphinx"; }

private:
    static constexpr const char* kKeywordFile = "/tmp/celuna_keywords.txt";

    ps_config_t* m_config = nullptr;
    ps_decoder_t* m_decoder = nullptr;
    bool m_available = false;
    std::shared_ptr<spdlog::logger> m_logger;

    // Writes one "phrase /threshold/" line per keyword; returns false on I/O failure
    static bool createKeywordFile(const std::vector<std::string>& keywords) {
        std::ofstream file(kKeywordFile);
        if (!file) return false;
        for (const auto& kw : keywords) {
            file << kw << " /1e-40/\n";
        }
        file.flush();
        return file.good();
    }
};

} // namespace aissia

Estimate: 180 lines
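The `keywords.txt` sample gives each phrase its own threshold (`1e-40` for "celuna", `1e-50` for "hey celuna"), whereas `createKeywordFile` writes the same threshold for every keyword. One way to reconcile the two is to build the file content from (phrase, threshold) pairs. The sketch below assumes that approach; the `Keyphrase` struct and `buildKeywordList` function are hypothetical. The threshold is kept as a string so that the notation from the configuration is written out unchanged.

```cpp
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// One keyword-spotting entry: the phrase and its detection threshold.
struct Keyphrase {
    std::string phrase;
    std::string threshold;  // e.g. "1e-40", kept verbatim
};

// Produces the keyword list content, one "phrase /threshold/" line per entry.
// Entries with an empty phrase are skipped.
inline std::string buildKeywordList(const std::vector<Keyphrase>& keyphrases) {
    std::ostringstream out;
    for (const auto& k : keyphrases) {
        if (k.phrase.empty()) continue;
        out << k.phrase << " /" << k.threshold << "/\n";
    }
    return out.str();
}

}  // namespace sketch
```

Separating content generation from file I/O also lets unit tests check the output without touching `/tmp`.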

5. `src/shared/audio/WhisperCppEngine.hpp` 🆕 To create (OPTIONAL)

whisper.cpp - Local Whisper

Dependencies:

whisper.cpp (ggerganov)
Model: ggml-tiny.bin (75MB) or ggml-base.bin (142MB)

Installation:

git clone https://github.com/ggerganov/whisper.cpp external/whisper.cpp
cd external/whisper.cpp
make
./models/download-ggml-model.sh tiny

Implementation: Similar to Vosk, but using the whisper.cpp API

Estimate: 250 lines

⚠️ Note: Optional; implement only if high local accuracy is needed

6. `src/shared/audio/STTEngineFactory.cpp` 📝 Modify

Factory pattern for creating engines

#include "STTEngineFactory.hpp"
#include "VoskSTTEngine.hpp"
#include "PocketSphinxEngine.hpp"
#include "WhisperCppEngine.hpp"
#include "WhisperAPIEngine.hpp"

#include <cstdlib>  // std::getenv

namespace aissia {

std::unique_ptr<ISTTEngine> STTEngineFactory::create(
    const std::string& type,
    const nlohmann::json& config) {

    if (type == "vosk" || type == "auto") {
        std::string modelPath = config.value("model_path", "./models/vosk-model-small-fr-0.22");
        auto engine = std::make_unique<VoskSTTEngine>(modelPath);
        if (engine->isAvailable()) return engine;
    }

    if (type == "pocketsphinx") {
        std::vector<std::string> keywords = config.value("keywords", std::vector<std::string>{"celuna"});
        std::string modelPath = config.value("model_path", "/usr/share/pocketsphinx/model/en-us");
        auto engine = std::make_unique<PocketSphinxEngine>(keywords, modelPath);
        if (engine->isAvailable()) return engine;
    }

    if (type == "whisper-cpp") {
        std::string modelPath = config.value("model_path", "./models/ggml-tiny.bin");
        auto engine = std::make_unique<WhisperCppEngine>(modelPath);
        if (engine->isAvailable()) return engine;
    }

    // "auto" also reaches this branch when Vosk is unavailable (Vosk -> Whisper API fallback)
    if (type == "whisper-api" || type == "auto") {
        // std::getenv returns nullptr when the variable is unset,
        // so check the pointer before building a std::string from it
        const std::string envName = config.value("api_key_env", "OPENAI_API_KEY");
        const char* apiKey = std::getenv(envName.c_str());
        if (apiKey && *apiKey) {
            return std::make_unique<WhisperAPIEngine>(std::string(apiKey));
        }
    }

    // Fallback: stub engine (no-op)
    return std::make_unique<StubSTTEngine>();
}

} // namespace aissia

Estimate: 80 lines
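The factory's "try an engine, keep it only if it is available" pattern can be expressed as an ordered list of candidates, which makes the fallback order explicit and easy to test. The following sketch is illustrative only: `Engine` is a reduced stand-in for `ISTTEngine`, and `FixedEngine`, `EngineMaker`, and `createFirstAvailable` are hypothetical names.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Reduced stand-in for ISTTEngine: only what the selection logic needs.
class Engine {
public:
    virtual ~Engine() = default;
    virtual bool isAvailable() const = 0;
    virtual std::string getEngineName() const = 0;
};

// Test double whose availability is fixed at construction.
class FixedEngine : public Engine {
public:
    FixedEngine(std::string name, bool available)
        : m_name(std::move(name)), m_available(available) {}
    bool isAvailable() const override { return m_available; }
    std::string getEngineName() const override { return m_name; }

private:
    std::string m_name;
    bool m_available;
};

using EngineMaker = std::function<std::unique_ptr<Engine>()>;

// Tries each maker in order and returns the first engine that reports
// itself available, or nullptr when none does.
inline std::unique_ptr<Engine> createFirstAvailable(const std::vector<EngineMaker>& makers) {
    for (const auto& make : makers) {
        auto engine = make();
        if (engine && engine->isAvailable()) return engine;
    }
    return nullptr;
}

}  // namespace sketch
```

With this shape, an "auto" selection would be a list such as Vosk first, then the Whisper API, and the caller would substitute the stub engine when the result is null.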

Phase 7.3 - VoiceService Integration

File to modify

`src/services/VoiceService.cpp`

Changes:

Replace the direct implementation with ISTTService

Before:

// VoiceService manages WhisperAPIEngine directly
std::unique_ptr<WhisperAPIEngine> m_sttEngine;

After:

// VoiceService delegates to ISTTService
std::unique_ptr<ISTTService> m_sttService;

Initialization:

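void VoiceService::initialize(const nlohmann::json& config) {
    // TTS (unchanged)
    m_ttsEngine = TTSEngineFactory::create();

    // STT (new); at() throws a descriptive error if the "stt" section is missing,
    // whereas operator[] on a const json with a missing key is undefined behavior
    m_sttService = std::make_unique<STTService>(config.at("stt"));
    if (!m_sttService->start()) {
        m_logger->error("STT service failed to start; voice input disabled");
        return;
    }

    // Setup callbacks
    m_sttService->startListening(
        [this](const std::string& text, STTMode mode) {
            handleTranscription(text, mode);
        },
        [this](const std::string& keyword) {
            handleKeyword(keyword);
        }
    );
}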

Handlers:

void VoiceService::handleKeyword(const std::string& keyword) {
    m_logger->info("Keyword detected: {}", keyword);

    // Publish keyword detection
    nlohmann::json event = {
        {"type", "keyword_detected"},
        {"keyword", keyword},
        {"timestamp", std::time(nullptr)}
    };
    m_io->publish("voice:keyword_detected", event);

    // Auto-switch to active mode
    m_sttService->setMode(STTMode::ACTIVE);
}

void VoiceService::handleTranscription(const std::string& text, STTMode mode) {
    m_logger->info("Transcription ({}): {}",
        mode == STTMode::PASSIVE ? "passive" : "active", text);

    // Publish transcription
    nlohmann::json event = {
        {"type", "transcription"},
        {"text", text},
        {"mode", mode == STTMode::PASSIVE ? "passive" : "active"},
        {"timestamp", std::time(nullptr)}
    };
    m_io->publish("voice:transcription", event);
}

Estimated changes: +150 lines

Phase 7.4 - Configuration

File to modify

`config/voice.json`

Full configuration:

{
  "tts": {
    "enabled": true,
    "engine": "auto",
    "rate": 0,
    "volume": 80,
    "voice": "fr-fr"
  },
  "stt": {
    "passive_mode": {
      "enabled": true,
      "engine": "pocketsphinx",
      "keywords": ["celuna", "hey celuna", "ok celuna"],
      "threshold": 0.8,
      "model_path": "/usr/share/pocketsphinx/model/en-us"
    },
    "active_mode": {
      "enabled": true,
      "engine": "vosk",
      "model_path": "./models/vosk-model-small-fr-0.22",
      "language": "fr",
      "timeout_seconds": 30,
      "fallback_engine": "whisper-api"
    },
    "whisper_api": {
      "api_key_env": "OPENAI_API_KEY",
      "model": "whisper-1"
    },
    "microphone": {
      "device_id": -1,
      "sample_rate": 16000,
      "channels": 1,
      "buffer_size": 1024
    }
  }
}
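The `microphone` section determines how much audio each capture callback delivers, which bounds the latency of keyword detection. As a worked example, with `sample_rate` 16000 and `buffer_size` 1024, one buffer holds 1024 / 16000 s = 64 ms of audio. The sketch below assumes that `buffer_size` counts frames (samples per channel) and that capture uses 16-bit PCM; the plan states neither. `MicrophoneConfig` and both helper functions are hypothetical names.

```cpp
namespace sketch {

// Mirrors the "microphone" section of voice.json (defaults taken from it).
struct MicrophoneConfig {
    int sampleRate = 16000;
    int channels = 1;
    int bufferSize = 1024;  // Assumed to be frames per buffer
};

// Duration of one capture buffer in milliseconds.
inline double bufferDurationMs(const MicrophoneConfig& c) {
    return 1000.0 * c.bufferSize / c.sampleRate;
}

// Size of one capture buffer in bytes, assuming 16-bit PCM samples.
inline int bufferBytesPcm16(const MicrophoneConfig& c) {
    return c.bufferSize * c.channels * 2;
}

}  // namespace sketch
```

At 64 ms per buffer, roughly 16 buffers add up to one second of audio.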

Phase 7.5 - Tests

Files to create

`tests/services/STTServiceTests.cpp`

Unit tests:

✅ Service creation
✅ Start/stop
✅ Passive/active switching
✅ Keyword detection
✅ Transcription
✅ Fallback engine
✅ Active → passive timeout

Estimate: 200 lines

`tests/integration/IT_014_VoicePassiveMode.cpp`

Passive mode integration test:

// Simulate audio containing the keyword "celuna"
// Checks:
//   1. PocketSphinx detects the keyword
//   2. Event "voice:keyword_detected" is published
//   3. Switch to ACTIVE mode
//   4. 30s timeout → back to PASSIVE

Estimate: 150 lines

`tests/integration/IT_015_VoiceActiveTranscription.cpp`

Active mode integration test:

// Simulate a full conversation:
//   1. User: "celuna" → keyword detected
//   2. User: "quelle heure est-il ?" → transcription via Vosk
//   3. AI responds → TTS
//   4. Timeout → back to passive

Estimate: 200 lines

Phase 7.6 - Documentation

Files to create/modify

`docs/STT_ARCHITECTURE.md`

Technical documentation:

STT architecture
Engine choices
Configuration
Troubleshooting

Estimate: 400 lines

`README.md`

Roadmap update:

### Completed ✅
- [x] STT multi-engine (Vosk, PocketSphinx, Whisper)
- [x] Passive/Active mode (keyword "Celuna")
- [x] Local STT (zero cost)
Estimate Summary

| Task | Files | Lines | Priority |
| --- | --- | --- | --- |
| 7.1 Service Layer | `ISTTService.hpp`, `STTService.{hpp,cpp}` | 350 | P0 |
| 7.2 Vosk Engine | `VoskSTTEngine.hpp` | 200 | P0 |
| 7.2 PocketSphinx | `PocketSphinxEngine.hpp` | 180 | P1 |
| 7.2 WhisperCpp | `WhisperCppEngine.hpp` | 250 | P2 (optional) |
| 7.2 Factory | `STTEngineFactory.cpp` | 80 | P0 |
| 7.3 VoiceService | `VoiceService.cpp` (changes) | +150 | P0 |
| 7.4 Config | `voice.json` | +30 | P0 |
| 7.5 Unit tests | `STTServiceTests.cpp` | 200 | P1 |
| 7.5 Integration tests | `IT_014`, `IT_015` | 350 | P1 |
| 7.6 Documentation | `STT_ARCHITECTURE.md`, README | 450 | P2 |
| TOTAL | 14 files | ~2240 lines | |

Execution Plan

Milestone 1: Local STT MVP (Vosk only) ⚡

Objective: Working STT without keyword detection

Tasks:

✅ Create ISTTService.hpp
✅ Create STTService (simple, no passive mode)
✅ Create VoskSTTEngine
✅ Modify STTEngineFactory
✅ Integrate into VoiceService
✅ Config voice.json
✅ Manual transcription test

Estimated duration: 3-4h
Lines: ~600

Milestone 2: Passive Mode (Keyword Detection) 🎧

Objective: Detect "Celuna" + automatic switch

Tasks:

✅ Create PocketSphinxEngine
✅ Extend STTService (dual mode)
✅ Keyword/transcription callbacks
✅ Active → passive timeout
✅ Passive/active config
✅ Tests IT_014, IT_015

Estimated duration: 4-5h
Lines: ~700

Milestone 3: Whisper API Fallback 🔄

Objective: Robustness through a cloud fallback

Tasks:

✅ Integrate the existing WhisperAPIEngine
✅ Fallback logic in STTService
✅ Fallback config
✅ Fallback tests

Estimated duration: 2h
Lines: ~200

Milestone 4: Polish & Documentation 📝

Tasks:

✅ Complete documentation
✅ STTService unit tests
✅ Troubleshooting guide
✅ README update

Estimated duration: 3h
Lines: ~700

External Dependencies

To install

# Vosk
sudo apt install libvosk-dev
wget https://alphacephei.com/vosk/models/vosk-model-small-fr-0.22.zip
unzip vosk-model-small-fr-0.22.zip -d models/

# PocketSphinx
sudo apt install pocketsphinx pocketsphinx-en-us

# PortAudio (for the microphone)
sudo apt install portaudio19-dev

# Optional: whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp external/whisper.cpp
cd external/whisper.cpp && make
CMakeLists.txt

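# Find Vosk (the REQUIRED keyword of find_library/find_path needs CMake 3.18+)
find_library(VOSK_LIBRARY vosk REQUIRED)
find_path(VOSK_INCLUDE_DIR vosk_api.h REQUIRED)

# Find PocketSphinx
find_library(POCKETSPHINX_LIBRARY pocketsphinx REQUIRED)
find_path(POCKETSPHINX_INCLUDE_DIR pocketsphinx.h REQUIRED)

# Find PortAudio
find_library(PORTAUDIO_LIBRARY portaudio REQUIRED)
find_path(PORTAUDIO_INCLUDE_DIR portaudio.h REQUIRED)

# Headers (the include paths found above are otherwise unused)
target_include_directories(VoiceService PRIVATE
    ${VOSK_INCLUDE_DIR}
    ${POCKETSPHINX_INCLUDE_DIR}
    ${PORTAUDIO_INCLUDE_DIR}
)

# Link
target_link_libraries(VoiceService
    ${VOSK_LIBRARY}
    ${POCKETSPHINX_LIBRARY}
    ${PORTAUDIO_LIBRARY}
)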

Risks & Mitigation

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Vosk model too heavy | RAM (50MB) | Use `vosk-model-small` instead of `base` |
| PocketSphinx false positives | UX | Adjust the threshold (1e-40 → 1e-50) |
| Microphone permissions | Blocking | PortAudio installation + permissions guide |
| Transcription latency | UX | Buffer 1-2s of audio before transcription |
| Whisper API cost | Budget | Use only as a fallback (rare) |
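The latency mitigation (buffering 1-2s of audio before transcription) amounts to accumulating capture chunks until a target number of samples is reached, then handing the whole block to the engine. A minimal single-threaded sketch follows; the `AudioAccumulator` class is hypothetical, and a real implementation fed from a PortAudio callback would need synchronization or a lock-free queue.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Collects audio chunks until at least `seconds` of audio is buffered.
class AudioAccumulator {
public:
    AudioAccumulator(int sampleRate, double seconds)
        : m_target(static_cast<std::size_t>(sampleRate * seconds)) {}

    // Appends a chunk; returns true once enough audio is buffered.
    bool push(const std::vector<float>& chunk) {
        m_buffer.insert(m_buffer.end(), chunk.begin(), chunk.end());
        return m_buffer.size() >= m_target;
    }

    // Hands over everything buffered so far and resets the accumulator.
    std::vector<float> take() {
        std::vector<float> out;
        out.swap(m_buffer);
        return out;
    }

private:
    std::size_t m_target;
    std::vector<float> m_buffer;
};

}  // namespace sketch
```

With 1024-sample chunks at 16 kHz, a one-second target is reached on the 16th chunk, after which `take()` returns the block to pass to `transcribe()`.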

Next Steps

Once this plan is validated:

Install dependencies (Vosk, PocketSphinx, PortAudio)
Milestone 1: Basic Vosk STT
Test: Transcription of a French audio file
Milestone 2: Keyword "Celuna"
Test: Full passive → active conversation
Commit + Push: Phase 7 complete

Plan Validation

Questions before implementation:

✅ Service layer architecture approved?
✅ Engine choices (Vosk + PocketSphinx) OK?
❓ Is WhisperCpp needed, or is Vosk enough?
✅ Name "Celuna" confirmed?
❓ Other keywords to detect ("hey celuna", "ok celuna")?

Author: Claude Code
Date: 2025-11-29
Phase: 7 - STT Implementation
Status: 📋 Plan - Awaiting validation
