Implemented complete STT (Speech-to-Text) system with 4 engines:
1. **PocketSphinxEngine** (new)
- Lightweight keyword spotting
- Perfect for passive wake word detection
- ~10MB model, very low CPU/RAM usage
- Keywords: "celuna", "hey celuna", etc.
2. **VoskSTTEngine** (existing)
- Balanced local STT for full transcription
- 50MB models, good accuracy
- Already working
3. **WhisperCppEngine** (new)
- High-quality offline STT using whisper.cpp
- 75MB-2.9GB models depending on quality
- Excellent accuracy, runs entirely local
4. **WhisperAPIEngine** (existing)
- Cloud STT via OpenAI Whisper API
- Best accuracy, requires internet + API key
- Already working
Features:
- Full JSON configuration via config/voice.json
- Auto-selection mode tries engines in order
- Dual mode support (passive + active)
- Fallback chain for reliability
- All engines use ISTTEngine interface
Updated:
- STTEngineFactory: Added support for all 4 engines
- CMakeLists.txt: Added new source files
- docs/STT_CONFIGURATION.md: Complete config guide
Config example (voice.json):
{
"passive_mode": { "engine": "pocketsphinx" },
"active_mode": { "engine": "vosk", "fallback": "whisper-api" }
}
Architecture: ISTTService → STTEngineFactory → 4 engines
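For orientation, a rough sketch of how the interface and factory fit together (type and method names here are assumptions for illustration, not the project's actual headers):

```cpp
// Hypothetical sketch only: the real ISTTEngine / STTEngineFactory signatures live in the source tree.
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Common contract every engine implements (names assumed, not copied from the repo).
struct ISTTEngine {
    virtual ~ISTTEngine() = default;
    virtual bool initialize(const std::string& model_path) = 0;
    virtual std::string transcribe(const std::vector<int16_t>& pcm_16khz_mono) = 0;
};

// The factory maps the "engine" string from voice.json to a concrete implementation.
struct STTEngineFactory {
    static std::unique_ptr<ISTTEngine> create(const std::string& name) {
        // In the real factory each branch constructs PocketSphinxEngine, VoskSTTEngine,
        // WhisperCppEngine or WhisperAPIEngine; a placeholder keeps this sketch self-contained.
        if (name == "pocketsphinx" || name == "vosk" ||
            name == "whisper-cpp" || name == "whisper-api") {
            return nullptr;  // placeholder for the concrete engine
        }
        throw std::invalid_argument("unknown STT engine: " + name);
    }
};
```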
Build: ✅ Compiles successfully
Status: Phase 7 complete, ready for testing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
# STT Configuration - Speech-to-Text

AISSIA supports **4 different STT engines**, configurable via `config/voice.json`.

## Available Engines

### 1. **PocketSphinx** - Lightweight Keyword Spotting

- **Use case**: Keyword detection (passive mode)
- **Size**: ~10 MB
- **Performance**: Very light on CPU/RAM
- **Accuracy**: Average (good enough for wake words)
- **Installation**: `sudo apt install pocketsphinx libpocketsphinx-dev`
- **Model**: `/usr/share/pocketsphinx/model/en-us`

**Config**:

```json
{
  "stt": {
    "passive_mode": {
      "enabled": true,
      "engine": "pocketsphinx",
      "keywords": ["celuna", "hey celuna", "ok celuna"],
      "threshold": 0.8,
      "model_path": "/usr/share/pocketsphinx/model/en-us"
    }
  }
}
```

### 2. **Vosk** - Balanced Local STT

- **Use case**: Full local transcription
- **Size**: 50 MB (small), 1.8 GB (large)
- **Performance**: Fast, moderate resource usage
- **Accuracy**: Good
- **Installation**: Download a model from [alphacephei.com/vosk/models](https://alphacephei.com/vosk/models)
- **Model**: `./models/vosk-model-small-fr-0.22`

**Config**:

```json
{
  "stt": {
    "active_mode": {
      "enabled": true,
      "engine": "vosk",
      "model_path": "./models/vosk-model-small-fr-0.22",
      "language": "fr"
    }
  }
}
```

### 3. **Whisper.cpp** - High-Quality Local STT

- **Use case**: High-quality offline transcription
- **Size**: 75 MB (tiny) to 2.9 GB (large)
- **Performance**: Heavier, very accurate
- **Accuracy**: Excellent
- **Installation**: Build whisper.cpp and download GGML models
- **Model**: `./models/ggml-base.bin`

**Config**:

```json
{
  "stt": {
    "active_mode": {
      "enabled": true,
      "engine": "whisper-cpp",
      "model_path": "./models/ggml-base.bin",
      "language": "fr"
    }
  }
}
```

### 4. **Whisper API** - OpenAI Cloud STT

- **Use case**: Transcription via the OpenAI API
- **Size**: N/A (cloud)
- **Performance**: Depends on network latency
- **Accuracy**: Excellent
- **Installation**: None (API key required)
- **Cost**: $0.006 per minute

**Config**:

```json
{
  "stt": {
    "active_mode": {
      "enabled": true,
      "engine": "whisper-api"
    },
    "whisper_api": {
      "api_key_env": "OPENAI_API_KEY",
      "model": "whisper-1"
    }
  }
}
```

## Full Configuration

### Dual Mode (Passive + Active)

```json
{
  "tts": {
    "enabled": true,
    "engine": "auto",
    "rate": 0,
    "volume": 80,
    "voice": "fr-fr"
  },
  "stt": {
    "passive_mode": {
      "enabled": true,
      "engine": "pocketsphinx",
      "keywords": ["celuna", "hey celuna", "ok celuna"],
      "threshold": 0.8,
      "model_path": "/usr/share/pocketsphinx/model/en-us"
    },
    "active_mode": {
      "enabled": true,
      "engine": "vosk",
      "model_path": "./models/vosk-model-small-fr-0.22",
      "language": "fr",
      "timeout_seconds": 30,
      "fallback_engine": "whisper-api"
    },
    "whisper_api": {
      "api_key_env": "OPENAI_API_KEY",
      "model": "whisper-1"
    },
    "microphone": {
      "device_id": -1,
      "sample_rate": 16000,
      "channels": 1,
      "buffer_size": 1024
    }
  }
}
```

## Auto Mode

Use `"engine": "auto"` for automatic selection (see the sketch after the config example below):

1. Tries **Vosk** if a model is available
2. Tries **Whisper.cpp** if a model is available
3. Falls back to **Whisper API** if an API key is present
4. Otherwise uses the **Stub** engine (STT disabled)

```json
{
  "stt": {
    "active_mode": {
      "engine": "auto",
      "model_path": "./models/vosk-model-small-fr-0.22",
      "language": "fr"
    }
  }
}
```

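A minimal C++ sketch of that selection order (the helper name and signature are hypothetical; the actual logic lives in STTEngineFactory):

```cpp
// Illustrative only: applies the auto-selection order listed above.
// Paths and the api_key_env name would come from the "stt" section of voice.json.
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <string>

std::string pickAutoEngine(const std::string& vosk_model,
                           const std::string& whisper_model,
                           const char* api_key_env = "OPENAI_API_KEY") {
    namespace fs = std::filesystem;
    if (fs::exists(vosk_model))              return "vosk";        // 1. local Vosk model found
    if (fs::exists(whisper_model))           return "whisper-cpp"; // 2. local GGML model found
    if (std::getenv(api_key_env) != nullptr) return "whisper-api"; // 3. API key present
    return "stub";                                                 // 4. STT disabled
}

int main() {
    std::cout << pickAutoEngine("./models/vosk-model-small-fr-0.22",
                                "./models/ggml-base.bin") << "\n";
}
```

With the paths from the example above, Vosk is chosen as soon as its model directory exists on disk.
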
## Engine Comparison

| Engine | Size | CPU | RAM | Latency | Accuracy | Recommended Use |
|--------|------|-----|-----|---------|----------|-----------------|
| **PocketSphinx** | 10 MB | Low | Low | Very fast | Average | Wake words, keywords |
| **Vosk** | 50 MB+ | Medium | Medium | Fast | Good | General transcription |
| **Whisper.cpp** | 75 MB+ | High | High | Medium | Excellent | High-quality offline |
| **Whisper API** | 0 MB | None | None | Variable | Excellent | Simplicity, cloud |

## Recommended Workflows

### Scenario 1: Local Voice Assistant

```
Passive mode (PocketSphinx) → detects "hey celuna"
        ↓
Active mode (Vosk) → transcribes the command
        ↓
The command is processed
```

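A minimal C++ sketch of this hand-off, using placeholder classes rather than the real AISSIA API:

```cpp
// Illustrative sketch of the passive -> active hand-off; the struct and method
// names are placeholders, not the actual AISSIA classes.
#include <iostream>
#include <string>

struct KeywordSpotter {                     // e.g. PocketSphinx in passive mode
    bool waitForKeyword() { return true; }  // would block until "hey celuna" is heard
};

struct CommandTranscriber {                 // e.g. Vosk in active mode
    std::string listenOnce() { return "turn on the lights"; }  // would record and transcribe
};

int main() {
    KeywordSpotter passive;
    CommandTranscriber active;
    while (passive.waitForKeyword()) {               // cheap keyword spotting runs continuously
        std::string command = active.listenOnce();   // heavier transcription only after the wake word
        std::cout << "command: " << command << "\n"; // hand the text to the command handler
        break;                                       // demo: stop after one command
    }
    return 0;
}
```
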
### Scenario 2: High Quality with Fallback

```
Try Vosk (local, fast)
        ↓ (on failure)
Try Whisper.cpp (local, accurate)
        ↓ (on failure)
Fall back to Whisper API (cloud)
```

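A minimal C++ sketch of such a fallback chain, with a placeholder `Engine` type standing in for the real interface:

```cpp
// Illustrative sketch of the fallback chain; Engine is a stand-in for the
// project's real engine interface.
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Audio  = std::vector<int16_t>;
using Engine = std::function<std::optional<std::string>(const Audio&)>;

// Tries each engine in order and returns the first successful transcription.
std::optional<std::string> transcribeWithFallback(const std::vector<Engine>& chain,
                                                  const Audio& audio) {
    for (const auto& engine : chain) {
        if (auto text = engine(audio)) {
            return text;          // success: stop at the first engine that works
        }
        // failure (missing model, network error, ...): fall through to the next engine
    }
    return std::nullopt;          // every engine in the chain failed
}
```
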
### Scenario 3: Cloud-First

```
Whisper API directly (simple, no local setup)
```

## Installing Dependencies

### Ubuntu/Debian

```bash
# PocketSphinx
sudo apt install pocketsphinx libpocketsphinx-dev

# Vosk
# Download a model from https://alphacephei.com/vosk/models
mkdir -p models
cd models
wget https://alphacephei.com/vosk/models/vosk-model-small-fr-0.22.zip
unzip vosk-model-small-fr-0.22.zip

# Whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make
# Download GGML models
bash ./models/download-ggml-model.sh base
```

## Environment Variables

Set these in `.env`:

```bash
# Whisper API (OpenAI)
OPENAI_API_KEY=sk-...

# Optional: custom model paths
STT_MODEL_PATH=/path/to/models
```

## Troubleshooting

### PocketSphinx is not working

```bash
# Check the installation
dpkg -l | grep pocketsphinx

# Check the model
ls /usr/share/pocketsphinx/model/en-us
```

### Vosk detects nothing

```bash
# Check that libvosk.so is installed
ldconfig -p | grep vosk

# Download the right model for your language
```

### Whisper.cpp errors

```bash
# Rebuild with GGML support
cd whisper.cpp && make clean && make

# Check the model format (must be a .bin GGML file)
file models/ggml-base.bin
```

### Whisper API timeout

```bash
# Check the API key
echo $OPENAI_API_KEY

# Test the API manually
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```