docs: Add comprehensive next steps guide

Commit 67b1587047 (parent 5b60acaa73)
docs/next_steps.md (new file, 399 lines)

# SecondVoice - Next Steps

**Date**: November 20, 2025
**Status**: Setup complete, ready to build

---

## ✅ What's Done

### Infrastructure
- ✅ Complete project structure (src/, docs/, recordings/)
- ✅ CMakeLists.txt configured with vcpkg
- ✅ vcpkg.json with all dependencies
- ✅ .gitignore configured
- ✅ config.json template
- ✅ .env.example
- ✅ build.sh helper script
- ✅ Complete README.md

### Implemented Code
- ✅ **audio/AudioCapture**: PortAudio wrapper for audio capture
- ✅ **audio/AudioBuffer**: Buffer with WAV export
- ✅ **api/WhisperClient**: HTTP client for the Whisper API (multipart/form-data)
- ✅ **api/ClaudeClient**: HTTP client for the Claude API (JSON)
- ✅ **ui/TranslationUI**: ImGui interface with a Stop button
- ✅ **core/Pipeline**: Orchestration of 3 threads
- ✅ **utils/Config**: JSON + .env loader
- ✅ **utils/ThreadSafeQueue**: Thread-safe queue template
- ✅ **main.cpp**: Entry point

### Documentation
- ✅ docs/SecondVoice.md (project vision)
- ✅ docs/implementation_plan.md (technical design)
- ✅ README.md (setup + usage)

---

## 🚀 Immediate Next Actions

### 1. Environment Setup (You)

```bash
# 1. Install vcpkg if not already done
git clone https://github.com/microsoft/vcpkg.git ~/vcpkg
cd ~/vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT=~/vcpkg
# Add it to ~/.bashrc so it persists across shells:
echo 'export VCPKG_ROOT=~/vcpkg' >> ~/.bashrc

# 2. Install system dependencies
sudo apt update
sudo apt install -y libasound2-dev libgl1-mesa-dev libglu1-mesa-dev

# 3. Create .env
cd /mnt/e/Users/Alexis\ Trouvé/Documents/Projets/secondvoice
cp .env.example .env
# Edit .env and add your real API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
nano .env
```

### 2. First Build

```bash
# Run the build
./build.sh

# Or manually:
cmake -B build -DCMAKE_TOOLCHAIN_FILE="$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake"
cmake --build build -j"$(nproc)"
```

**Note**: The first build will take a while (vcpkg compiles all the dependencies).

### 3. Simple Audio Test

Before testing the full pipeline, run a standalone test of audio capture.

**Create a simple test** (`test_audio.cpp`):

```cpp
#include "src/audio/AudioCapture.h"
#include "src/audio/AudioBuffer.h"
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>

int main() {
    // 16 kHz, mono (the third argument is presumably the chunk duration in seconds)
    secondvoice::AudioCapture capture(16000, 1, 5);
    secondvoice::AudioBuffer buffer(16000, 1);

    if (!capture.initialize()) {
        std::cerr << "Failed to init audio" << std::endl;
        return 1;
    }

    std::cout << "Recording 10 seconds..." << std::endl;

    // Each captured chunk is appended to the buffer and logged
    capture.start([&buffer](const std::vector<float>& data) {
        buffer.addSamples(data);
        std::cout << "Captured chunk: " << data.size() << " samples" << std::endl;
    });

    // Let the capture run for 10 seconds, then stop it
    std::this_thread::sleep_for(std::chrono::seconds(10));

    capture.stop();

    if (buffer.saveToWav("test_recording.wav")) {
        std::cout << "Saved to test_recording.wav" << std::endl;
    }

    return 0;
}
```

**Build + test**:
```bash
# Add test_audio to CMakeLists.txt (temporarily)
# Or compile it manually
```

### 4. Whisper API Test

**Create a test** (`test_whisper.cpp`):

```cpp
#include "src/api/WhisperClient.h"
#include "src/audio/AudioBuffer.h"
#include <cstdlib>
#include <iostream>
#include <string>

int main() {
    // Read the API key from the environment (export OPENAI_API_KEY first)
    const char* env_key = std::getenv("OPENAI_API_KEY");
    if (env_key == nullptr) {
        std::cerr << "OPENAI_API_KEY is not set" << std::endl;
        return 1;
    }
    std::string api_key = env_key;

    secondvoice::WhisperClient client(api_key);

    // Load test audio (Chinese speech)
    secondvoice::AudioBuffer buffer(16000, 1);
    // TODO: Load from existing WAV file

    // 16 kHz, mono, language code "zh"
    auto result = client.transcribe(buffer.getSamples(), 16000, 1, "zh", 0.0f);

    if (result.has_value()) {
        std::cout << "Transcription: " << result->text << std::endl;
    } else {
        std::cerr << "Failed" << std::endl;
        return 1;
    }

    return 0;
}
```

**Test with a Chinese audio sample**:
- Find a free Chinese audio sample (YouTube or sample sites)
- Convert it to 16 kHz mono WAV if necessary
- Test the transcription

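The `TODO: Load from existing WAV file` step above could be filled in along these lines. This is a minimal sketch, not the project's loader: `WavData`, `readLE`, and `parsePcm16Wav` are hypothetical names, and it assumes an uncompressed 16-bit PCM RIFF/WAVE file that has already been read into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper type (not part of SecondVoice): decoded PCM audio.
struct WavData {
    std::uint32_t sample_rate = 0;
    std::uint16_t channels = 0;
    std::vector<float> samples;  // interleaved, scaled to [-1, 1)
};

// Read a little-endian unsigned integer of n bytes (1..4) starting at p.
static std::uint32_t readLE(const std::uint8_t* p, int n) {
    assert(n >= 1 && n <= 4);
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return v;
}

// Parse an in-memory RIFF/WAVE file that contains 16-bit PCM.
// Returns std::nullopt for anything else (compressed, 24-bit, truncated).
std::optional<WavData> parsePcm16Wav(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < 12 || std::memcmp(bytes.data(), "RIFF", 4) != 0 ||
        std::memcmp(bytes.data() + 8, "WAVE", 4) != 0) {
        return std::nullopt;
    }
    WavData out;
    bool have_fmt = false;
    std::size_t pos = 12;
    while (pos + 8 <= bytes.size()) {
        const std::uint8_t* chunk = bytes.data() + pos;
        const std::size_t size = readLE(chunk + 4, 4);
        const std::size_t body = pos + 8;
        if (size > bytes.size() - body) return std::nullopt;  // truncated chunk
        if (std::memcmp(chunk, "fmt ", 4) == 0 && size >= 16) {
            const std::uint32_t format = readLE(chunk + 8, 2);  // 1 = PCM
            out.channels = static_cast<std::uint16_t>(readLE(chunk + 10, 2));
            out.sample_rate = readLE(chunk + 12, 4);
            const std::uint32_t bits = readLE(chunk + 22, 2);
            if (format != 1 || bits != 16 || out.channels == 0) return std::nullopt;
            have_fmt = true;
        } else if (std::memcmp(chunk, "data", 4) == 0) {
            if (!have_fmt) return std::nullopt;
            for (std::size_t i = 0; i + 1 < size; i += 2) {
                const auto raw = static_cast<std::uint16_t>(readLE(bytes.data() + body + i, 2));
                out.samples.push_back(static_cast<float>(static_cast<std::int16_t>(raw)) / 32768.0f);
            }
            return out;
        }
        pos = body + size + (size & 1);  // chunks are padded to an even size
    }
    return std::nullopt;
}
```

In `test_whisper.cpp`, the file bytes could be read with `std::ifstream` in binary mode and the resulting `samples` passed to `buffer.addSamples(...)`, after checking that `sample_rate` is 16000 and `channels` is 1.
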
### 5. Claude API Test

**Create a test** (`test_claude.cpp`):

```cpp
#include "src/api/ClaudeClient.h"
#include <cstdlib>
#include <iostream>
#include <string>

int main() {
    // Read the API key from the environment (export ANTHROPIC_API_KEY first)
    const char* env_key = std::getenv("ANTHROPIC_API_KEY");
    if (env_key == nullptr) {
        std::cerr << "ANTHROPIC_API_KEY is not set" << std::endl;
        return 1;
    }
    std::string api_key = env_key;

    secondvoice::ClaudeClient client(api_key);

    // Sample sentence to translate from Chinese to French
    std::string chinese = "你好,今天我们讨论项目进度。";

    auto result = client.translate(chinese);

    if (result.has_value()) {
        std::cout << "Translation: " << result->text << std::endl;
    } else {
        std::cerr << "Failed" << std::endl;
        return 1;
    }

    return 0;
}
```

### 6. Full Pipeline Test

```bash
cd build
./SecondVoice
```

**Expected behavior**:
1. The ImGui window opens
2. "Recording..." is displayed
3. Audio is captured in real time
4. Chinese transcriptions are displayed
5. French translations are displayed
6. Clicking "STOP RECORDING" saves the audio

### 7. Debug & Fix

**Expected problems**:

1. **Build errors** (vcpkg dependencies)
   - Check that vcpkg is installed
   - Check VCPKG_ROOT
   - Reinstall the package if needed: `vcpkg install portaudio`

2. **Audio capture fails**
   - Check that a microphone is available: `arecord -l`
   - Test with `arecord -d 5 test.wav`
   - Check audio permissions

3. **Whisper API errors**
   - Check the API key
   - Test with curl:
   ```bash
   curl https://api.openai.com/v1/audio/transcriptions \
     -H "Authorization: Bearer $OPENAI_API_KEY" \
     -F file=@test.wav \
     -F model=whisper-1
   ```

4. **Claude API errors**
   - Check the API key
   - Test with curl:
   ```bash
   curl https://api.anthropic.com/v1/messages \
     -H "x-api-key: $ANTHROPIC_API_KEY" \
     -H "anthropic-version: 2023-06-01" \
     -H "content-type: application/json" \
     -d '{
       "model": "claude-haiku-4-5-20251001",
       "max_tokens": 1024,
       "messages": [{"role": "user", "content": "Traduis: 你好"}]
     }'
   ```

5. **UI doesn't appear**
   - Check that OpenGL is installed: `glxinfo | grep OpenGL`
   - Test with another ImGui example
   - Check the DISPLAY variable when connected over SSH

6. **Latency too high**
   - Reduce chunk_duration_seconds in config.json (10s → 5s)
   - Test with different values
   - Measure Whisper and Claude latency separately

7. **Memory leaks**
   - Use valgrind: `valgrind --leak-check=full ./SecondVoice`
   - Check that smart pointers are used everywhere

---

## 🎯 MVP Success Criteria

To validate the MVP, all of the following must hold:

1. ✅ **Build compiles** without errors
2. ✅ **Audio capture works** (test with `test_audio`)
3. ✅ **Whisper transcribes correctly** (>85% accuracy on a Chinese sample)
4. ✅ **Claude translates correctly** (natural French translation)
5. ✅ **UI shows up** and stays responsive
6. ✅ **Full pipeline works** end to end
7. ✅ **Stop button stops cleanly** and saves the audio
8. ✅ **Acceptable latency** (<10s between audio and displayed translation)
9. ✅ **No crash** during a 30-minute recording
10. ✅ **Saved audio** is playable (test with `aplay` or VLC)

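The ">85% accuracy" criterion needs a concrete measure. One common choice for Chinese text is character accuracy, defined as one minus the character error rate (edit distance divided by reference length). The sketch below assumes well-formed UTF-8 input and a manually prepared reference transcript; the function names are illustrative and not part of the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; this sketch
// does not validate malformed byte sequences.
std::u32string decodeUtf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        int extra = 0;
        char32_t cp = c;
        if (c >= 0xF0)      { extra = 3; cp = c & 0x07; }
        else if (c >= 0xE0) { extra = 2; cp = c & 0x0F; }
        else if (c >= 0xC0) { extra = 1; cp = c & 0x1F; }
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Levenshtein distance between two code-point sequences (two-row DP).
std::size_t editDistance(const std::u32string& a, const std::u32string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Character accuracy = 1 - CER, clamped at 0. The reference must be non-empty.
double characterAccuracy(const std::string& reference, const std::string& hypothesis) {
    const std::u32string ref = decodeUtf8(reference);
    assert(!ref.empty());
    const double cer =
        static_cast<double>(editDistance(ref, decodeUtf8(hypothesis))) / ref.size();
    return std::max(0.0, 1.0 - cer);
}
```

Punctuation counts as characters here, so it may be worth stripping it from both strings before comparing. A result of 0.85 or higher from `characterAccuracy(reference, whisper_output)` would correspond to criterion 3.
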
---

## 🔧 Possible Improvements (Post-MVP)

### Performance
- [ ] Optimize chunk size (benchmark 5s vs 10s vs 30s)
- [ ] Thread pool for parallel processing
- [ ] Cache API responses (avoid duplicate calls)
- [ ] Streaming Whisper (if the API supports it)

### Quality
- [ ] Retry logic on API failures (exponential backoff)
- [ ] Silence detection (skip empty chunks)
- [ ] Audio normalization before Whisper
- [ ] Whisper confidence score (filter out low-confidence results)

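The retry item in the list above could look roughly like this. It is a sketch under assumptions: it supposes the API calls return a `std::optional`-like value (as the `has_value()` checks in the test programs suggest), and `retryWithBackoff`, `max_attempts`, and `base_delay` are hypothetical names rather than existing project code.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <thread>

// Call `request` until it yields a value or `max_attempts` is reached.
// The wait doubles after each failure: base, 2*base, 4*base, ...
template <typename Request>
auto retryWithBackoff(Request request, int max_attempts,
                      std::chrono::milliseconds base_delay) -> decltype(request()) {
    assert(max_attempts > 0);
    std::chrono::milliseconds delay = base_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        auto result = request();
        if (result.has_value()) return result;
        if (attempt < max_attempts) {
            // No sleep after the final failure: nothing is left to retry.
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
    return std::nullopt;
}
```

A call might wrap the existing client, for example `retryWithBackoff([&] { return client.translate(chinese); }, 3, std::chrono::milliseconds(500))`. This version has no jitter and retries every failure; a fuller implementation would probably retry only on transient errors such as timeouts or rate limits.
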
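For the silence-detection item, a simple energy check is one possible starting point. The sketch below computes the RMS level of a chunk of float samples; the function names and the default threshold of 0.01 (about -40 dBFS) are assumptions to be tuned against real recordings.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Root-mean-square level of a chunk of float samples in [-1, 1].
float rmsLevel(const std::vector<float>& samples) {
    if (samples.empty()) return 0.0f;
    double sum_sq = 0.0;
    for (float s : samples) sum_sq += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum_sq / samples.size()));
}

// A chunk counts as silent when its RMS is below `threshold`.
// The default 0.01 (about -40 dBFS) is an assumed value, not a measured one.
bool isSilent(const std::vector<float>& samples, float threshold = 0.01f) {
    assert(threshold >= 0.0f);
    return rmsLevel(samples) < threshold;
}
```

The pipeline could call `isSilent(chunk)` before sending a chunk to Whisper and skip the API call when it returns true, which would also reduce API cost.
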
### UI/UX
- [ ] Settings panel (change chunk size at runtime)
- [ ] Pause/Resume button
- [ ] Volume meter (visualize audio input)
- [ ] Status indicators (API call in progress)
- [ ] Copy translation button
- [ ] Export transcript button

### Features
- [ ] Automatic post-meeting summary (Claude analyzes the transcript)
- [ ] Structured export (audio + CN + FR + summary)
- [ ] Search system (meeting backlog)
- [ ] Speaker diarization (who is speaking)
- [ ] Multi-language support (JP, KR, etc.)

---

## 📊 Metrics to Measure

### During development
- **Build time**: vcpkg + compile (first build vs incremental)
- **Audio latency**: Capture → buffer ready
- **Whisper latency**: Audio → transcription (varies with chunk size)
- **Claude latency**: CN text → FR text
- **Total latency**: Audio → Display
- **Memory usage**: RSS at startup vs after 30 minutes
- **CPU usage**: % during recording + processing

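The latency metrics above can be collected with a small timing wrapper around each stage. This is a minimal sketch using `std::chrono::steady_clock`; `measureMs` and `meanMs` are hypothetical helper names, not part of the current code.

```cpp
#include <cassert>
#include <chrono>
#include <numeric>
#include <utility>
#include <vector>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// steady_clock is used because it never jumps backwards.
template <typename Work>
double measureMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Mean of several measurements; averaging smooths out network jitter.
double meanMs(const std::vector<double>& runs) {
    assert(!runs.empty());
    return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}
```

Wrapping the Whisper call and the Claude call in separate `measureMs` lambdas gives the two stage latencies independently, and their sum approximates the processing part of the total latency.
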
### During real usage
- **Transcription accuracy**: % of correct words (subjective, compare with a native speaker)
- **Translation quality**: Understandable? Natural?
- **Crash rate**: 0 over a 1-hour recording
- **API cost**: $ per meeting (track real usage)

---

## 📝 Development Notes

### Known Bugs / TODO
- [ ] Pipeline.cpp line 150: `std::put_time` requires `<iomanip>` (add the include)
- [ ] ThreadSafeQueue: No clear() method (add if needed)
- [ ] Config: No validation (e.g., sample_rate > 0)
- [ ] Error messages: Not descriptive enough (improve logging)
- [ ] main.cpp: No signal handler (Ctrl+C does not save the audio)

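For the missing `clear()` method, the idea is simply to swap the underlying container for an empty one while holding the lock. The class below is an illustrative stand-in named `SketchQueue`; the real `utils/ThreadSafeQueue` interface is not shown in this document and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Illustrative queue only; the real utils/ThreadSafeQueue may differ.
template <typename T>
class SketchQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }

    // Non-blocking pop: returns std::nullopt when the queue is empty.
    std::optional<T> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Drop every pending item and report how many were discarded.
    std::size_t clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t dropped = items_.size();
        std::queue<T> empty;
        items_.swap(empty);
        assert(items_.empty());
        return dropped;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;
    std::queue<T> items_;
};
```

Returning the number of dropped items makes it easy to log how much pending audio was discarded when the user presses Stop.
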
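The Ctrl+C item in the list above is usually handled with a flag that the signal handler sets and the main loop polls. The sketch below assumes `std::atomic<bool>` is lock-free on the target platform; `g_stop_requested`, `onInterrupt`, and `installStopHandlers` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Set by the handler, polled by the main loop.
std::atomic<bool> g_stop_requested{false};

// Keep the handler tiny: only async-signal-safe work is allowed here,
// so it must not save files or touch the UI directly.
extern "C" void onInterrupt(int) {
    g_stop_requested.store(true);
}

// Install the handler for Ctrl+C (SIGINT) and SIGTERM.
void installStopHandlers() {
    // Lock-free atomics are required for safe use inside a signal handler.
    assert(g_stop_requested.is_lock_free());
    std::signal(SIGINT, onInterrupt);
    std::signal(SIGTERM, onInterrupt);
}
```

`main.cpp` could call `installStopHandlers()` at startup and check `g_stop_requested` in its render loop, then follow the same stop-and-save path as the Stop button.
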
### Possible Optimizations
- AudioBuffer: std::vector<float> → mmap for very large recordings
- API clients: Connection pooling (httplib keep-alive)
- UI: Limit displayed messages (currently unbounded)

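Capping the number of displayed messages can be done with a fixed-capacity container that drops the oldest entry on overflow. A minimal sketch, assuming messages are plain strings; `BoundedLog` is a hypothetical class, not the current TranslationUI storage.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Keeps only the most recent `capacity` messages; older ones are dropped.
class BoundedLog {
public:
    explicit BoundedLog(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void add(std::string message) {
        if (messages_.size() == capacity_) messages_.pop_front();
        messages_.push_back(std::move(message));
    }

    const std::deque<std::string>& messages() const { return messages_; }

private:
    std::size_t capacity_;
    std::deque<std::string> messages_;
};
```

Dropped messages are gone from the UI only; a full transcript export would still need the complete history stored elsewhere.
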
### Code Quality
- Add unit tests (Google Test)
- Add CI/CD (GitHub Actions)
- Add pre-commit hooks (clang-format)
- Document the API with Doxygen

---

## 🎓 Useful Resources

### API Documentation
- [Whisper API](https://platform.openai.com/docs/api-reference/audio)
- [Claude API](https://docs.anthropic.com/claude/reference/messages)
- [PortAudio](http://portaudio.com/docs/v19-doxydocs/)
- [ImGui](https://github.com/ocornut/imgui)

### Chinese Audio Samples
- [YouTube](https://www.youtube.com/results?search_query=chinese+conversation) (use youtube-dl)
- [Tatoeba](https://tatoeba.org/en/audio/index/cmn) (Chinese sentences)
- [Common Voice](https://commonvoice.mozilla.org/zh-CN) (open dataset)

### Debug Tools
- `arecord -l`: List available microphones
- `aplay test.wav`: Play a WAV file
- `valgrind --leak-check=full ./SecondVoice`: Check for memory leaks
- `gdb ./SecondVoice`: Debug crashes
- `strace ./SecondVoice`: Trace syscalls

---

## 🚦 Status Tracking

### Phase 1 - Setup (DONE ✅)
- ✅ Project structure
- ✅ CMake + vcpkg
- ✅ Full code implemented
- ✅ Documentation

### Phase 2 - Build & Test (CURRENT)
- ⬜ Set up vcpkg + dependencies
- ⬜ First successful build
- ⬜ Audio capture test
- ⬜ Whisper API test
- ⬜ Claude API test
- ⬜ Full pipeline test

### Phase 3 - Debug & Tuning
- ⬜ Fix build errors
- ⬜ Fix runtime errors
- ⬜ Optimize latency
- ⬜ Validate transcription/translation quality

### Phase 4 - Real-World Test
- ⬜ Test with a Chinese audio sample
- ⬜ Test a 30-minute recording without a crash
- ⬜ Test in real conditions (meeting with Tingting)

---

**Next immediate action**: Set up vcpkg and run `./build.sh`

Good luck! 🚀