# SecondVoice - MVP Implementation Plan

**Date**: November 20, 2025

**Target**: Minimal working MVP

**Platform**: Linux

**Package Manager**: vcpkg

---
## 🎯 Minimal MVP Goal

A desktop application that:

1. Continuously captures audio from the microphone
2. Transcribes Chinese speech to text (Whisper API)
3. Translates the text into French (Claude API)
4. Displays the translation in real time (ImGui)
5. Provides a Stop button to end the session (no summary in the MVP)

---
## 🏗️ Technical Architecture

### Pipeline

```
Audio Capture (PortAudio)
↓ (configurable audio chunks)
Whisper API (STT)
↓ (Chinese text)
Claude API (translation)
↓ (French text)
ImGui UI (real-time display + Stop button)
```
### Threading Model

```
Thread 1 - Audio Capture:
- PortAudio callback captures audio
- Accumulates chunks (configurable size)
- Pushes them into a thread-safe queue
- Saves a WAV backup in the background

Thread 2 - AI Processing:
- Pops a chunk from the audio queue
- POST to Whisper API → Chinese transcription
- POST to Claude API → French translation
- Pushes the result into the UI queue

Thread 3 - Main UI (ImGui):
- Renders the ImGui window
- Displays translations from the queue
- Handles the Stop button
- Updates status/duration
```
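
The thread-safe queues in this model can be built from a `std::mutex` and a `std::condition_variable`. The sketch below is one possible shape for `ThreadSafeQueue.h`, not its actual interface: the method names `push`, `wait_pop`, `try_pop` and `close` are assumptions. `wait_pop` suits the processing thread (it blocks until a chunk arrives), `try_pop` suits the UI thread (a frame must never block), and `close` lets the Stop button wake any waiting consumer.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical blocking FIFO shared between the capture, processing and UI threads.
template <typename T>
class ThreadSafeQueue {
public:
    // Producer side: enqueue and wake one waiting consumer.
    // Items pushed after close() are dropped.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_) return;
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Consumer side: block until an item is available or the queue is closed.
    // Returns std::nullopt only once the queue is closed and fully drained.
    std::optional<T> wait_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        std::optional<T> value(std::move(queue_.front()));
        queue_.pop();
        return value;
    }

    // Non-blocking variant for the UI render loop.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return std::nullopt;
        std::optional<T> value(std::move(queue_.front()));
        queue_.pop();
        return value;
    }

    // Stop accepting new items and wake every waiting consumer.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
    bool closed_ = false;
};
```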

---

## 📁 Project Structure

```
secondvoice/
├── .env                          # API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY)
├── .gitignore
├── CMakeLists.txt                # Build configuration
├── vcpkg.json                    # Dependency manifest
├── config.json                   # Runtime config (audio chunk size, etc.)
├── README.md
├── docs/
│   ├── SecondVoice.md            # Vision document
│   └── implementation_plan.md    # This document
├── src/
│   ├── main.cpp                  # Entry point + ImGui main loop
│   ├── audio/
│   │   ├── AudioCapture.h
│   │   ├── AudioCapture.cpp      # PortAudio wrapper
│   │   ├── AudioBuffer.h
│   │   └── AudioBuffer.cpp       # Thread-safe ring buffer
│   ├── api/
│   │   ├── WhisperClient.h
│   │   ├── WhisperClient.cpp     # Whisper API client
│   │   ├── ClaudeClient.h
│   │   └── ClaudeClient.cpp      # Claude API client
│   ├── ui/
│   │   ├── TranslationUI.h
│   │   └── TranslationUI.cpp     # ImGui interface
│   ├── utils/
│   │   ├── Config.h
│   │   ├── Config.cpp            # Loads .env + config.json
│   │   ├── ThreadSafeQueue.h     # Thread-safe queue template
│   │   └── Logger.h              # Simple logging
│   └── core/
│       ├── Pipeline.h
│       └── Pipeline.cpp          # Orchestrates the threads
├── recordings/                   # Output audio files
│   └── .gitkeep
└── build/                        # CMake build output (ignored)
```
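
---

## 🔧 Dependencies

### vcpkg.json

Both APIs are served over HTTPS, so cpp-httplib is requested with its `openssl` feature (HTTPS support in the library is enabled by the `CPPHTTPLIB_OPENSSL_SUPPORT` macro). Ports with features use the object form, since dependency strings in a manifest must be plain port names.

```json
{
  "name": "secondvoice",
  "version": "0.1.0",
  "dependencies": [
    "portaudio",
    {
      "name": "cpp-httplib",
      "features": ["openssl"]
    },
    "nlohmann-json",
    {
      "name": "imgui",
      "features": ["glfw-binding", "opengl3-binding"]
    },
    "glfw3",
    "opengl"
  ]
}
```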

### System Requirements (Linux)

```bash
# PortAudio dependencies (ALSA backend)
sudo apt install libasound2-dev

# OpenGL dependencies
sudo apt install libgl1-mesa-dev libglu1-mesa-dev

# GLFW build dependencies (X11)
sudo apt install libxinerama-dev libxcursor-dev xorg-dev pkg-config
```

---

## ⚙️ Configuration

### .env (project root)

```env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
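
Parsing `.env` only requires splitting `KEY=VALUE` lines. A minimal standard-C++ sketch, assuming blank lines and `#` comments are ignored and quotes around values are not stripped (the real `Config.cpp` may differ); `parse_env` is a hypothetical helper, used as `std::ifstream f(".env"); auto env = parse_env(f);`.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Trim ASCII whitespace, including a trailing '\r' from Windows line endings.
static std::string trim_ws(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical .env parser: KEY=VALUE per line; skips blank lines,
// '#' comments and lines without '='.
std::map<std::string, std::string> parse_env(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        line = trim_ws(line);
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        vars[trim_ws(line.substr(0, eq))] = trim_ws(line.substr(eq + 1));
    }
    return vars;
}
```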

### config.json (project root)

```json
{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10,
    "format": "wav"
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh",
    "temperature": 0.0
  },
  "claude": {
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 1024,
    "temperature": 0.3,
    "system_prompt": "Tu es un traducteur professionnel chinois-français. Traduis le texte suivant de manière naturelle et contextuelle."
  },
  "ui": {
    "window_width": 800,
    "window_height": 600,
    "font_size": 16,
    "max_display_lines": 50
  },
  "recording": {
    "save_audio": true,
    "output_directory": "./recordings"
  }
}
```
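
The `audio` settings determine how much data each chunk carries. A small sketch of the arithmetic, assuming 16-bit samples as in the audio-format notes; `AudioSettings` is a hypothetical struct mirroring the JSON keys. With the defaults, one chunk is 16000 × 1 × 10 = 160,000 samples, i.e. 320,000 bytes of PCM plus a 44-byte WAV header, far below the Whisper API's 25 MB upload limit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the "audio" block in config.json.
struct AudioSettings {
    int sample_rate = 16000;          // samples per second, per channel
    int channels = 1;                 // mono
    int chunk_duration_seconds = 10;  // length of one chunk sent to Whisper

    // Interleaved samples in one chunk (frames x channels).
    std::size_t samples_per_chunk() const {
        return static_cast<std::size_t>(sample_rate) * channels * chunk_duration_seconds;
    }

    // Raw PCM payload per chunk, assuming 16-bit samples.
    std::size_t bytes_per_chunk() const {
        return samples_per_chunk() * sizeof(std::int16_t);
    }
};
```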

---

## 🔌 API Clients

### Whisper API

```text
// POST https://api.openai.com/v1/audio/transcriptions
// Authorization: Bearer {OPENAI_API_KEY}
// Content-Type: multipart/form-data

Request:
- file: audio.wav (binary)
- model: whisper-1
- language: zh
- temperature: 0.0

Response:
{
  "text": "你好,今天我们讨论项目进度..."
}
```
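
cpp-httplib has its own multipart helpers, but the wire format is simple enough to show directly. A standard-library sketch of the `multipart/form-data` body (RFC 7578) for this request; `build_whisper_multipart` and the caller-chosen boundary are illustrative assumptions. The request header would then be `Content-Type: multipart/form-data; boundary=<boundary>`, and the boundary must not occur inside the WAV bytes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: text fields (model, language, temperature) plus one
// file part carrying the in-memory WAV.
std::string build_whisper_multipart(
    const std::string& boundary,
    const std::vector<std::pair<std::string, std::string>>& fields,
    const std::string& wav_bytes) {
    const std::string crlf = "\r\n";
    std::string body;
    for (const auto& [name, value] : fields) {
        body += "--" + boundary + crlf;
        body += "Content-Disposition: form-data; name=\"" + name + "\"" + crlf + crlf;
        body += value + crlf;
    }
    // The audio goes in a part named "file"; the filename marks it as an upload.
    body += "--" + boundary + crlf;
    body += "Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"" + crlf;
    body += "Content-Type: audio/wav" + crlf + crlf;
    body += wav_bytes + crlf;
    body += "--" + boundary + "--" + crlf;  // closing delimiter
    return body;
}
```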

### Claude API

```text
// POST https://api.anthropic.com/v1/messages
// Content-Type: application/json
// x-api-key: {ANTHROPIC_API_KEY}
// anthropic-version: 2023-06-01

Request:
{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 1024,
  "system": "<system_prompt from config.json>",
  "messages": [{
    "role": "user",
    "content": "Traduis en français: 你好,今天我们讨论项目进度..."
  }]
}

Response:
{
  "content": [{
    "type": "text",
    "text": "Bonjour, aujourd'hui nous discutons de l'avancement du projet..."
  }],
  "model": "claude-haiku-4-5-20251001",
  "usage": {...}
}
```
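
In the project, nlohmann-json would normally build this body, but the subtle part, escaping the transcript into a JSON string, can be shown with the standard library alone. A sketch, assuming UTF-8 input: multi-byte characters such as Chinese are copied through unchanged (JSON permits this), while quotes, backslashes and control characters are escaped. `json_escape` and `build_claude_request` are hypothetical helpers; the French prefix mirrors the request example above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escape a UTF-8 string for a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size() + 8);
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // remaining control characters need \u escapes
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);  // ASCII and UTF-8 bytes pass through
                }
        }
    }
    return out;
}

// Hypothetical helper: Messages API body; model and system prompt come from config.json.
std::string build_claude_request(const std::string& model, int max_tokens,
                                 const std::string& system_prompt,
                                 const std::string& chinese_text) {
    return "{\"model\":\"" + json_escape(model) + "\"," +
           "\"max_tokens\":" + std::to_string(max_tokens) + "," +
           "\"system\":\"" + json_escape(system_prompt) + "\"," +
           "\"messages\":[{\"role\":\"user\",\"content\":\"" +
           json_escape("Traduis en français: " + chinese_text) + "\"}]}";
}
```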

---

## 🎨 ImGui Interface

### Minimal Layout

```
┌────────────────────────────────────────────┐
│ SecondVoice - Live Translation             │
├────────────────────────────────────────────┤
│                                            │
│ [●] Recording...        Duration: 00:05:23 │
│                                            │
│ ┌────────────────────────────────────────┐ │
│ │ 中文: 你好,今天我们讨论项目进度...     │ │
│ │ FR: Bonjour, aujourd'hui nous          │ │
│ │     discutons de l'avancement...       │ │
│ │                                        │ │
│ │ 中文: 关于预算的问题...                │ │
│ │ FR: Concernant la question du budget.. │ │
│ │                                        │ │
│ │ [Auto-scroll enabled]                  │ │
│ │                                        │ │
│ └────────────────────────────────────────┘ │
│                                            │
│             [ STOP RECORDING ]             │
│                                            │
│ Status: Processing chunk 12/12             │
│ Audio: 16kHz mono, chunk size: 10s         │
└────────────────────────────────────────────┘
```
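
The `Duration: 00:05:23` field only needs the elapsed time formatted as `HH:MM:SS`. A sketch with `std::chrono`; `format_duration` is a hypothetical helper, and the UI loop would call it with `std::chrono::duration_cast<std::chrono::seconds>(std::chrono::steady_clock::now() - start)`.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: format elapsed time as HH:MM:SS for the status line.
std::string format_duration(std::chrono::seconds elapsed) {
    const long long total = elapsed.count();
    const long long hours = total / 3600;
    const long long minutes = (total % 3600) / 60;
    const long long seconds = total % 60;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld", hours, minutes, seconds);
    return buf;
}
```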

### UI Features

- **Scrollable text area**: auto-scroll, which can be turned off to review earlier text
- **Color coding**: Chinese in one color, French in another
- **Status bar**: duration, chunk count, processing status
- **Stop button**: stops capture and processing, saves the audio
- **Resizable window**: adaptive layout
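
Bounding the scrollable area by `max_display_lines` from config.json keeps memory flat over long meetings. A sketch using `std::deque`; `TranscriptEntry` and `TranscriptLog` are hypothetical names, ImGui drawing is omitted, and this version counts CN/FR pairs rather than rendered lines, which is an assumption about how the setting is interpreted.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical: one transcription/translation pair shown in the text area.
struct TranscriptEntry {
    std::string chinese;
    std::string french;
};

// Hypothetical: keeps only the most recent `max_entries` pairs.
class TranscriptLog {
public:
    explicit TranscriptLog(std::size_t max_entries) : max_entries_(max_entries) {}

    void add(TranscriptEntry entry) {
        entries_.push_back(std::move(entry));
        while (entries_.size() > max_entries_) entries_.pop_front();  // drop oldest
    }

    const std::deque<TranscriptEntry>& entries() const { return entries_; }

private:
    std::size_t max_entries_;
    std::deque<TranscriptEntry> entries_;
};
```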

---

## 🚀 Implementation Order

### Phase 1 - Infrastructure Setup (Day 1)

**Todo**:
1. ✅ Create the project structure
2. ✅ Set up CMakeLists.txt with vcpkg
3. ✅ Create .gitignore (.env, build/, recordings/)
4. ✅ Create the config.json template
5. ✅ Set up .env (API keys)
6. ✅ Test a minimal build (hello world)

**Validation**: `cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake && cmake --build build` compiles without errors

---

### Phase 2 - Audio Capture (Days 1-2)

**Todo**:
1. Implement `AudioCapture.h/cpp`:
   - Initialize PortAudio
   - Audio capture callback
   - Chunk accumulation (configurable duration)
   - Push into ThreadSafeQueue
2. Implement `AudioBuffer.h/cpp`:
   - Ring buffer for raw audio
   - Thread-safe operations
3. Standalone test: capture 30 s of audio → save WAV

**Validation**: the WAV file plays back, has the correct duration, and its quality is acceptable
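
PortAudio hands the callback small buffers (`framesPerBuffer` frames at a time), so the capture side must accumulate them into fixed-size chunks before pushing to the queue. A sketch of that step for 16-bit mono samples; `ChunkAccumulator` is a hypothetical class. In the real callback, completed chunks would go to the queue instead of being returned, and heap allocation inside the audio callback is best avoided (for example by copying samples into a ring buffer and accumulating on another thread).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical: collects small capture buffers into fixed-size chunks.
class ChunkAccumulator {
public:
    explicit ChunkAccumulator(std::size_t samples_per_chunk)
        : samples_per_chunk_(samples_per_chunk) {
        current_.reserve(samples_per_chunk_);
    }

    // Append one capture buffer; returns every chunk completed by this call.
    std::vector<std::vector<std::int16_t>> append(const std::int16_t* samples,
                                                  std::size_t count) {
        std::vector<std::vector<std::int16_t>> completed;
        for (std::size_t i = 0; i < count; ++i) {
            current_.push_back(samples[i]);
            if (current_.size() == samples_per_chunk_) {
                completed.push_back(std::move(current_));
                current_.clear();  // moved-from vector is reset before reuse
                current_.reserve(samples_per_chunk_);
            }
        }
        return completed;
    }

    // Samples waiting for the next chunk (to flush as a short chunk on Stop).
    std::size_t pending() const { return current_.size(); }

private:
    std::size_t samples_per_chunk_;
    std::vector<std::int16_t> current_;
};
```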

---

### Phase 3 - Whisper Client (Day 2)

**Todo**:
1. Implement `WhisperClient.h/cpp`:
   - Load the API key from .env
   - POST multipart/form-data (cpp-httplib)
   - Encode the audio as WAV in memory
   - Parse the JSON response
   - Error handling (retry, timeout)
2. Standalone test: audio file → Whisper → Chinese text

**Validation**: correct Chinese transcription of a sample audio file
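
Encoding a chunk as WAV in memory amounts to prefixing the PCM samples with the canonical 44-byte RIFF header. A sketch for 16-bit PCM; `encode_wav` is a hypothetical helper, and it writes every field (samples included) explicitly in little-endian order so it does not depend on host byte order. The resulting bytes could serve as the `file` part of the Whisper request or be written to `recordings/`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned integer in little-endian order, as RIFF requires.
static void put_le(std::string& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out += static_cast<char>((value >> (8 * i)) & 0xFF);
    }
}

// Hypothetical helper: a complete 16-bit PCM WAV file held in a std::string.
std::string encode_wav(const std::vector<std::int16_t>& samples,
                       std::uint32_t sample_rate, std::uint16_t channels) {
    const std::uint16_t bits_per_sample = 16;
    const std::uint32_t data_size =
        static_cast<std::uint32_t>(samples.size() * sizeof(std::int16_t));
    const std::uint16_t block_align =
        static_cast<std::uint16_t>(channels * bits_per_sample / 8);
    const std::uint32_t byte_rate = sample_rate * block_align;

    std::string wav;
    wav.reserve(44 + data_size);
    wav += "RIFF";
    put_le(wav, 36 + data_size, 4);  // RIFF chunk size = file size - 8
    wav += "WAVE";
    wav += "fmt ";
    put_le(wav, 16, 4);              // fmt chunk size for PCM
    put_le(wav, 1, 2);               // audio format 1 = uncompressed PCM
    put_le(wav, channels, 2);
    put_le(wav, sample_rate, 4);
    put_le(wav, byte_rate, 4);
    put_le(wav, block_align, 2);
    put_le(wav, bits_per_sample, 2);
    wav += "data";
    put_le(wav, data_size, 4);
    for (std::int16_t s : samples) {
        put_le(wav, static_cast<std::uint16_t>(s), 2);  // two's complement bytes
    }
    return wav;
}
```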

---

### Phase 4 - Claude Client (Days 2-3)

**Todo**:
1. Implement `ClaudeClient.h/cpp`:
   - Load the API key from .env
   - POST JSON request (cpp-httplib)
   - Configurable system prompt
   - Parse the response (extract the text)
   - Error handling
2. Standalone test: Chinese text → Claude → French text

**Validation**: natural, correct French translation

---

### Phase 5 - ImGui UI (Day 3)

**Todo**:
1. Set up ImGui + GLFW + OpenGL:
   - Window creation
   - Render loop
   - Input handling
2. Implement `TranslationUI.h/cpp`:
   - Scrollable text area
   - Message display (CN + FR)
   - Stop button
   - Status bar (duration, chunk count)
3. Standalone test: display mock data

**Validation**: the UI is responsive, text displays correctly, and the button works

---

### Phase 6 - Pipeline Integration (Day 4)

**Todo**:
1. Implement `Pipeline.h/cpp`:
   - Thread 1: AudioCapture loop
   - Thread 2: processing loop (Whisper → Claude)
   - Thread 3: UI loop (ImGui)
   - ThreadSafeQueue between threads
   - Synchronization (start/stop)
2. Implement `Config.h/cpp`:
   - Load .env (API keys)
   - Load config.json (settings)
3. Implement `main.cpp`:
   - Initialize all components
   - Start the pipeline
   - Handle graceful shutdown

**Validation**: the complete pipeline works end to end
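
The processing thread and its start/stop synchronization could look roughly like this. It is a sketch, not the project's `Pipeline` class: `ProcessingStage` is hypothetical, the Whisper and Claude clients are injected as `std::function`s so the loop can be exercised with stubs, and a plain mutex/condition-variable pair stands in for `ThreadSafeQueue`. On Stop, the running flag is cleared under the lock (avoiding a lost wake-up), the worker drains the chunks already queued, and the thread is joined. Error handling is omitted: an exception escaping `transcribe` or `translate` here would terminate the program.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Chunk = std::vector<std::int16_t>;  // one audio chunk of 16-bit samples

// Hypothetical stage: chunk -> transcribe -> translate -> on_result.
class ProcessingStage {
public:
    ProcessingStage(std::function<std::string(const Chunk&)> transcribe,
                    std::function<std::string(const std::string&)> translate,
                    std::function<void(std::string)> on_result)
        : transcribe_(std::move(transcribe)),
          translate_(std::move(translate)),
          on_result_(std::move(on_result)) {}

    ~ProcessingStage() { stop(); }

    void start() {
        running_ = true;
        worker_ = std::thread([this] { run(); });
    }

    // Called by the capture side for every completed chunk.
    void submit(Chunk chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(chunk));
        }
        cv_.notify_one();
    }

    // Stop button: clear the flag under the lock, wake the worker,
    // then wait for it to drain the queue and exit.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        for (;;) {
            Chunk chunk;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !running_ || !pending_.empty(); });
                if (pending_.empty()) return;  // stopped and fully drained
                chunk = std::move(pending_.front());
                pending_.pop();
            }
            // Slow network calls run outside the lock so submit() never waits on them.
            on_result_(translate_(transcribe_(chunk)));
        }
    }

    std::function<std::string(const Chunk&)> transcribe_;
    std::function<std::string(const std::string&)> translate_;
    std::function<void(std::string)> on_result_;
    std::atomic<bool> running_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Chunk> pending_;
    std::thread worker_;
};
```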

---

### Phase 7 - Testing & Tuning (Day 5)

**Todo**:
1. Test with real Chinese audio:
   - Sample conversations
   - Different audio qualities
   - Different chunk sizes (5 s, 10 s, 30 s)
2. Measure latency:
   - Audio → Whisper: X seconds
   - Whisper → Claude: Y seconds
   - Total: Z seconds
3. Debug and fix bugs:
   - Memory leaks
   - Thread-safety issues
   - API error handling
4. Optimize:
   - Optimal chunk size (latency vs. accuracy tradeoff)
   - API timeout values
   - UI refresh rate

**Validation**:
- Total latency < 10 s is acceptable
- No crash over a 30-minute recording
- Transcription and translation are understandable
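
For the latency measurements, each stage can be timed with `std::chrono::steady_clock`, which is monotonic (unlike `system_clock`), and the samples aggregated per stage. A sketch; `LatencyStats` and `time_call` are hypothetical names, and `time_call` assumes the timed function returns a value.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical aggregator of per-call latencies (in seconds) for one stage.
class LatencyStats {
public:
    void add(double seconds) { samples_.push_back(seconds); }
    std::size_t count() const { return samples_.size(); }

    double mean_seconds() const {
        if (samples_.empty()) return 0.0;
        double sum = 0.0;
        for (double s : samples_) sum += s;
        return sum / static_cast<double>(samples_.size());
    }

    double max_seconds() const {
        return samples_.empty() ? 0.0
                                : *std::max_element(samples_.begin(), samples_.end());
    }

private:
    std::vector<double> samples_;
};

// Hypothetical helper: run `fn`, record its wall-clock time, return its result.
// `fn` must return a value (not void).
template <typename Fn>
auto time_call(LatencyStats& stats, Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    auto result = std::forward<Fn>(fn)();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    stats.add(elapsed.count());
    return result;
}
```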

---

## 🧪 Test Plan

### Unit Tests (Phase 2+)
- `AudioCapture`: audio capture, correct format
- `WhisperClient`: mocked API call, JSON parsing
- `ClaudeClient`: mocked API call, JSON parsing
- `ThreadSafeQueue`: thread safety, no data loss

### Integration Tests
- Audio → Whisper: audio file → correct Chinese text
- Whisper → Claude: Chinese text → correct French translation
- Pipeline: audio → complete UI display

### End-to-End Test
- Record a real 5-minute Chinese conversation
- Check transcription accuracy (>85%)
- Check that the translation is understandable
- Check that the UI stays responsive
- Check that the audio is saved correctly

---
## 📊 Metrics to Track

### Performance
- **Whisper latency**: API call time (target: <3 s for 10 s of audio)
- **Claude latency**: API call time (target: <2 s for 200 tokens)
- **Total latency**: audio → display (target: <10 s)
- **Memory usage**: stable over long sessions (no leaks)
- **CPU usage**: acceptable (<50% on a laptop)

### Quality
- **Whisper accuracy**: % of words correct (target: >85%)
- **Claude quality**: natural translation (subjective)
- **Crash rate**: 0 crashes over a 1-hour recording
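
Chinese is not written with spaces between words, so "% of words correct" needs a working definition. One common proxy, assumed here rather than specified by this plan, is character accuracy: 1 - CER, where the character error rate is the edit distance between Whisper's output and a reference transcript divided by the reference length (so the >85% target means CER < 0.15). A sketch over `std::u32string`, so each Chinese character counts as one unit; converting Whisper's UTF-8 output to UTF-32 is left out, and `edit_distance`/`character_error_rate` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: Levenshtein distance (insertion, deletion, substitution cost 1),
// computed with two rolling rows.
std::size_t edit_distance(const std::u32string& a, const std::u32string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical: CER of `hypothesis` against `reference` (0.0 = perfect).
// An empty reference with a non-empty hypothesis is treated as 1.0 here.
double character_error_rate(const std::u32string& reference,
                            const std::u32string& hypothesis) {
    if (reference.empty()) return hypothesis.empty() ? 0.0 : 1.0;
    return static_cast<double>(edit_distance(reference, hypothesis)) /
           static_cast<double>(reference.size());
}
```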

### Cost
- **Whisper**: $0.006/min of audio
- **Claude**: ~$0.03-0.05/h (depends on text volume)
- **Total**: ~$0.40/h per meeting
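
The hourly total follows from the per-minute Whisper price plus the Claude estimate:

$$
\begin{aligned}
C_{\text{Whisper}} &= 60\ \text{min} \times \$0.006/\text{min} = \$0.36 \\
C_{\text{total}} &= \$0.36 + (\$0.03 \text{ to } \$0.05) = \$0.39 \text{ to } \$0.41 \approx \$0.40/\text{h}
\end{aligned}
$$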

---

## ⚠️ Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| **Whisper API timeout** | Blocking | Retry logic, 30 s timeout, fallback queue |
| **Claude API rate limit** | Medium | Exponential backoff, queue requests |
| **Audio buffer overflow** | Medium | Adequately sized ring buffer, drop old chunks if needed |
| **Thread deadlock** | Blocking | Use `std::lock_guard`, avoid nested locks |
| **Memory leak** | Medium | Use smart pointers, valgrind tests |
| **Network interruption** | Medium | Retry logic, cache audio locally |

---
## 🎯 MVP Success Criteria

✅ **The MVP is validated if**:
1. Microphone audio capture works
2. Chinese transcription is >85% accurate
3. French translation is understandable
4. The UI displays translations in real time
5. The Stop button stops everything cleanly
6. Audio is saved correctly
7. No crash over a 30-minute recording
8. Total latency <10 s is acceptable

---

## 📝 Implementation Notes

### Thread Safety
- Use `std::mutex` + `std::lock_guard` for the queues (`std::unique_lock` where a `std::condition_variable` waits)
- No shared state without protection
- Use `std::atomic<bool>` for flags (running, stopping)

### Error Handling
- Try/catch around API calls
- Log errors (spdlog or plain `std::cerr`)
- Retry logic (max 3 attempts)
- Graceful degradation (skip the chunk if the error persists)

### Audio Format
- **Sample rate**: 16 kHz (the rate Whisper models operate on)
- **Channels**: mono (sufficient, reduces bandwidth)
- **Format**: 16-bit PCM WAV
- **Chunk size**: configurable (default 10 s)
### API Best Practices
- **Timeout**: 30 s for Whisper, 15 s for Claude
- **Retry**: exponential backoff (1 s, 2 s, 4 s)
- **Rate limiting**: respect API limits (monitor 429 errors)
- **Headers**: always set a User-Agent and, for the Claude API, the `anthropic-version` header
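
The retry policy can be written as one generic helper. A sketch, assuming API calls report failure by throwing a `std::exception`; `with_retry` is a hypothetical name, and the sleep function is injectable so tests do not actually wait. With the 3-attempt limit from the error-handling notes, only the 1 s and 2 s waits occur; the 4 s step is reached only when a fourth attempt is allowed.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical helper: call `fn` up to `max_attempts` times, doubling the wait
// after each failure (1 s, 2 s, 4 s, ...). Rethrows the last exception when
// every attempt fails, so the caller can skip the chunk.
template <typename Fn>
auto with_retry(Fn fn, int max_attempts,
                std::chrono::milliseconds initial_delay = std::chrono::seconds(1),
                std::function<void(std::chrono::milliseconds)> sleep =
                    [](std::chrono::milliseconds d) { std::this_thread::sleep_for(d); }) {
    std::chrono::milliseconds delay = initial_delay;
    for (int attempt = 1;; ++attempt) {
        try {
            return fn();
        } catch (const std::exception&) {
            if (attempt >= max_attempts) throw;  // out of attempts: propagate
            sleep(delay);
            delay *= 2;
        }
    }
}
```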

---

## 🔄 Post-MVP (Phase 2)

**Not included in the MVP, but planned**:
- ❌ Automatic post-meeting summary (Claude summary)
- ❌ Structured export (transcripts + audio)
- ❌ Search system (backlog)
- ❌ Diarization (who is speaking)
- ❌ Replay mode
- ❌ Richer GUI (settings, etc.)

**MVP focus**: an end-to-end working pipeline, validation of the concept, and real use in a first meeting.

---

*Document created: November 20, 2025*

*Status: Ready to implement*

*Estimated effort: 5 days of development + 2 days of testing*