SecondVoice - MVP Implementation Plan
Date: November 20, 2025 · Target: minimal working MVP · Platform: Linux · Package manager: vcpkg
🎯 Minimal MVP Objective
A desktop application that:
- Continuously captures microphone audio
- Transcribes Chinese speech to text (Whisper API)
- Translates the text into French (Claude API)
- Displays the translation in real time (ImGui)
- Provides a Stop button to end the session (no summary in the MVP)
🏗️ Technical Architecture
Pipeline
Audio Capture (PortAudio)
↓ (configurable audio chunks)
Whisper API (STT)
↓ (Chinese text)
Claude API (translation)
↓ (French text)
ImGui UI (real-time display + Stop button)
Threading Model
Thread 1 - Audio Capture:
- The PortAudio callback captures audio
- Accumulates chunks (configurable size)
- Pushes them onto a thread-safe queue (sketched below)
- Saves a WAV backup in the background
Thread 2 - AI Processing:
- Pops a chunk from the audio queue
- POSTs it to the Whisper API → Chinese transcription
- POSTs the transcription to the Claude API → French translation
- Pushes the result onto the UI queue
Thread 3 - Main UI (ImGui):
- Renders the ImGui window
- Displays translations from the queue
- Handles the Stop button
- Updates status/duration
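The thread-safe queue is the only structure shared between these threads. A minimal sketch, assuming a standard mutex + condition_variable design (the final ThreadSafeQueue.h may differ):

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

template <typename T>
class ThreadSafeQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available or the queue has been closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;  // closed: signals the consumer to exit
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

    // Called on Stop so blocked consumers wake up and shut down.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool closed_ = false;
};
```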
📁 Project Structure
secondvoice/
├── .env # API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY)
├── .gitignore
├── CMakeLists.txt # Build configuration
├── vcpkg.json # Dependencies manifest
├── config.json # Runtime config (audio chunk size, etc)
├── README.md
├── docs/
│ ├── SecondVoice.md # Vision document
│ └── implementation_plan.md # This document
├── src/
│ ├── main.cpp # Entry point + ImGui main loop
│ ├── audio/
│ │ ├── AudioCapture.h
│ │ ├── AudioCapture.cpp # PortAudio wrapper
│ │ ├── AudioBuffer.h
│ │ └── AudioBuffer.cpp # Thread-safe ring buffer
│ ├── api/
│ │ ├── WhisperClient.h
│ │ ├── WhisperClient.cpp # Whisper API client
│ │ ├── ClaudeClient.h
│ │ └── ClaudeClient.cpp # Claude API client
│ ├── ui/
│ │ ├── TranslationUI.h
│ │ └── TranslationUI.cpp # ImGui interface
│ ├── utils/
│ │ ├── Config.h
│ │ ├── Config.cpp # Load .env + config.json
│ │ ├── ThreadSafeQueue.h # Thread-safe queue template
│ │ └── Logger.h # Simple logging
│ └── core/
│ ├── Pipeline.h
│ └── Pipeline.cpp # Orchestrate threads
├── recordings/ # Output audio files
│ └── .gitkeep
└── build/ # CMake build output (ignored)
🔧 Dependencies
vcpkg.json
{
"name": "secondvoice",
"version": "0.1.0",
"dependencies": [
"portaudio",
"cpp-httplib",
"nlohmann-json",
"imgui[glfw-binding,opengl3-binding]",
"glfw3",
"opengl"
]
}
System Requirements (Linux)
# PortAudio dependencies
sudo apt install libasound2-dev
# OpenGL dependencies
sudo apt install libgl1-mesa-dev libglu1-mesa-dev
⚙️ Configuration
.env (project root)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
config.json (project root)
{
"audio": {
"sample_rate": 16000,
"channels": 1,
"chunk_duration_seconds": 10,
"format": "wav"
},
"whisper": {
"model": "whisper-1",
"language": "zh",
"temperature": 0.0
},
"claude": {
"model": "claude-haiku-4-20250514",
"max_tokens": 1024,
"temperature": 0.3,
"system_prompt": "Tu es un traducteur professionnel chinois-français. Traduis le texte suivant de manière naturelle et contextuelle."
},
"ui": {
"window_width": 800,
"window_height": 600,
"font_size": 16,
"max_display_lines": 50
},
"recording": {
"save_audio": true,
"output_directory": "./recordings"
}
}
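A hedged sketch of the Config loader that reads both files; the struct layout and function names are illustrative, not the final Config.h:

```cpp
#include <nlohmann/json.hpp>
#include <fstream>
#include <map>
#include <string>

struct Config {
    std::map<std::string, std::string> env;  // OPENAI_API_KEY, ANTHROPIC_API_KEY
    nlohmann::json settings;                 // parsed config.json

    static Config load(const std::string& env_path, const std::string& json_path) {
        Config cfg;

        // .env: one KEY=VALUE per line, '#' starts a comment.
        std::ifstream env_file(env_path);
        std::string line;
        while (std::getline(env_file, line)) {
            if (line.empty() || line[0] == '#') continue;
            auto eq = line.find('=');
            if (eq == std::string::npos) continue;
            cfg.env[line.substr(0, eq)] = line.substr(eq + 1);
        }

        // config.json: parsed as-is with nlohmann-json.
        std::ifstream json_file(json_path);
        json_file >> cfg.settings;
        return cfg;
    }
};

// Usage: auto cfg = Config::load(".env", "config.json");
//        int rate = cfg.settings["audio"]["sample_rate"];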
🔌 API Clients
Whisper API
// POST https://api.openai.com/v1/audio/transcriptions
// Content-Type: multipart/form-data
Request:
- file: audio.wav (binary)
- model: whisper-1
- language: zh
- temperature: 0.0
Response:
{
"text": "你好,今天我们讨论项目进度..."
}
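A hedged sketch of the corresponding WhisperClient call, using cpp-httplib's multipart form support and nlohmann-json. The function name and error handling are placeholders, and HTTPS requires building cpp-httplib with OpenSSL support:

```cpp
#define CPPHTTPLIB_OPENSSL_SUPPORT
#include <httplib.h>
#include <nlohmann/json.hpp>
#include <string>

std::string transcribe_wav(const std::string& wav_bytes, const std::string& api_key) {
    httplib::Client cli("https://api.openai.com");
    cli.set_read_timeout(30, 0);  // 30 s timeout, as suggested in the API notes below

    httplib::Headers headers = {{"Authorization", "Bearer " + api_key}};
    httplib::MultipartFormDataItems items = {
        {"file", wav_bytes, "chunk.wav", "audio/wav"},
        {"model", "whisper-1", "", ""},
        {"language", "zh", "", ""},
        {"temperature", "0.0", "", ""},
    };

    auto res = cli.Post("/v1/audio/transcriptions", headers, items);
    if (!res || res->status != 200) {
        return {};  // caller handles retry / skipping the chunk
    }
    return nlohmann::json::parse(res->body).value("text", "");
}
```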
Claude API
// POST https://api.anthropic.com/v1/messages
// Content-Type: application/json
// x-api-key: {ANTHROPIC_API_KEY}
// anthropic-version: 2023-06-01
Request:
{
"model": "claude-haiku-4-20250514",
"max_tokens": 1024,
"messages": [{
"role": "user",
"content": "Traduis en français: 你好,今天我们讨论项目进度..."
}]
}
Response:
{
"content": [{
"type": "text",
"text": "Bonjour, aujourd'hui nous discutons de l'avancement du projet..."
}],
"model": "claude-haiku-4-20250514",
"usage": {...}
}
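A matching sketch for ClaudeClient, again with cpp-httplib + nlohmann-json; names and error handling are illustrative:

```cpp
#define CPPHTTPLIB_OPENSSL_SUPPORT
#include <httplib.h>
#include <nlohmann/json.hpp>
#include <string>

std::string translate_to_french(const std::string& chinese_text,
                                const std::string& api_key,
                                const std::string& system_prompt) {
    httplib::Client cli("https://api.anthropic.com");
    cli.set_read_timeout(15, 0);  // 15 s timeout per the API notes below

    httplib::Headers headers = {
        {"x-api-key", api_key},
        {"anthropic-version", "2023-06-01"},
    };

    nlohmann::json msg = {{"role", "user"}, {"content", chinese_text}};
    nlohmann::json body = {
        {"model", "claude-haiku-4-20250514"},  // model name taken from config.json above
        {"max_tokens", 1024},
        {"temperature", 0.3},
        {"system", system_prompt},
        {"messages", nlohmann::json::array({msg})},
    };

    auto res = cli.Post("/v1/messages", headers, body.dump(), "application/json");
    if (!res || res->status != 200) return {};  // caller retries or skips the chunk

    auto j = nlohmann::json::parse(res->body);
    return j["content"][0]["text"].get<std::string>();
}
```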
🎨 ImGui Interface
Minimalist Layout
┌────────────────────────────────────────────┐
│ SecondVoice - Live Translation │
├────────────────────────────────────────────┤
│ │
│ [●] Recording... Duration: 00:05:23 │
│ │
│ ┌────────────────────────────────────────┐ │
│ │ 中文: 你好,今天我们讨论项目进度... │ │
│ │ FR: Bonjour, aujourd'hui nous │ │
│ │ discutons de l'avancement... │ │
│ │ │ │
│ │ 中文: 关于预算的问题... │ │
│ │ FR: Concernant la question du budget.. │ │
│ │ │ │
│ │ [Auto-scroll enabled] │ │
│ │ │ │
│ └────────────────────────────────────────┘ │
│ │
│ [ STOP RECORDING ] │
│ │
│ Status: Processing chunk 12/12 │
│ Audio: 16kHz mono, chunk size: 10s │
└────────────────────────────────────────────┘
UI Features
- Scrollable text area: auto-scroll, can be disabled for review (see the render sketch below)
- Color coding: Chinese (color 1), French (color 2)
- Status bar: duration, chunk count, processing status
- Stop button: stops capture and processing, saves the audio
- Resizable window: adaptive layout
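A hedged sketch of one TranslationUI frame built from these features. Struct and function names are illustrative; the GLFW/OpenGL backend setup is the standard ImGui example boilerplate and is omitted:

```cpp
#include <imgui.h>
#include <string>
#include <vector>

struct TranslationLine {
    std::string chinese;
    std::string french;
};

void render_translation_window(const std::vector<TranslationLine>& lines,
                               bool& stop_requested, bool auto_scroll,
                               const std::string& status) {
    ImGui::Begin("SecondVoice - Live Translation");

    // Scrollable translation area, leaving room below for the button and status bar.
    ImGui::BeginChild("translations", ImVec2(0, -80));
    for (const auto& line : lines) {
        ImGui::TextColored(ImVec4(0.6f, 0.8f, 1.0f, 1.0f), "CN: %s", line.chinese.c_str());
        ImGui::TextColored(ImVec4(1.0f, 0.9f, 0.6f, 1.0f), "FR: %s", line.french.c_str());
        ImGui::Separator();
    }
    // Keep the latest translation visible unless the user scrolled up to review.
    if (auto_scroll && ImGui::GetScrollY() >= ImGui::GetScrollMaxY())
        ImGui::SetScrollHereY(1.0f);
    ImGui::EndChild();

    if (ImGui::Button("STOP RECORDING"))
        stop_requested = true;  // picked up by the pipeline's shutdown logic

    ImGui::TextUnformatted(status.c_str());
    ImGui::End();
}
```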
🚀 Implementation Order
Phase 1 - Infrastructure Setup (Day 1)
Todo:
- ✅ Create the project structure
- ✅ Set up CMakeLists.txt with vcpkg
- ✅ Create .gitignore (.env, build/, recordings/)
- ✅ Create the config.json template
- ✅ Set up .env (API keys)
- ✅ Test a minimal build (hello world)
Validation: cmake -B build && cmake --build build compiles without errors
Phase 2 - Audio Capture (Days 1-2)
Todo:
- Implement AudioCapture.h/cpp:
  - Initialize PortAudio
  - Audio capture callback
  - Chunk accumulation (configurable duration)
  - Push into the ThreadSafeQueue
- Implement AudioBuffer.h/cpp:
  - Ring buffer for raw audio
  - Thread-safe operations
- Standalone test: capture 30 s of audio → save as WAV (see the capture sketch below)
Validation: the WAV file is playable, has the correct duration, and sounds fine
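A hedged sketch of the capture side; CaptureState is illustrative and reuses the ThreadSafeQueue sketch above. Note that pushing from the PortAudio callback takes a mutex, which PortAudio discourages; the planned AudioBuffer ring buffer is the cleaner long-term home for this hand-off:

```cpp
#include <portaudio.h>
#include <cstdint>
#include <vector>

struct CaptureState {
    std::vector<int16_t> pending;                            // samples accumulated so far
    size_t chunk_samples = 16000 * 10;                       // 16 kHz * 10 s (config.json)
    ThreadSafeQueue<std::vector<int16_t>>* queue = nullptr;  // feeds thread 2
};

static int capture_callback(const void* input, void* /*output*/,
                            unsigned long frames,
                            const PaStreamCallbackTimeInfo*,
                            PaStreamCallbackFlags, void* user) {
    auto* st = static_cast<CaptureState*>(user);
    const auto* samples = static_cast<const int16_t*>(input);
    st->pending.insert(st->pending.end(), samples, samples + frames);
    if (st->pending.size() >= st->chunk_samples) {
        st->queue->push(std::move(st->pending));             // hand a full chunk to thread 2
        st->pending.clear();
        st->pending.reserve(st->chunk_samples);
    }
    return paContinue;
}

// Opening the stream (16 kHz, mono, 16-bit PCM):
//   Pa_Initialize();
//   PaStream* stream = nullptr;
//   Pa_OpenDefaultStream(&stream, 1, 0, paInt16, 16000, 512, capture_callback, &state);
//   Pa_StartStream(stream);
```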
Phase 3 - Whisper Client (Day 2)
Todo:
- Implement WhisperClient.h/cpp:
  - Load the API key from .env
  - POST multipart/form-data (cpp-httplib)
  - Encode the WAV audio in memory
  - Parse the JSON response
  - Error handling (retry, timeout)
- Standalone test: audio file → Whisper → Chinese text
Validation: correct Chinese transcription on a sample audio file
Phase 4 - Claude Client (Days 2-3)
Todo:
- Implement ClaudeClient.h/cpp:
  - Load the API key from .env
  - POST JSON request (cpp-httplib)
  - Configurable system prompt
  - Parse the response (extract the text)
  - Error handling
- Standalone test: Chinese text → Claude → French text
Validation: natural, correct French translation
Phase 5 - ImGui UI (Day 3)
Todo:
- Set up ImGui + GLFW + OpenGL:
  - Window creation
  - Render loop
  - Input handling
- Implement TranslationUI.h/cpp:
  - Scrollable text area
  - Display messages (CN + FR)
  - Stop button
  - Status bar (duration, chunk count)
- Standalone test: display mock data
Validation: UI is responsive, text displays correctly, the Stop button works
Phase 6 - Pipeline Integration (Day 4)
Todo:
- Implement Pipeline.h/cpp (sketched below):
  - Thread 1: AudioCapture loop
  - Thread 2: processing loop (Whisper → Claude)
  - Thread 3: UI loop (ImGui)
  - ThreadSafeQueue between threads
  - Synchronization (start/stop)
- Implement Config.h/cpp:
  - Load .env (API keys)
  - Load config.json (settings)
- Implement main.cpp:
  - Initialize all components
  - Start the pipeline
  - Handle graceful shutdown
Validation: the full pipeline works end to end
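A hedged sketch of the thread-2 processing loop and the start/stop wiring, reusing the ThreadSafeQueue, transcribe_wav, and translate_to_french sketches above. encode_wav_in_memory is a hypothetical helper (a WAV-encoding sketch appears under Audio Format further down):

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

void processing_loop(ThreadSafeQueue<std::vector<int16_t>>& audio_queue,
                     ThreadSafeQueue<TranslationLine>& ui_queue,
                     const Config& cfg) {
    // Runs until the audio queue is closed and drained (Stop pressed).
    while (auto chunk = audio_queue.pop()) {
        std::string wav = encode_wav_in_memory(*chunk, 16000);  // hypothetical helper
        std::string zh = transcribe_wav(wav, cfg.env.at("OPENAI_API_KEY"));
        if (zh.empty()) continue;                                // skip chunk on API failure
        std::string fr = translate_to_french(
            zh, cfg.env.at("ANTHROPIC_API_KEY"),
            cfg.settings["claude"]["system_prompt"].get<std::string>());
        ui_queue.push({zh, fr});
    }
}

// Start/stop wiring (thread 3 = the main thread running the ImGui loop):
//   std::atomic<bool> running{true};
//   std::thread t2(processing_loop, std::ref(audio_q), std::ref(ui_q), std::cref(cfg));
//   ... on Stop: running = false; stop PortAudio; audio_q.close(); t2.join(); ui_q.close();
```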
Phase 7 - Testing & Tuning (Day 5)
Todo:
- Test with real Chinese audio:
  - Sample conversations
  - Different audio qualities
  - Different chunk sizes (5 s, 10 s, 30 s)
- Measure latency:
  - Audio → Whisper: X seconds
  - Whisper → Claude: Y seconds
  - Total: Z seconds
- Debug & fix bugs:
  - Memory leaks
  - Thread-safety issues
  - API error handling
- Optimize:
  - Optimal chunk size (latency vs. accuracy tradeoff)
  - API timeout values
  - UI refresh rate
Validation:
- Total latency < 10 s is acceptable
- No crashes over a 30-minute recording
- Transcription and translation are understandable
🧪 Test Plan
Unit Tests (Phase 2+)
- AudioCapture: captures audio, correct format
- WhisperClient: mocked API call, JSON parsing
- ClaudeClient: mocked API call, JSON parsing
- ThreadSafeQueue: thread safety, no data loss
Integration Tests
- Audio → Whisper: audio file → correct Chinese text
- Whisper → Claude: Chinese text → correct French translation
- Pipeline: audio → full UI display
End-to-End Test
- Record a real 5-minute Chinese conversation
- Verify transcription accuracy (>85%)
- Verify the translation is understandable
- Verify the UI stays responsive
- Verify the audio is saved correctly
📊 Metrics to Track
Performance
- Whisper latency: API call time (target: <3 s for 10 s of audio)
- Claude latency: API call time (target: <2 s for 200 tokens)
- Total latency: audio → display (target: <10 s)
- Memory usage: stable over long sessions (no leaks)
- CPU usage: acceptable (<50% on a laptop)
Quality
- Whisper accuracy: % of correct words (target: >85%)
- Claude quality: natural-sounding translation (subjective)
- Crash rate: 0 crashes over a 1-hour recording
Cost
- Whisper: $0.006/min audio
- Claude: ~$0.03-0.05/h (depends on text volume)
- Total: ~$0.40 per hour of meeting
⚠️ Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Whisper API timeout | Blocking | Retry logic, 30 s timeout, fallback queue |
| Claude API rate limit | Medium | Exponential backoff, queue requests |
| Audio buffer overflow | Medium | Adequately sized ring buffer, drop old chunks if needed |
| Thread deadlock | Blocking | Use std::lock_guard, avoid nested locks |
| Memory leak | Medium | Use smart pointers, valgrind tests |
| Network interruption | Medium | Retry logic, cache audio locally |
🎯 MVP Success Criteria
✅ The MVP is validated if:
- Microphone audio capture works
- Chinese transcription is >85% accurate
- The French translation is understandable
- The UI displays translations in real time
- The Stop button shuts everything down cleanly
- The audio is saved correctly
- No crashes over a 30-minute recording
- Total latency <10 s is acceptable
📝 Implementation Notes
Thread Safety
- Use std::mutex + std::lock_guard for the queues
- No shared state without protection
- Use std::atomic<bool> for flags (running, stopping)
Error Handling
- Try/catch around API calls
- Log errors (spdlog or plain cout)
- Retry logic (max 3 attempts)
- Graceful degradation (skip the chunk if the error persists)
Audio Format
- Sample rate: 16 kHz (optimal for Whisper)
- Channels: mono (sufficient, reduces bandwidth)
- Format: 16-bit PCM WAV (see the encoder sketch below)
- Chunk size: configurable (default 10 s)
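A hedged sketch of a minimal 16-bit PCM WAV encoder matching this format, assuming a little-endian host; the same bytes can be written to recordings/ or posted to Whisper:

```cpp
#include <cstdint>
#include <string>
#include <vector>

std::string encode_wav_in_memory(const std::vector<int16_t>& samples,
                                 uint32_t sample_rate, uint16_t channels = 1) {
    const uint32_t data_bytes  = static_cast<uint32_t>(samples.size() * sizeof(int16_t));
    const uint32_t byte_rate   = sample_rate * channels * sizeof(int16_t);
    const uint16_t block_align = static_cast<uint16_t>(channels * sizeof(int16_t));

    std::string out;
    auto put = [&out](const void* p, size_t n) { out.append(static_cast<const char*>(p), n); };
    auto put_u32 = [&](uint32_t v) { put(&v, 4); };
    auto put_u16 = [&](uint16_t v) { put(&v, 2); };

    out.append("RIFF"); put_u32(36 + data_bytes); out.append("WAVE");  // RIFF header
    out.append("fmt "); put_u32(16);                                   // fmt chunk size (PCM)
    put_u16(1); put_u16(channels);                                     // format 1 = PCM, channels
    put_u32(sample_rate); put_u32(byte_rate);
    put_u16(block_align); put_u16(16);                                 // bits per sample
    out.append("data"); put_u32(data_bytes);                           // data chunk
    put(samples.data(), data_bytes);
    return out;
}
```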
API Best Practices
- Timeouts: 30 s for Whisper, 15 s for Claude
- Retries: exponential backoff (1 s, 2 s, 4 s), as sketched below
- Rate limiting: respect API limits (monitor 429 errors)
- Headers: always set User-Agent and the API version
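A hedged sketch of that retry policy, wrapping any of the API call sketches above; an empty string is treated as failure:

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Retries up to max_attempts times, sleeping 1 s, 2 s, 4 s, ... between attempts.
std::string with_retry(const std::function<std::string()>& call, int max_attempts = 3) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        std::string result = call();
        if (!result.empty()) return result;
        if (attempt + 1 < max_attempts)
            std::this_thread::sleep_for(std::chrono::seconds(1 << attempt));  // exponential backoff
    }
    return {};  // persistent failure: caller skips the chunk (graceful degradation)
}

// Usage, e.g.: std::string zh = with_retry([&] { return transcribe_wav(wav, key); });
```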
🔄 Post-MVP (Phase 2)
Not included in MVP, but planned:
- ❌ Automatic post-meeting summary (Claude summary)
- ❌ Structured export (transcripts + audio)
- ❌ Search system (backlog)
- ❌ Diarization (who is speaking)
- ❌ Replay mode
- ❌ More elaborate GUI (settings, etc.)
MVP focus: a working end-to-end pipeline, validation of the concept, and real use in a first meeting.
Document created: November 20, 2025 · Status: Ready to implement · Estimated effort: 5 days of development + 2 days of testing