secondvoice/docs/implementation_plan.md

# SecondVoice - MVP Implementation Plan

**Date**: November 20, 2025
**Target**: Minimal functional MVP
**Platform**: Linux
**Package Manager**: vcpkg

---

## 🎯 Minimal MVP Goal

A desktop application that:
1. Captures microphone audio continuously
2. Transcribes Chinese speech to text (Whisper API)
3. Translates the text into French (Claude API)
4. Displays the translation in real time (ImGui)
5. Provides a Stop button to end the session (no summary in the MVP)

---

## 🏗️ Technical Architecture

### Pipeline
```
Audio Capture (PortAudio)
    ↓ (configurable audio chunks)
Whisper API (STT)
    ↓ (Chinese text)
Claude API (translation)
    ↓ (French text)
ImGui UI (real-time display + Stop button)
```

### Threading Model
```
Thread 1 - Audio Capture:
  - PortAudio callback captures audio
  - Accumulates chunks (configurable size)
  - Pushes into thread-safe queue
  - Saves WAV backup in the background

Thread 2 - AI Processing:
  - Pops chunk from audio queue
  - POST Whisper API → Chinese transcription
  - POST Claude API → French translation
  - Pushes result into UI queue

Thread 3 - Main UI (ImGui):
  - Renders ImGui window
  - Displays translations from queue
  - Handles Stop button
  - Updates status/duration
```

---

## 📁 Project Structure

```
secondvoice/
├── .env                            # API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY)
├── .gitignore
├── CMakeLists.txt                  # Build configuration
├── vcpkg.json                      # Dependencies manifest
├── config.json                     # Runtime config (audio chunk size, etc)
├── README.md
├── docs/
│   ├── SecondVoice.md             # Vision document
│   └── implementation_plan.md      # This document
├── src/
│   ├── main.cpp                    # Entry point + ImGui main loop
│   ├── audio/
│   │   ├── AudioCapture.h
│   │   ├── AudioCapture.cpp        # PortAudio wrapper
│   │   ├── AudioBuffer.h
│   │   └── AudioBuffer.cpp         # Thread-safe ring buffer
│   ├── api/
│   │   ├── WhisperClient.h
│   │   ├── WhisperClient.cpp       # Whisper API client
│   │   ├── ClaudeClient.h
│   │   └── ClaudeClient.cpp        # Claude API client
│   ├── ui/
│   │   ├── TranslationUI.h
│   │   └── TranslationUI.cpp       # ImGui interface
│   ├── utils/
│   │   ├── Config.h
│   │   ├── Config.cpp              # Load .env + config.json
│   │   ├── ThreadSafeQueue.h       # Thread-safe queue template
│   │   └── Logger.h                # Simple logging
│   └── core/
│       ├── Pipeline.h
│       └── Pipeline.cpp            # Orchestrate threads
├── recordings/                     # Output audio files
│   └── .gitkeep
└── build/                          # CMake build output (ignored)
```

---

## 🔧 Dependencies

### vcpkg.json
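```json
{
  "name": "secondvoice",
  "version": "0.1.0",
  "dependencies": [
    "portaudio",
    { "name": "cpp-httplib", "features": ["openssl"] },
    "nlohmann-json",
    { "name": "imgui", "features": ["glfw-binding", "opengl3-binding"] },
    "glfw3",
    "opengl"
  ]
}
```

Features use the object form required by vcpkg manifest mode; the `port[feature]` shorthand is only valid on the `vcpkg install` command line. `cpp-httplib` needs its `openssl` feature because both APIs are HTTPS-only.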

### System Requirements (Linux)
```bash
# PortAudio dependencies
sudo apt install libasound2-dev

# OpenGL dependencies
sudo apt install libgl1-mesa-dev libglu1-mesa-dev
```

---

## ⚙️ Configuration

### .env (project root)
```env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
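
Loading this file needs no external library. The sketch below parses `KEY=VALUE` lines from any input stream; the function names `trim` and `parse_env` are hypothetical, and quoting or variable expansion is deliberately not handled.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: parse KEY=VALUE lines from a .env stream.
// Blank lines, '#' comments, and lines without a key are skipped.
inline std::map<std::string, std::string> parse_env(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.empty() || t[0] == '#') continue;
        const auto eq = t.find('=');
        if (eq == std::string::npos || eq == 0) continue;  // malformed line
        vars[trim(t.substr(0, eq))] = trim(t.substr(eq + 1));
    }
    return vars;
}
```

`Config.cpp` could call `parse_env` on a `std::ifstream` opened on `.env` and fail fast at startup if `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` is missing.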

### config.json (project root)
```json
{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10,
    "format": "wav"
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh",
    "temperature": 0.0
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024,
    "temperature": 0.3,
    "system_prompt": "Tu es un traducteur professionnel chinois-français. Traduis le texte suivant de manière naturelle et contextuelle."
  },
  "ui": {
    "window_width": 800,
    "window_height": 600,
    "font_size": 16,
    "max_display_lines": 50
  },
  "recording": {
    "save_audio": true,
    "output_directory": "./recordings"
  }
}
```

---

## 🔌 API Clients

### Whisper API
```
POST https://api.openai.com/v1/audio/transcriptions
Content-Type: multipart/form-data
Authorization: Bearer {OPENAI_API_KEY}

Request (form fields):
- file: audio.wav (binary)
- model: whisper-1
- language: zh
- temperature: 0.0

Response:
{
  "text": "你好，今天我们讨论项目进度..."
}
```
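
To make the multipart request concrete, here is a hand-rolled sketch of the body layout. The function name `build_multipart_body` and its parameters are hypothetical; the real `WhisperClient` would presumably rely on cpp-httplib's own multipart support instead, so this only illustrates the wire format.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: build a multipart/form-data body by hand.
// `boundary` must not occur inside any field value or in the file bytes.
inline std::string build_multipart_body(
    const std::vector<std::pair<std::string, std::string>>& fields,
    const std::string& file_field, const std::string& filename,
    const std::string& file_bytes, const std::string& boundary) {
    std::string body;
    // One part per plain text field (model, language, temperature, ...).
    for (const auto& [name, value] : fields) {
        body += "--" + boundary + "\r\n";
        body += "Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n";
        body += value + "\r\n";
    }
    // The file part carries a filename and its own Content-Type.
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"" + file_field +
            "\"; filename=\"" + filename + "\"\r\n";
    body += "Content-Type: audio/wav\r\n\r\n";
    body += file_bytes;  // binary-safe: std::string may hold NUL bytes
    // The closing delimiter ends with two extra dashes.
    body += "\r\n--" + boundary + "--\r\n";
    return body;
}
```

The request's `Content-Type` header must then be `multipart/form-data; boundary=<boundary>` with the same boundary string used in the body.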

### Claude API
```
POST https://api.anthropic.com/v1/messages
Content-Type: application/json
x-api-key: {ANTHROPIC_API_KEY}
anthropic-version: 2023-06-01

Request:
{
  "model": "claude-haiku-4-20250514",
  "max_tokens": 1024,
  "messages": [{
    "role": "user",
    "content": "Traduis en français: 你好，今天我们讨论项目进度..."
  }]
}

Response:
{
  "content": [{
    "type": "text",
    "text": "Bonjour, aujourd'hui nous discutons de l'avancement du projet..."
  }],
  "model": "claude-haiku-4-20250514",
  "usage": {...}
}
```
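
The request body above can be assembled as in the sketch below, which also passes the configurable system prompt through the Messages API's top-level `system` field. The helper names `json_escape` and `build_claude_request` are hypothetical, and the string building is done by hand only to keep the example dependency-free; the real `ClaudeClient` would more likely use `nlohmann-json` from the dependency list.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escape a UTF-8 string for a JSON string literal.
// Multi-byte UTF-8 sequences (e.g. Chinese text) pass through unchanged.
inline std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // remaining control characters
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical helper: serialize a Messages API request body.
inline std::string build_claude_request(const std::string& model,
                                        int max_tokens,
                                        const std::string& system_prompt,
                                        const std::string& chinese_text) {
    return "{\"model\":\"" + json_escape(model) + "\","
           "\"max_tokens\":" + std::to_string(max_tokens) + ","
           "\"system\":\"" + json_escape(system_prompt) + "\","
           "\"messages\":[{\"role\":\"user\",\"content\":\"" +
           json_escape(chinese_text) + "\"}]}";
}
```

Escaping matters here because Whisper output can contain quotes or line breaks, which would otherwise produce an invalid request.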

---

## 🎨 ImGui Interface

### Minimalist Layout
```
┌────────────────────────────────────────────┐
│ SecondVoice - Live Translation             │
├────────────────────────────────────────────┤
│                                            │
│ [●] Recording...    Duration: 00:05:23     │
│                                            │
│ ┌────────────────────────────────────────┐ │
│ │ 中文: 你好，今天我们讨论项目进度...    │ │
│ │ FR: Bonjour, aujourd'hui nous          │ │
│ │     discutons de l'avancement...       │ │
│ │                                        │ │
│ │ 中文: 关于预算的问题...                │ │
│ │ FR: Concernant la question du budget.. │ │
│ │                                        │ │
│ │ [Auto-scroll enabled]                  │ │
│ │                                        │ │
│ └────────────────────────────────────────┘ │
│                                            │
│         [    STOP RECORDING    ]           │
│                                            │
│ Status: Processing chunk 12/12             │
│ Audio: 16kHz mono, chunk size: 10s         │
└────────────────────────────────────────────┘
```

### UI Features
- **Scrollable text area**: Auto-scroll, can be disabled for review
- **Color coding**: Chinese (color 1), French (color 2)
- **Status bar**: Duration, chunk count, processing status
- **Stop button**: Stops capture and processing, saves audio
- **Resizable window**: Adaptive layout

---

## 🚀 Implementation Order

### Phase 1 - Infrastructure Setup (Day 1)
**Todo**:
1. ✅ Create project structure
2. ✅ Set up CMakeLists.txt with vcpkg
3. ✅ Create .gitignore (.env, build/, recordings/)
4. ✅ Create config.json template
5. ✅ Set up .env (API keys)
6. ✅ Test minimal build (hello world)

**Validation**: `cmake -B build && cmake --build build` compiles without errors

---

### Phase 2 - Audio Capture (Day 1-2)
**Todo**:
1. Implement `AudioCapture.h/cpp`:
   - Init PortAudio
   - Audio capture callback
   - Chunk accumulation (configurable duration)
   - Push into ThreadSafeQueue
2. Implement `AudioBuffer.h/cpp`:
   - Ring buffer for raw audio
   - Thread-safe operations
3. Standalone test: capture 30s of audio → save WAV

**Validation**: WAV file is playable, duration is correct, quality is OK
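
The chunk accumulation step can be sketched independently of PortAudio. The class below groups incoming 16-bit samples into fixed-duration chunks and hands each full chunk to a callback; `ChunkAccumulator` and its methods are hypothetical names, and a production version would likely avoid allocating inside the real-time audio callback (for example by draining the ring buffer from a separate thread).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: groups 16-bit samples into fixed-duration chunks.
class ChunkAccumulator {
public:
    using Chunk = std::vector<std::int16_t>;

    ChunkAccumulator(int sample_rate, int channels, int chunk_seconds,
                     std::function<void(Chunk)> on_chunk)
        : chunk_samples_(static_cast<std::size_t>(sample_rate) *
                         static_cast<std::size_t>(channels) *
                         static_cast<std::size_t>(chunk_seconds)),
          on_chunk_(std::move(on_chunk)) {
        current_.reserve(chunk_samples_);
    }

    // Feed samples as they arrive; emits a chunk each time one fills up.
    void add(const std::int16_t* samples, std::size_t count) {
        if (chunk_samples_ == 0) return;  // invalid configuration
        while (count > 0) {
            const std::size_t room = chunk_samples_ - current_.size();
            const std::size_t n = count < room ? count : room;
            current_.insert(current_.end(), samples, samples + n);
            samples += n;
            count -= n;
            if (current_.size() == chunk_samples_) emit();
        }
    }

    // On Stop: hand over the final, shorter chunk if any audio is pending.
    void flush() {
        if (!current_.empty()) emit();
    }

private:
    void emit() {
        Chunk done;
        done.reserve(chunk_samples_);
        done.swap(current_);  // current_ is now empty with capacity reserved
        on_chunk_(std::move(done));
    }

    std::size_t chunk_samples_;
    std::function<void(Chunk)> on_chunk_;
    Chunk current_;
};
```

With the default configuration (16 kHz, mono, 10 s), each chunk holds 160,000 samples, or about 320 KB of 16-bit PCM.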

---

### Phase 3 - Whisper Client (Day 2)
**Todo**:
1. Implement `WhisperClient.h/cpp`:
   - Load API key from .env
   - POST multipart/form-data (cpp-httplib)
   - Encode audio as WAV in memory
   - Parse JSON response
   - Error handling (retry, timeout)
2. Standalone test: audio file → Whisper → Chinese text

**Validation**: Correct Chinese transcription on a sample audio file

---

### Phase 4 - Claude Client (Day 2-3)
**Todo**:
1. Implement `ClaudeClient.h/cpp`:
   - Load API key from .env
   - POST JSON request (cpp-httplib)
   - Configurable system prompt
   - Parse response (extract text)
   - Error handling
2. Standalone test: Chinese text → Claude → French text

**Validation**: Natural and correct French translation

---

### Phase 5 - ImGui UI (Day 3)
**Todo**:
1. Set up ImGui + GLFW + OpenGL:
   - Window creation
   - Render loop
   - Input handling
2. Implement `TranslationUI.h/cpp`:
   - Scrollable text area
   - Display messages (CN + FR)
   - Stop button
   - Status bar (duration, chunk count)
3. Standalone test: display mock data

**Validation**: UI is responsive, text displays correctly, button works

---

### Phase 6 - Pipeline Integration (Day 4)
**Todo**:
1. Implement `Pipeline.h/cpp`:
   - Thread 1: AudioCapture loop
   - Thread 2: Processing loop (Whisper → Claude)
   - Thread 3: UI loop (ImGui)
   - ThreadSafeQueue between threads
   - Synchronization (start/stop)
2. Implement `Config.h/cpp`:
   - Load .env (API keys)
   - Load config.json (settings)
3. Implement `main.cpp`:
   - Init all components
   - Start pipeline
   - Handle graceful shutdown

**Validation**: The complete pipeline works end to end

---

### Phase 7 - Testing & Tuning (Day 5)
**Todo**:
1. Test with real Chinese audio:
   - Sample conversations
   - Different audio qualities
   - Different chunk sizes (5s, 10s, 30s)
2. Measure latency:
   - Audio → Whisper: X seconds
   - Whisper → Claude: Y seconds
   - Total: Z seconds
3. Debug & fix bugs:
   - Memory leaks
   - Thread safety issues
   - API error handling
4. Optimize:
   - Optimal chunk size (latency vs accuracy tradeoff)
   - API timeout values
   - UI refresh rate

**Validation**:
- Total latency < 10s is acceptable
- No crash during a 30min recording
- Transcription and translation are understandable

---

## 🧪 Test Plan

### Unit Tests (Phase 2+)
- `AudioCapture`: Captures audio in the correct format
- `WhisperClient`: Mocked API call, JSON parsing
- `ClaudeClient`: Mocked API call, JSON parsing
- `ThreadSafeQueue`: Thread safety, no data loss

### Integration Tests
- Audio → Whisper: Audio file → correct Chinese text
- Whisper → Claude: Chinese text → correct French translation
- Pipeline: Audio → UI display, end to end

### End-to-End Test
- Record a 5min real Chinese conversation
- Verify transcription accuracy (>85%)
- Verify the translation is understandable
- Verify the UI stays responsive
- Verify the audio is saved correctly

---

## 📊 Metrics to Track

### Performance
- **Whisper latency**: API call time (target: <3s for 10s of audio)
- **Claude latency**: API call time (target: <2s for 200 tokens)
- **Total latency**: Audio → Display (target: <10s)
- **Memory usage**: Stable over long sessions (no leaks)
- **CPU usage**: Acceptable (<50% on a laptop)

### Quality
- **Whisper accuracy**: % of correct words (target: >85%)
- **Claude quality**: Natural translation (subjective)
- **Crash rate**: 0 crashes during a 1h recording

### Cost
- **Whisper**: $0.006/min of audio (≈ $0.36/h)
- **Claude**: ~$0.03-0.05/h (depends on text volume)
- **Total**: ~$0.40/h of meeting

---

## ⚠️ Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| **Whisper API timeout** | Blocking | Retry logic, timeout 30s, fallback queue |
| **Claude API rate limit** | Medium | Exponential backoff, queue requests |
| **Audio buffer overflow** | Medium | Ring buffer size adequate, drop old chunks if needed |
| **Thread deadlock** | Blocking | Use std::lock_guard, avoid nested locks |
| **Memory leak** | Medium | Use smart pointers, valgrind tests |
| **Network interruption** | Medium | Retry logic, cache audio locally |

---

## 🎯 MVP Success Criteria

✅ **MVP is validated if**:
1. Microphone audio capture works
2. Chinese transcription is >85% accurate
3. French translation is understandable
4. UI displays translations in real time
5. Stop button shuts down cleanly
6. Audio is saved correctly
7. No crash during a 30min recording
8. Total latency <10s is acceptable

---

## 📝 Implementation Notes

### Thread Safety
- Use `std::mutex` + `std::lock_guard` for queues
- No shared state without protection
- Use `std::atomic<bool>` for flags (running, stopping)

### Error Handling
- Try/catch around API calls
- Log errors (spdlog or plain cout)
- Retry logic (max 3 attempts)
- Graceful degradation (skip the chunk if the error persists)

### Audio Format
- **Sample rate**: 16kHz (optimal for Whisper)
- **Channels**: Mono (sufficient, reduces bandwidth)
- **Format**: 16-bit PCM WAV
- **Chunk size**: Configurable (default 10s)
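
Encoding a chunk in this format in memory only requires prepending the standard 44-byte RIFF/WAVE header to the raw samples. The sketch below shows one way to do it; `append_le` and `encode_wav` are hypothetical helper names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: append the low `bytes` bytes of `value`, little-endian.
inline void append_le(std::string& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: wrap 16-bit PCM samples in a canonical WAV container.
inline std::string encode_wav(const std::vector<std::int16_t>& samples,
                              std::uint32_t sample_rate,
                              std::uint16_t channels) {
    const std::uint32_t bits_per_sample = 16;
    const std::uint32_t block_align = channels * bits_per_sample / 8;
    const std::uint32_t byte_rate = sample_rate * block_align;
    const std::uint32_t data_size =
        static_cast<std::uint32_t>(samples.size() * 2);

    std::string wav;
    wav.reserve(44 + data_size);
    wav += "RIFF";
    append_le(wav, 36 + data_size, 4);   // size of everything after this field
    wav += "WAVE";
    wav += "fmt ";
    append_le(wav, 16, 4);               // fmt chunk size for PCM
    append_le(wav, 1, 2);                // audio format 1 = uncompressed PCM
    append_le(wav, channels, 2);
    append_le(wav, sample_rate, 4);
    append_le(wav, byte_rate, 4);
    append_le(wav, block_align, 2);
    append_le(wav, bits_per_sample, 2);
    wav += "data";
    append_le(wav, data_size, 4);
    for (std::int16_t s : samples)
        append_le(wav, static_cast<std::uint16_t>(s), 2);
    return wav;
}
```

The resulting string can be sent directly as the `file` part of the Whisper request and written unchanged to `recordings/` for the backup.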

### API Best Practices
- **Timeout**: 30s for Whisper, 15s for Claude
- **Retry**: Exponential backoff (1s, 2s, 4s)
- **Rate limiting**: Respect API limits (monitor 429 errors)
- **Headers**: Always set User-Agent, API version

---

## 🔄 Post-MVP (Phase 2)

**Not included in MVP, but planned**:
- ❌ Automatic post-meeting summary (Claude summary)
- ❌ Structured export (transcripts + audio)
- ❌ Search system (backlog)
- ❌ Diarization (who is speaking)
- ❌ Replay mode
- ❌ Elaborate GUI (settings, etc.)

**MVP focus**: Working end-to-end pipeline, concept validation, real-world use in a first meeting.

---

*Document created: November 20, 2025*
*Status: Ready to implement*
*Estimated effort: 5 days of development + 2 days of testing*