StillHammer 5b60acaa73 feat: Implement complete MVP architecture for SecondVoice

Complete implementation of the real-time Chinese-to-French translation system:

Architecture:
- 3-threaded pipeline: Audio capture → AI processing → UI rendering
- Thread-safe queues for inter-thread communication
- Configurable audio chunk sizes for latency tuning

Core Features:
- Audio capture with PortAudio (configurable sample rate/channels)
- Whisper API integration for Chinese speech-to-text
- Claude API integration for Chinese-to-French translation
- ImGui real-time display with stop button
- Full recording saved to WAV on stop

Modules Implemented:
- audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export)
- api/: WhisperClient + ClaudeClient (HTTP API wrappers)
- ui/: TranslationUI (ImGui interface)
- core/: Pipeline (orchestrates all threads)
- utils/: Config (JSON/.env loader) + ThreadSafeQueue (template)

Build System:
- CMake with vcpkg for dependency management
- vcpkg.json manifest for reproducible builds
- build.sh helper script

Configuration:
- config.json: Audio settings, API parameters, UI config
- .env: API keys (OpenAI + Anthropic)

Documentation:
- README.md: Setup instructions, usage, architecture
- docs/implementation_plan.md: Technical design document
- docs/SecondVoice.md: Project vision and motivation

Next Steps:
- Test build with vcpkg dependencies
- Test audio capture on real hardware
- Validate API integrations
- Tune chunk size for optimal latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-20 03:08:03 +08:00


SecondVoice - MVP Implementation Plan

Date: November 20, 2025
Target: minimal functional MVP
Platform: Linux
Package Manager: vcpkg

🎯 Minimal MVP Goal

A desktop application that:

- Continuously captures microphone audio
- Transcribes Chinese speech to text (Whisper API)
- Translates the text into French (Claude API)
- Displays the translation in real time (ImGui)
- Provides a Stop button to end the session (no summary in the MVP)

🏗️ Technical Architecture

Pipeline

Audio Capture (PortAudio)
    ↓ (configurable audio chunks)
Whisper API (STT)
    ↓ (Chinese text)
Claude API (translation)
    ↓ (French text)
ImGui UI (real-time display + Stop button)

Threading Model

Thread 1 - Audio Capture:
  - PortAudio callback captures audio
  - Accumulates chunks (configurable size)
  - Pushes chunks to a thread-safe queue
  - Saves a WAV backup in the background

Thread 2 - AI Processing:
  - Pops a chunk from the audio queue
  - POSTs to the Whisper API → Chinese transcription
  - POSTs to the Claude API → French translation
  - Pushes the result to the UI queue

Thread 3 - Main UI (ImGui):
  - Renders the ImGui window
  - Displays translations from the queue
  - Handles the Stop button
  - Updates status/duration
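
The hand-off between the three threads can be sketched with a small blocking queue. This is a minimal illustration, assuming C++17; the class layout, the `close()` method, and `run_demo_pipeline` are hypothetical and not taken from the project's ThreadSafeQueue.h. Integers and strings stand in for audio chunks and translations.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical minimal queue; the project's ThreadSafeQueue.h may differ.
template <typename T>
class ThreadSafeQueue {
public:
    // Enqueue an item and wake one waiting consumer.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Block until an item is available, or until the queue is closed and drained.
    // Returns std::nullopt only after close() once the queue is empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

    // Signal that no more items will be pushed; wakes all waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Demo: one thread "captures" n chunks, a second "processes" them, and the
// caller drains the UI queue. Returns how many results reached the UI side.
inline int run_demo_pipeline(int n) {
    ThreadSafeQueue<int> audio_queue;
    ThreadSafeQueue<std::string> ui_queue;

    std::thread capture([&] {
        for (int i = 0; i < n; ++i) audio_queue.push(i);
        audio_queue.close();
    });
    std::thread processing([&] {
        while (auto chunk = audio_queue.pop())
            ui_queue.push("chunk " + std::to_string(*chunk));
        ui_queue.close();
    });

    int displayed = 0;
    while (auto line = ui_queue.pop()) ++displayed;  // stands in for the UI thread
    capture.join();
    processing.join();
    return displayed;
}
```

Closing each queue lets the downstream stage drain the remaining items and exit on its own. A real ImGui loop would need a non-blocking variant of `pop()` so that rendering is never stalled.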

📁 Project Structure

secondvoice/
├── .env                            # API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY)
├── .gitignore
├── CMakeLists.txt                  # Build configuration
├── vcpkg.json                      # Dependencies manifest
├── config.json                     # Runtime config (audio chunk size, etc)
├── README.md
├── docs/
│   ├── SecondVoice.md             # Vision document
│   └── implementation_plan.md      # This document
├── src/
│   ├── main.cpp                    # Entry point + ImGui main loop
│   ├── audio/
│   │   ├── AudioCapture.h
│   │   ├── AudioCapture.cpp        # PortAudio wrapper
│   │   ├── AudioBuffer.h
│   │   └── AudioBuffer.cpp         # Thread-safe ring buffer
│   ├── api/
│   │   ├── WhisperClient.h
│   │   ├── WhisperClient.cpp       # Whisper API client
│   │   ├── ClaudeClient.h
│   │   └── ClaudeClient.cpp        # Claude API client
│   ├── ui/
│   │   ├── TranslationUI.h
│   │   └── TranslationUI.cpp       # ImGui interface
│   ├── utils/
│   │   ├── Config.h
│   │   ├── Config.cpp              # Load .env + config.json
│   │   ├── ThreadSafeQueue.h       # Thread-safe queue template
│   │   └── Logger.h                # Simple logging
│   └── core/
│       ├── Pipeline.h
│       └── Pipeline.cpp            # Orchestrate threads
├── recordings/                     # Output audio files
│   └── .gitkeep
└── build/                          # CMake build output (ignored)

🔧 Dependencies

vcpkg.json

{
  "name": "secondvoice",
  "version": "0.1.0",
  "dependencies": [
    "portaudio",
    "cpp-httplib",
    "nlohmann-json",
    "imgui[glfw-binding,opengl3-binding]",
    "glfw3",
    "opengl"
  ]
}

System Requirements (Linux)

# PortAudio dependencies
sudo apt install libasound2-dev

# OpenGL dependencies
sudo apt install libgl1-mesa-dev libglu1-mesa-dev

⚙️ Configuration

.env (project root)

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
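
A file in this KEY=VALUE form can be read with the standard library alone. The sketch below is one possible approach, not the project's Config.cpp; `trim` and `parse_env` are hypothetical names, and quoting or variable expansion is deliberately not handled.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: parse KEY=VALUE lines, skipping blank lines,
// '#' comments, and lines without an '=' sign.
inline std::map<std::string, std::string> parse_env(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a KEY=VALUE line
        std::string key = trim(line.substr(0, eq));
        std::string value = trim(line.substr(eq + 1));
        if (!key.empty()) vars[key] = value;
    }
    return vars;
}
```

Passing a `std::ifstream` opened on `.env` works the same way as the `std::istringstream` used for testing, since both derive from `std::istream`.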

config.json (project root)

{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10,
    "format": "wav"
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh",
    "temperature": 0.0
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024,
    "temperature": 0.3,
    "system_prompt": "Tu es un traducteur professionnel chinois-français. Traduis le texte suivant de manière naturelle et contextuelle."
  },
  "ui": {
    "window_width": 800,
    "window_height": 600,
    "font_size": 16,
    "max_display_lines": 50
  },
  "recording": {
    "save_audio": true,
    "output_directory": "./recordings"
  }
}

🔌 API Clients

Whisper API

// POST https://api.openai.com/v1/audio/transcriptions
// Content-Type: multipart/form-data
// Authorization: Bearer {OPENAI_API_KEY}

Request:
- file: audio.wav (binary)
- model: whisper-1
- language: zh
- temperature: 0.0

Response:
{
  "text": "你好，今天我们讨论项目进度..."
}

Claude API

// POST https://api.anthropic.com/v1/messages
// Content-Type: application/json
// x-api-key: {ANTHROPIC_API_KEY}
// anthropic-version: 2023-06-01

Request:
{
  "model": "claude-haiku-4-20250514",
  "max_tokens": 1024,
  "messages": [{
    "role": "user",
    "content": "Traduis en français: 你好，今天我们讨论项目进度..."
  }]
}

Response:
{
  "content": [{
    "type": "text",
    "text": "Bonjour, aujourd'hui nous discutons de l'avancement du projet..."
  }],
  "model": "claude-haiku-4-20250514",
  "usage": {...}
}

🎨 Interface ImGui

Minimal Layout

┌────────────────────────────────────────────┐
│ SecondVoice - Live Translation             │
├────────────────────────────────────────────┤
│                                            │
│ [●] Recording...    Duration: 00:05:23     │
│                                            │
│ ┌────────────────────────────────────────┐ │
│ │ 中文: 你好，今天我们讨论项目进度...    │ │
│ │ FR: Bonjour, aujourd'hui nous          │ │
│ │     discutons de l'avancement...       │ │
│ │                                        │ │
│ │ 中文: 关于预算的问题...                │ │
│ │ FR: Concernant la question du budget.. │ │
│ │                                        │ │
│ │ [Auto-scroll enabled]                  │ │
│ │                                        │ │
│ └────────────────────────────────────────┘ │
│                                            │
│         [    STOP RECORDING    ]           │
│                                            │
│ Status: Processing chunk 12/12             │
│ Audio: 16kHz mono, chunk size: 10s         │
└────────────────────────────────────────────┘

UI Features

- Scrollable text area: auto-scrolls by default; can be disabled to review earlier text
- Color coding: Chinese (color 1), French (color 2)
- Status bar: duration, chunk count, processing status
- Stop button: stops capture and processing, then saves the audio
- Resizable window: adaptive layout
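
The duration shown in the status bar (for example 00:05:23 in the layout above) can be produced from an elapsed-seconds counter. This is a small sketch; `format_duration` is a hypothetical helper name, not part of TranslationUI.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for the status bar: seconds -> "HH:MM:SS".
inline std::string format_duration(long total_seconds) {
    assert(total_seconds >= 0);
    const long hours = total_seconds / 3600;
    const long minutes = (total_seconds % 3600) / 60;
    const long seconds = total_seconds % 60;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02ld:%02ld:%02ld", hours, minutes, seconds);
    return buf;
}
```

The result can be passed to ImGui as a C string via `c_str()`.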

🚀 Implementation Order

Phase 1 - Infrastructure Setup (Day 1)

Todo:

✅ Create project structure
✅ Set up CMakeLists.txt with vcpkg
✅ Create .gitignore (.env, build/, recordings/)
✅ Create config.json template
✅ Set up .env (API keys)
✅ Test minimal build (hello world)

Validation: cmake -B build && cmake --build build compiles without errors

Phase 2 - Audio Capture (Day 1-2)

Todo:

- Implement AudioCapture.h/cpp:
  - Initialize PortAudio
  - Audio capture callback
  - Chunk accumulation (configurable duration)
  - Push to ThreadSafeQueue
- Implement AudioBuffer.h/cpp:
  - Ring buffer for raw audio
  - Thread-safe operations
- Standalone test: capture 30 s of audio → save WAV

Validation: WAV file is playable, has the correct duration, and is of acceptable quality
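
The ring buffer for raw audio could look roughly like the following. It is a sketch under assumptions: `SampleRingBuffer` is a hypothetical name, samples are 16-bit, and the oldest samples are overwritten when the buffer is full. A mutex keeps the example short; locking inside a real-time audio callback is generally discouraged, so a lock-free design may be preferable in the real AudioBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical fixed-capacity ring buffer for 16-bit samples.
// When full, the oldest samples are overwritten.
class SampleRingBuffer {
public:
    explicit SampleRingBuffer(std::size_t capacity) : data_(capacity) {
        assert(capacity > 0);
    }

    // Append samples; safe to call concurrently with read_all().
    void write(const std::int16_t* samples, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < count; ++i) {
            data_[head_] = samples[i];
            head_ = (head_ + 1) % data_.size();
            if (size_ < data_.size()) ++size_;
        }
    }

    // Copy out the buffered samples, oldest first, and mark the buffer empty.
    std::vector<std::int16_t> read_all() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::int16_t> out;
        out.reserve(size_);
        const std::size_t start = (head_ + data_.size() - size_) % data_.size();
        for (std::size_t i = 0; i < size_; ++i)
            out.push_back(data_[(start + i) % data_.size()]);
        size_ = 0;
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<std::int16_t> data_;
    std::size_t head_ = 0;  // next write position
    std::size_t size_ = 0;  // number of valid samples
};
```

With a capacity of 4, writing the samples 1 through 6 and then calling `read_all()` yields 3, 4, 5, 6: the two oldest samples have been dropped.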

Phase 3 - Whisper Client (Day 2)

Todo:

- Implement WhisperClient.h/cpp:
  - Load API key from .env
  - POST multipart/form-data (cpp-httplib)
  - Encode audio as WAV in memory
  - Parse JSON response
  - Error handling (retry, timeout)
- Standalone test: audio file → Whisper → Chinese text

Validation: correct Chinese transcription on a sample audio file
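
Encoding the audio as WAV in memory amounts to prepending a 44-byte RIFF header to the raw 16-bit PCM samples. The sketch below shows one way to do this with the standard library only; `append_le` and `encode_wav` are hypothetical names, and the resulting string would be used as the `file` part of the multipart request.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append an integer in little-endian byte order, as required by RIFF/WAV.
inline void append_le(std::string& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: wrap 16-bit PCM samples in a canonical 44-byte WAV header.
inline std::string encode_wav(const std::vector<std::int16_t>& samples,
                              std::uint32_t sample_rate, std::uint16_t channels) {
    const std::uint32_t bits_per_sample = 16;
    const std::uint32_t block_align = channels * bits_per_sample / 8;
    const std::uint32_t byte_rate = sample_rate * block_align;
    const std::uint32_t data_size = static_cast<std::uint32_t>(samples.size() * 2);

    std::string wav;
    wav.reserve(44 + data_size);
    wav += "RIFF";
    append_le(wav, 36 + data_size, 4);  // size of everything after this field
    wav += "WAVE";
    wav += "fmt ";
    append_le(wav, 16, 4);              // fmt chunk size for PCM
    append_le(wav, 1, 2);               // audio format 1 = PCM
    append_le(wav, channels, 2);
    append_le(wav, sample_rate, 4);
    append_le(wav, byte_rate, 4);
    append_le(wav, block_align, 2);
    append_le(wav, bits_per_sample, 2);
    wav += "data";
    append_le(wav, data_size, 4);
    for (std::int16_t s : samples)
        append_le(wav, static_cast<std::uint16_t>(s), 2);
    return wav;
}
```

With the default configuration (16 kHz, mono, 10 s chunks), one chunk holds 160,000 samples, so the encoded payload is 320,000 bytes of audio plus the 44-byte header.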

Phase 4 - Claude Client (Day 2-3)

Todo:

- Implement ClaudeClient.h/cpp:
  - Load API key from .env
  - POST JSON request (cpp-httplib)
  - Configurable system prompt
  - Parse response (extract text)
  - Error handling
- Standalone test: Chinese text → Claude → French text

Validation: natural and correct French translation

Phase 5 - ImGui UI (Day 3)

Todo:

- Set up ImGui + GLFW + OpenGL:
  - Window creation
  - Render loop
  - Input handling
- Implement TranslationUI.h/cpp:
  - Scrollable text area
  - Display messages (CN + FR)
  - Stop button
  - Status bar (duration, chunk count)
- Standalone test: display mock data

Validation: UI is responsive, text displays correctly, the button works

Phase 6 - Pipeline Integration (Day 4)

Todo:

- Implement Pipeline.h/cpp:
  - Thread 1: AudioCapture loop
  - Thread 2: Processing loop (Whisper → Claude)
  - Thread 3: UI loop (ImGui)
  - ThreadSafeQueue between threads
  - Synchronization (start/stop)
- Implement Config.h/cpp:
  - Load .env (API keys)
  - Load config.json (settings)
- Implement main.cpp:
  - Initialize all components
  - Start pipeline
  - Handle graceful shutdown

Validation: the complete pipeline works end to end

Phase 7 - Testing & Tuning (Day 5)

Todo:

- Test with real Chinese audio:
  - Sample conversations
  - Different audio qualities
  - Different chunk sizes (5s, 10s, 30s)
- Measure latency:
  - Audio → Whisper: X seconds
  - Whisper → Claude: Y seconds
  - Total: Z seconds
- Debug and fix bugs:
  - Memory leaks
  - Thread safety issues
  - API error handling
- Optimize:
  - Optimal chunk size (latency vs. accuracy tradeoff)
  - API timeout values
  - UI refresh rate

Validation:

- Total latency < 10 s is acceptable
- No crash during a 30-minute recording
- Transcription and translation are understandable
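
The per-stage latency measurements listed above could be collected with `std::chrono::steady_clock`. The following is a sketch only; `time_seconds` and `ChunkLatency` are hypothetical names. It assumes the budget applies to processing time alone, since the plan does not say whether the time spent accumulating a chunk counts toward the 10 s target.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable and return elapsed wall time in seconds.
template <typename Fn>
double time_seconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Per-chunk processing latency, split by API stage.
struct ChunkLatency {
    double whisper_s = 0.0;  // time spent in the Whisper call
    double claude_s = 0.0;   // time spent in the Claude call

    double total() const { return whisper_s + claude_s; }

    // Assumption: the budget covers processing only, not chunk accumulation.
    bool within_budget(double budget_s = 10.0) const { return total() < budget_s; }
};
```

In the processing thread, each API call would be wrapped in a lambda passed to `time_seconds`, and the resulting `ChunkLatency` logged per chunk.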

🧪 Test Plan

Unit Tests (Phase 2+)

- AudioCapture: captures audio in the correct format
- WhisperClient: mocked API call, JSON parsing
- ClaudeClient: mocked API call, JSON parsing
- ThreadSafeQueue: thread safety, no data loss

Integration Tests

- Audio → Whisper: audio file → correct Chinese text
- Whisper → Claude: Chinese text → correct French translation
- Pipeline: audio → complete UI display

End-to-End Test

- Record a real 5-minute Chinese conversation
- Verify transcription accuracy (>85%)
- Verify the translation is understandable
- Verify the UI stays responsive
- Verify the audio is saved correctly

📊 Metrics to Track

Performance

- Whisper latency: API call time (target: <3 s for 10 s of audio)
- Claude latency: API call time (target: <2 s for 200 tokens)
- Total latency: audio → display (target: <10 s)
- Memory usage: stable over long sessions (no leaks)
- CPU usage: acceptable (<50% on a laptop)

Quality

- Whisper accuracy: % of words correct (target: >85%)
- Claude quality: natural translation (subjective)
- Crash rate: 0 crashes over a 1-hour recording

Cost

- Whisper: $0.006/min of audio (60 min × $0.006 = $0.36/h)
- Claude: ~$0.03-0.05/h (depends on text volume)
- Total: ~$0.40/h of meeting ($0.36 + $0.03-0.05)

⚠️ Risks & Mitigations

Risk	Impact	Mitigation
Whisper API timeout	Blocking	Retry logic, timeout 30s, fallback queue
Claude API rate limit	Medium	Exponential backoff, queue requests
Audio buffer overflow	Medium	Ring buffer size adequate, drop old chunks if needed
Thread deadlock	Blocking	Use std::lock_guard, avoid nested locks
Memory leak	Medium	Use smart pointers, valgrind tests
Network interruption	Medium	Retry logic, cache audio locally

🎯 MVP Success Criteria

✅ The MVP is validated if:

- Microphone audio capture works
- Chinese transcription is >85% accurate
- French translation is understandable
- The UI displays translations in real time
- The Stop button shuts everything down cleanly
- Audio is saved correctly
- No crash during a 30-minute recording
- Total latency <10 s is acceptable

📝 Implementation Notes

Thread Safety

- Use std::mutex + std::lock_guard for queues
- No shared state without protection
- Use std::atomic<bool> for flags (running, stopping)
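
A worker loop controlled by an atomic flag might be structured as follows. This is an illustrative sketch; the `Worker` class and its members are hypothetical, and the sleep stands in for real capture or processing work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker that loops until asked to stop.
class Worker {
public:
    // Launch the background loop; call at most once per start/stop cycle.
    void start() {
        running_.store(true);
        thread_ = std::thread([this] {
            while (running_.load()) {
                ++iterations_;  // placeholder for one unit of real work
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Request shutdown and wait for the loop to exit. Safe to call twice.
    void stop() {
        running_.store(false);
        if (thread_.joinable()) thread_.join();
    }

    ~Worker() { stop(); }

    long iterations() const { return iterations_.load(); }

private:
    std::atomic<bool> running_{false};
    std::atomic<long> iterations_{0};
    std::thread thread_;
};
```

Because `stop()` joins the thread, no work happens after it returns, which is the property the Stop button relies on for a clean shutdown.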

Error Handling

- Try/catch around API calls
- Log errors (spdlog or plain cout)
- Retry logic (max 3 attempts)
- Graceful degradation (skip the chunk if the error persists)

Audio Format

- Sample rate: 16 kHz (optimal for Whisper)
- Channels: mono (sufficient, reduces bandwidth)
- Format: 16-bit PCM WAV
- Chunk size: configurable (default 10 s)

API Best Practices

- Timeout: 30 s for Whisper, 15 s for Claude
- Retry: exponential backoff (1s, 2s, 4s)
- Rate limiting: respect API limits (monitor 429 errors)
- Headers: always set User-Agent and API version
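
The retry policy can be expressed as a small generic helper. This is a sketch under assumptions: `retry_with_backoff` is a hypothetical name, and `max_attempts` is treated as the total number of calls, because the plan does not say whether "max 3 attempts" includes the initial call. With three total calls, only the 1 s and 2 s delays are used.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call `attempt` until it succeeds or max_attempts is reached.
// The delay doubles after each failure: base, 2*base, 4*base, ...
inline bool retry_with_backoff(const std::function<bool()>& attempt,
                               int max_attempts,
                               std::chrono::milliseconds base_delay) {
    assert(max_attempts > 0);
    auto delay = base_delay;
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt()) return true;
        if (i == max_attempts) break;  // no sleep after the final failure
        std::this_thread::sleep_for(delay);
        delay *= 2;
    }
    return false;
}
```

A caller would wrap the HTTP request in a lambda that returns true on success, passing `std::chrono::milliseconds(1000)` as the base delay. On a false result the chunk is skipped, in line with the graceful-degradation note above.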

🔄 Post-MVP (Phase 2)

Not included in the MVP, but planned:

❌ Automatic post-meeting summary (Claude summary)
❌ Structured export (transcripts + audio)
❌ Search system (backlog)
❌ Diarization (who is speaking)
❌ Replay mode
❌ More elaborate GUI (settings, etc.)

MVP focus: a working end-to-end pipeline, concept validation, and real use in a first meeting.

Document created: November 20, 2025
Status: Ready to implement
Estimated effort: 5 days of development + 2 days of testing
