feat: Implement complete MVP architecture for SecondVoice

Complete implementation of the real-time Chinese-to-French translation system: Architecture: - 3-threaded pipeline: Audio capture → AI processing → UI rendering - Thread-safe queues for inter-thread communication - Configurable audio chunk sizes for latency tuning Core Features: - Audio capture with PortAudio (configurable sample rate/channels) - Whisper API integration for Chinese speech-to-text - Claude API integration for Chinese-to-French translation - ImGui real-time display with stop button - Full recording saved to WAV on stop Modules Implemented: - audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export) - api/: WhisperClient + ClaudeClient (HTTP API wrappers) - ui/: TranslationUI (ImGui interface) - core/: Pipeline (orchestrates all threads) - utils/: Config (JSON/.env loader) + ThreadSafeQueue (template) Build System: - CMake with vcpkg for dependency management - vcpkg.json manifest for reproducible builds - build.sh helper script Configuration: - config.json: Audio settings, API parameters, UI config - .env: API keys (OpenAI + Anthropic) Documentation: - README.md: Setup instructions, usage, architecture - docs/implementation_plan.md: Technical design document - docs/SecondVoice.md: Project vision and motivation Next Steps: - Test build with vcpkg dependencies - Test audio capture on real hardware - Validate API integrations - Tune chunk size for optimal latency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:08:03 +08:00 · 2025-11-20 03:08:03 +08:00 · 5b60acaa73
commit 5b60acaa73
parent 6248fb2322
25 changed files with 2180 additions and 0 deletions
--- a/.env.example
+++ b/.env.example
@ -0,0 +1,5 @@
 # OpenAI API Key (for Whisper)
 OPENAI_API_KEY=sk-...
 # Anthropic API Key (for Claude)
 ANTHROPIC_API_KEY=sk-ant-...
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,45 @@
 # Build directories
 build/
 cmake-build-*/
 out/
 # vcpkg
 vcpkg_installed/
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 # Compiled files
 *.o
 *.a
 *.so
 *.exe
 # Environment and secrets
 .env
 # Recordings (generated at runtime)
 recordings/*.wav
 recordings/*.mp3
 recordings/*.txt
 recordings/*.md
 recordings/*.json
 !recordings/.gitkeep
 # Logs
 *.log
 # OS
 .DS_Store
 Thumbs.db
 # CMake
 CMakeCache.txt
 CMakeFiles/
 cmake_install.cmake
 Makefile
 compile_commands.json
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -0,0 +1,67 @@
 cmake_minimum_required(VERSION 3.20)
 project(SecondVoice VERSION 0.1.0 LANGUAGES CXX)
 set(CMAKE_CXX_STANDARD 17)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 # Find packages
 find_package(portaudio CONFIG REQUIRED)
 find_package(httplib CONFIG REQUIRED)
 find_package(nlohmann_json CONFIG REQUIRED)
 find_package(imgui CONFIG REQUIRED)
 find_package(glfw3 CONFIG REQUIRED)
 find_package(OpenGL REQUIRED)
 # Source files
 set(SOURCES
    src/main.cpp
    # Audio module
    src/audio/AudioCapture.cpp
    src/audio/AudioBuffer.cpp
    # API clients
    src/api/WhisperClient.cpp
    src/api/ClaudeClient.cpp
    # UI
    src/ui/TranslationUI.cpp
    # Utils
    src/utils/Config.cpp
    # Core
    src/core/Pipeline.cpp
 )
 # Executable
 add_executable(${PROJECT_NAME} ${SOURCES})
 # Include directories
 target_include_directories(${PROJECT_NAME} PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src
 )
 # Link libraries
 target_link_libraries(${PROJECT_NAME} PRIVATE
    portaudio
    httplib::httplib
    nlohmann_json::nlohmann_json
    imgui::imgui
    glfw
    OpenGL::GL
 )
 # Compiler options
 if(CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
    target_compile_options(${PROJECT_NAME} PRIVATE
        -Wall
        -Wextra
        -Wpedantic
        -Werror
    )
 endif()
 # Copy config files to build directory
 configure_file(${CMAKE_CURRENT_SOURCE_DIR}/config.json
               ${CMAKE_CURRENT_BINARY_DIR}/config.json COPYONLY)
 # Install target
 install(TARGETS ${PROJECT_NAME} DESTINATION bin)
 install(FILES config.json DESTINATION bin)
--- a/README.md
+++ b/README.md
@ -0,0 +1,223 @@
 # SecondVoice
 Real-time Chinese to French translation system for live meetings.
 ## Overview
 SecondVoice captures audio, transcribes Chinese speech using OpenAI's Whisper API, and translates it to French using Claude AI in real-time. Perfect for understanding Chinese meetings on the fly.
 ## Features
 - 🎤 Real-time audio capture
 - 🗣️ Chinese speech-to-text (Whisper API)
 - 🌐 Chinese to French translation (Claude API)
 - 🖥️ Clean ImGui interface
 - 💾 Full recording saved to disk
 - ⚙️ Configurable chunk sizes and settings
 ## Requirements
 ### System Dependencies (Linux)
 ```bash
 # PortAudio
 sudo apt install libasound2-dev
 # OpenGL
 sudo apt install libgl1-mesa-dev libglu1-mesa-dev
 ```
 ### vcpkg
 Install vcpkg if not already installed:
 ```bash
 git clone https://github.com/microsoft/vcpkg.git
 cd vcpkg
 ./bootstrap-vcpkg.sh
 export VCPKG_ROOT=$(pwd)
 ```
 ## Setup
 1. **Clone the repository**
 ```bash
 git clone <repository-url>
 cd secondvoice
 ```
 2. **Create `.env` file** (copy from `.env.example`)
 ```bash
 cp .env.example .env
 # Edit .env and add your API keys:
 # OPENAI_API_KEY=sk-...
 # ANTHROPIC_API_KEY=sk-ant-...
 ```
 3. **Configure settings** (optional)
 Edit `config.json` to customize:
 - Audio chunk duration (default: 10s)
 - Sample rate (default: 16kHz)
 - UI window size
 - Output directory
 4. **Build the project**
 ```bash
 # Configure with vcpkg
 cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
 # Build
 cmake --build build -j$(nproc)
 ```
 ## Usage
 ```bash
 cd build
 ./SecondVoice
 ```
 The application will:
 1. Open an ImGui window
 2. Start capturing audio from your microphone
 3. Display Chinese transcriptions and French translations in real-time
 4. Click **STOP RECORDING** button to finish
 5. Save the full audio recording to `recordings/recording_YYYYMMDD_HHMMSS.wav`
 ## Architecture
 ```
 Audio Capture (PortAudio)
    ↓
 Whisper API (Speech-to-Text)
    ↓
 Claude API (Translation)
    ↓
 ImGui UI (Display)
 ```
 ### Threading Model
 - **Thread 1**: Audio capture (PortAudio callback)
 - **Thread 2**: AI processing (Whisper + Claude API calls)
 - **Thread 3**: UI rendering (ImGui + OpenGL)
 ## Configuration
 ### config.json
 ```json
 {
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh"
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024
  }
 }
 ```
 ### .env
 ```env
 OPENAI_API_KEY=sk-...
 ANTHROPIC_API_KEY=sk-ant-...
 ```
 ## Cost Estimation
 - **Whisper**: ~$0.006/minute (~$0.36/hour)
 - **Claude Haiku**: ~$0.03-0.05/hour
 - **Total**: ~$0.40/hour of recording
 ## Project Structure
 ```
 secondvoice/
 ├── src/
 │   ├── main.cpp                 # Entry point
 │   ├── audio/                   # Audio capture & buffer
 │   ├── api/                     # Whisper & Claude clients
 │   ├── ui/                      # ImGui interface
 │   ├── utils/                   # Config & thread-safe queue
 │   └── core/                    # Pipeline orchestration
 ├── docs/                        # Documentation
 ├── recordings/                  # Output recordings
 ├── config.json                  # Runtime configuration
 ├── .env                         # API keys (not committed)
 └── CMakeLists.txt              # Build configuration
 ```
 ## Development
 ### Building in Debug Mode
 ```bash
 cmake -B build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
 cmake --build build
 ```
 ### Running Tests
 ```bash
 # TODO: Add tests
 ```
 ## Troubleshooting
 ### No audio capture
 - Check microphone permissions
 - Verify PortAudio is properly installed: `pa_devs` (if available)
 - Try different audio device in code
 ### API errors
 - Verify API keys in `.env` are correct
 - Check internet connection
 - Monitor API rate limits
 ### Build errors
 - Ensure vcpkg is properly set up
 - Check all system dependencies are installed
 - Try `cmake --build build --clean-first`
 ## Roadmap
 ### Phase 1 - MVP (Current)
 - ✅ Audio capture
 - ✅ Whisper integration
 - ✅ Claude integration
 - ✅ ImGui UI
 - ✅ Stop button
 ### Phase 2 - Enhancement
 - ⬜ Auto-summary post-meeting
 - ⬜ Export transcripts
 - ⬜ Search functionality
 - ⬜ Speaker diarization
 - ⬜ Replay mode
 ## License
 See LICENSE file.
 ## Contributing
 This is a personal project, but suggestions and bug reports are welcome via issues.
 ## Contact
 See docs/SecondVoice.md for project context and motivation.
--- a/build.sh
+++ b/build.sh
@ -0,0 +1,50 @@
 #!/bin/bash
 # Build script for SecondVoice
 set -e
 # Colors
 GREEN='\033[0;32m'
 RED='\033[0;31m'
 NC='\033[0m' # No Color
 echo "SecondVoice Build Script"
 echo "======================="
 echo ""
 # Check if vcpkg is set
 if [ -z "$VCPKG_ROOT" ]; then
    echo -e "${RED}Error: VCPKG_ROOT not set${NC}"
    echo "Please install vcpkg and set VCPKG_ROOT environment variable:"
    echo "  git clone https://github.com/microsoft/vcpkg.git"
    echo "  cd vcpkg && ./bootstrap-vcpkg.sh"
    echo "  export VCPKG_ROOT=\$(pwd)"
    exit 1
 fi
 echo -e "${GREEN}vcpkg found at: $VCPKG_ROOT${NC}"
 echo ""
 # Check if .env exists
 if [ ! -f ".env" ]; then
    echo -e "${RED}Warning: .env file not found${NC}"
    echo "Please create .env from .env.example and add your API keys"
    echo "  cp .env.example .env"
    echo ""
 fi
 # Configure
 echo "Configuring CMake..."
 cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
 # Build
 echo ""
 echo "Building..."
 cmake --build build -j$(nproc)
 echo ""
 echo -e "${GREEN}Build successful!${NC}"
 echo ""
 echo "To run the application:"
 echo "  cd build"
 echo "  ./SecondVoice"
--- a/config.json
+++ b/config.json
@ -0,0 +1,29 @@
 {
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10,
    "format": "wav"
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh",
    "temperature": 0.0
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024,
    "temperature": 0.3,
    "system_prompt": "Tu es un traducteur professionnel chinois-français. Traduis le texte suivant de manière naturelle et contextuelle."
  },
  "ui": {
    "window_width": 800,
    "window_height": 600,
    "font_size": 16,
    "max_display_lines": 50
  },
  "recording": {
    "save_audio": true,
    "output_directory": "./recordings"
  }
 }
--- a/docs/implementation_plan.md
+++ b/docs/implementation_plan.md
@ -0,0 +1,494 @@
 # SecondVoice - Plan d'Implémentation MVP
 **Date**: 20 novembre 2025
 **Target**: MVP minimal fonctionnel
 **Platform**: Linux
 **Package Manager**: vcpkg
 ---
 ## 🎯 Objectif MVP Minimal
 Application desktop qui:
 1. Capture audio microphone en continu
 2. Transcrit chinois → texte (Whisper API)
 3. Traduit texte → français (Claude API)
 4. Affiche traduction temps réel (ImGui)
 5. Bouton Stop pour arrêter (pas de résumé MVP)
 ---
 ## 🏗️ Architecture Technique
 ### Pipeline
 ```
 Audio Capture (PortAudio)
    ↓ (chunks audio configurables)
 Whisper API (STT)
    ↓ (texte chinois)
 Claude API (traduction)
    ↓ (texte français)
 ImGui UI (display temps réel + bouton Stop)
 ```
 ### Threading Model
 ```
 Thread 1 - Audio Capture:
  - PortAudio callback capture audio
  - Accumule chunks (taille configurable)
  - Push dans queue thread-safe
  - Save WAV backup en background
 Thread 2 - AI Processing:
  - Pop chunk depuis audio queue
  - POST Whisper API → transcription chinoise
  - POST Claude API → traduction française
  - Push résultat dans UI queue
 Thread 3 - Main UI (ImGui):
  - Render window ImGui
  - Display traductions depuis queue
  - Handle bouton Stop
  - Update status/duration
 ```
 ---
 ## 📁 Structure Projet
 ```
 secondvoice/
 ├── .env                            # API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY)
 ├── .gitignore
 ├── CMakeLists.txt                  # Build configuration
 ├── vcpkg.json                      # Dependencies manifest
 ├── config.json                     # Runtime config (audio chunk size, etc)
 ├── README.md
 ├── docs/
 │   ├── SecondVoice.md             # Vision document
 │   └── implementation_plan.md      # Ce document
 ├── src/
 │   ├── main.cpp                    # Entry point + ImGui main loop
 │   ├── audio/
 │   │   ├── AudioCapture.h
 │   │   ├── AudioCapture.cpp        # PortAudio wrapper
 │   │   ├── AudioBuffer.h
 │   │   └── AudioBuffer.cpp         # Thread-safe ring buffer
 │   ├── api/
 │   │   ├── WhisperClient.h
 │   │   ├── WhisperClient.cpp       # Whisper API client
 │   │   ├── ClaudeClient.h
 │   │   └── ClaudeClient.cpp        # Claude API client
 │   ├── ui/
 │   │   ├── TranslationUI.h
 │   │   └── TranslationUI.cpp       # ImGui interface
 │   ├── utils/
 │   │   ├── Config.h
 │   │   ├── Config.cpp              # Load .env + config.json
 │   │   ├── ThreadSafeQueue.h       # Template queue thread-safe
 │   │   └── Logger.h                # Simple logging
 │   └── core/
 │       ├── Pipeline.h
 │       └── Pipeline.cpp            # Orchestrate threads
 ├── recordings/                     # Output audio files
 │   └── .gitkeep
 └── build/                          # CMake build output (ignored)
 ```
 ---
 ## 🔧 Dépendances
 ### vcpkg.json
 ```json
 {
  "name": "secondvoice",
  "version": "0.1.0",
  "dependencies": [
    "portaudio",
    "cpp-httplib",
    "nlohmann-json",
    "imgui[glfw-binding,opengl3-binding]",
    "glfw3",
    "opengl"
  ]
 }
 ```
 ### System Requirements (Linux)
 ```bash
 # PortAudio dependencies
 sudo apt install libasound2-dev
 # OpenGL dependencies
 sudo apt install libgl1-mesa-dev libglu1-mesa-dev
 ```
 ---
 ## ⚙️ Configuration
 ### .env (racine projet)
 ```env
 OPENAI_API_KEY=sk-...
 ANTHROPIC_API_KEY=sk-ant-...
 ```
 ### config.json (racine projet)
 ```json
 {
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10,
    "format": "wav"
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh",
    "temperature": 0.0
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024,
    "temperature": 0.3,
    "system_prompt": "Tu es un traducteur professionnel chinois-français. Traduis le texte suivant de manière naturelle et contextuelle."
  },
  "ui": {
    "window_width": 800,
    "window_height": 600,
    "font_size": 16,
    "max_display_lines": 50
  },
  "recording": {
    "save_audio": true,
    "output_directory": "./recordings"
  }
 }
 ```
 ---
 ## 🔌 API Clients
 ### Whisper API
 ```cpp
 // POST https://api.openai.com/v1/audio/transcriptions
 // Content-Type: multipart/form-data
 Request:
 - file: audio.wav (binary)
 - model: whisper-1
 - language: zh
 - temperature: 0.0
 Response:
 {
  "text": "你好，今天我们讨论项目进度..."
 }
 ```
 ### Claude API
 ```cpp
 // POST https://api.anthropic.com/v1/messages
 // Content-Type: application/json
 // x-api-key: {ANTHROPIC_API_KEY}
 // anthropic-version: 2023-06-01
 Request:
 {
  "model": "claude-haiku-4-20250514",
  "max_tokens": 1024,
  "messages": [{
    "role": "user",
    "content": "Traduis en français: 你好，今天我们讨论项目进度..."
  }]
 }
 Response:
 {
  "content": [{
    "type": "text",
    "text": "Bonjour, aujourd'hui nous discutons de l'avancement du projet..."
  }],
  "model": "claude-haiku-4-20250514",
  "usage": {...}
 }
 ```
 ---
 ## 🎨 Interface ImGui
 ### Layout Minimaliste
 ```
 ┌────────────────────────────────────────────┐
 │ SecondVoice - Live Translation             │
 ├────────────────────────────────────────────┤
 │                                            │
 │ [●] Recording...    Duration: 00:05:23     │
 │                                            │
 │ ┌────────────────────────────────────────┐ │
 │ │ 中文: 你好，今天我们讨论项目进度...    │ │
 │ │ FR: Bonjour, aujourd'hui nous          │ │
 │ │     discutons de l'avancement...       │ │
 │ │                                        │ │
 │ │ 中文: 关于预算的问题...                │ │
 │ │ FR: Concernant la question du budget.. │ │
 │ │                                        │ │
 │ │ [Auto-scroll enabled]                  │ │
 │ │                                        │ │
 │ └────────────────────────────────────────┘ │
 │                                            │
 │         [    STOP RECORDING    ]           │
 │                                            │
 │ Status: Processing chunk 12/12             │
 │ Audio: 16kHz mono, chunk size: 10s         │
 └────────────────────────────────────────────┘
 ```
 ### Features UI
 - **Scrollable text area**: Auto-scroll, peut désactiver pour review
 - **Color coding**: Chinois (couleur 1), Français (couleur 2)
 - **Status bar**: Duration, chunk count, processing status
 - **Stop button**: Arrête capture + processing, sauvegarde audio
 - **Window resizable**: Layout adaptatif
 ---
 ## 🚀 Ordre d'Implémentation
 ### Phase 1 - Setup Infrastructure (Jour 1)
 **Todo**:
 1. ✅ Créer structure projet
 2. ✅ Setup CMakeLists.txt avec vcpkg
 3. ✅ Créer .gitignore (.env, build/, recordings/)
 4. ✅ Créer config.json template
 5. ✅ Setup .env (API keys)
 6. ✅ Test build minimal (hello world)
 **Validation**: `cmake -B build && cmake --build build` compile sans erreurs
 ---
 ### Phase 2 - Audio Capture (Jour 1-2)
 **Todo**:
 1. Implémenter `AudioCapture.h/cpp`:
   - Init PortAudio
   - Callback capture audio
   - Accumulation chunks (configurable duration)
   - Push dans ThreadSafeQueue
 2. Implémenter `AudioBuffer.h/cpp`:
   - Ring buffer pour audio raw
   - Thread-safe operations
 3. Test standalone: Capture 30s audio → save WAV
 **Validation**: Audio WAV lisible, durée correcte, qualité OK
 ---
 ### Phase 3 - Whisper Client (Jour 2)
 **Todo**:
 1. Implémenter `WhisperClient.h/cpp`:
   - Load API key depuis .env
   - POST multipart/form-data (cpp-httplib)
   - Encode audio WAV en memory
   - Parse JSON response
   - Error handling (retry, timeout)
 2. Test standalone: Audio file → Whisper → texte chinois
 **Validation**: Transcription chinoise correcte sur sample audio
 ---
 ### Phase 4 - Claude Client (Jour 2-3)
 **Todo**:
 1. Implémenter `ClaudeClient.h/cpp`:
   - Load API key depuis .env
   - POST JSON request (cpp-httplib)
   - System prompt configurable
   - Parse response (extract text)
   - Error handling
 2. Test standalone: Texte chinois → Claude → texte français
 **Validation**: Traduction française naturelle et correcte
 ---
 ### Phase 5 - ImGui UI (Jour 3)
 **Todo**:
 1. Setup ImGui + GLFW + OpenGL:
   - Window creation
   - Render loop
   - Input handling
 2. Implémenter `TranslationUI.h/cpp`:
   - Scrollable text area
   - Display messages (CN + FR)
   - Button Stop
   - Status bar (duration, chunk count)
 3. Test standalone: Afficher mock data
 **Validation**: UI responsive, affichage texte OK, bouton fonctionne
 ---
 ### Phase 6 - Pipeline Integration (Jour 4)
 **Todo**:
 1. Implémenter `Pipeline.h/cpp`:
   - Thread 1: AudioCapture loop
   - Thread 2: Processing loop (Whisper → Claude)
   - Thread 3: UI loop (ImGui)
   - ThreadSafeQueue entre threads
   - Synchronisation (start/stop)
 2. Implémenter `Config.h/cpp`:
   - Load .env (API keys)
   - Load config.json (settings)
 3. Implémenter `main.cpp`:
   - Init all components
   - Start pipeline
   - Handle graceful shutdown
 **Validation**: Pipeline complet fonctionne bout-à-bout
 ---
 ### Phase 7 - Testing & Tuning (Jour 5)
 **Todo**:
 1. Test avec audio réel chinois:
   - Sample conversations
   - Different audio qualities
   - Different chunk sizes (5s, 10s, 30s)
 2. Measure latence:
   - Audio → Whisper: X secondes
   - Whisper → Claude: Y secondes
   - Total: Z secondes
 3. Debug & fix bugs:
   - Memory leaks
   - Thread safety issues
   - API errors handling
 4. Optimize:
   - Chunk size optimal (tradeoff latency vs accuracy)
   - API timeout values
   - UI refresh rate
 **Validation**:
 - Latence totale < 10s acceptable
 - Pas de crash sur 30min recording
 - Transcription + traduction compréhensibles
 ---
 ## 🧪 Test Plan
 ### Unit Tests (Phase 2+)
 - `AudioCapture`: Capture audio, format correct
 - `WhisperClient`: API call mock, parsing JSON
 - `ClaudeClient`: API call mock, parsing JSON
 - `ThreadSafeQueue`: Thread safety, no data loss
 ### Integration Tests
 - Audio → Whisper: Audio file → texte chinois correct
 - Whisper → Claude: Texte chinois → traduction française correcte
 - Pipeline: Audio → UI display complet
 ### End-to-End Test
 - Recording 5min conversation chinoise réelle
 - Vérifier transcription accuracy (>85%)
 - Vérifier traduction compréhensible
 - Vérifier UI responsive
 - Vérifier audio sauvegardé correctement
 ---
 ## 📊 Metrics à Tracker
 ### Performance
 - **Latence Whisper**: Temps API call (target: <3s pour 10s audio)
 - **Latence Claude**: Temps API call (target: <2s pour 200 tokens)
 - **Latence totale**: Audio → Display (target: <10s)
 - **Memory usage**: Stable sur longue durée (no leaks)
 - **CPU usage**: Acceptable (<50% sur laptop)
 ### Qualité
 - **Whisper accuracy**: % mots corrects (target: >85%)
 - **Claude quality**: Traduction naturelle (subjective)
 - **Crash rate**: 0 crash sur 1h recording
 ### Cost
 - **Whisper**: $0.006/min audio
 - **Claude**: ~$0.03-0.05/h (depends on text volume)
 - **Total**: ~$0.40/h meeting
 ---
 ## ⚠️ Risks & Mitigations
 | Risk | Impact | Mitigation |
 |------|--------|------------|
 | **Whisper API timeout** | Bloquant | Retry logic, timeout 30s, fallback queue |
 | **Claude API rate limit** | Moyen | Exponential backoff, queue requests |
 | **Audio buffer overflow** | Moyen | Ring buffer size adequate, drop old chunks if needed |
 | **Thread deadlock** | Bloquant | Use std::lock_guard, avoid nested locks |
 | **Memory leak** | Moyen | Use smart pointers, valgrind tests |
 | **Network interruption** | Moyen | Retry logic, cache audio locally |
 ---
 ## 🎯 Success Criteria MVP
 ✅ **MVP validé si**:
 1. Capture audio microphone fonctionne
 2. Transcription chinoise >85% précise
 3. Traduction française compréhensible
 4. UI affiche traductions temps réel
 5. Bouton Stop arrête proprement
 6. Audio sauvegardé correctement
 7. Pas de crash sur 30min recording
 8. Latence totale <10s acceptable
 ---
 ## 📝 Notes Implémentation
 ### Thread Safety
 - Utiliser `std::mutex` + `std::lock_guard` pour queues
 - Pas de shared state sans protection
 - Use `std::atomic<bool>` pour flags (running, stopping)
 ### Error Handling
 - Try/catch sur API calls
 - Log errors (spdlog ou simple cout)
 - Retry logic (max 3 attempts)
 - Graceful degradation (skip chunk si error persistant)
 ### Audio Format
 - **Sample rate**: 16kHz (optimal pour Whisper)
 - **Channels**: Mono (sufficient, réduit bandwidth)
 - **Format**: 16-bit PCM WAV
 - **Chunk size**: Configurable (default 10s)
 ### API Best Practices
 - **Timeout**: 30s pour Whisper, 15s pour Claude
 - **Retry**: Exponential backoff (1s, 2s, 4s)
 - **Rate limiting**: Respect API limits (monitor 429 errors)
 - **Headers**: Always set User-Agent, API version
 ---
 ## 🔄 Post-MVP (Phase 2)
 **Not included in MVP, but planned**:
 - ❌ Résumé auto post-meeting (Claude summary)
 - ❌ Export structuré (transcripts + audio)
 - ❌ Système de recherche (backlog)
 - ❌ Diarization (qui parle)
 - ❌ Replay mode
 - ❌ GUI élaborée (settings, etc)
 **Focus MVP**: Pipeline fonctionnel bout-à-bout, validation concept, usage réel premier meeting.
 ---
 *Document créé: 20 novembre 2025*
 *Status: Ready to implement*
 *Estimated effort: 5 jours développement + 2 jours tests*
--- a/recordings/.gitkeep
+++ b/recordings/.gitkeep
--- a/src/api/ClaudeClient.cpp
+++ b/src/api/ClaudeClient.cpp
@ -0,0 +1,79 @@
 #include "ClaudeClient.h"
 #include <httplib.h>
 #include <nlohmann/json.hpp>
 #include <iostream>
 using json = nlohmann::json;
 namespace secondvoice {
 ClaudeClient::ClaudeClient(const std::string& api_key)
    : api_key_(api_key) {
 }
 std::optional<ClaudeResponse> ClaudeClient::translate(
    const std::string& chinese_text,
    const std::string& system_prompt,
    int max_tokens,
    float temperature) {
    // Build request JSON
    json request_json = {
        {"model", MODEL},
        {"max_tokens", max_tokens},
        {"temperature", temperature},
        {"messages", json::array({
            {
                {"role", "user"},
                {"content", "Traduis en français: " + chinese_text}
            }
        })}
    };
    if (!system_prompt.empty()) {
        request_json["system"] = system_prompt;
    }
    std::string request_body = request_json.dump();
    // Make HTTP request
    httplib::Client client("https://api.anthropic.com");
    client.set_read_timeout(15, 0);  // 15 seconds timeout
    httplib::Headers headers = {
        {"x-api-key", api_key_},
        {"anthropic-version", API_VERSION},
        {"content-type", "application/json"}
    };
    auto res = client.Post("/v1/messages", headers, request_body, "application/json");
    if (!res) {
        std::cerr << "Claude API request failed: " << httplib::to_string(res.error()) << std::endl;
        return std::nullopt;
    }
    if (res->status != 200) {
        std::cerr << "Claude API error " << res->status << ": " << res->body << std::endl;
        return std::nullopt;
    }
    // Parse response
    try {
        json response_json = json::parse(res->body);
        if (!response_json.contains("content") || !response_json["content"].is_array()) {
            std::cerr << "Invalid Claude API response format" << std::endl;
            return std::nullopt;
        }
        ClaudeResponse response;
        response.text = response_json["content"][0]["text"].get<std::string>();
        return response;
    } catch (const json::exception& e) {
        std::cerr << "Failed to parse Claude response: " << e.what() << std::endl;
        return std::nullopt;
    }
 }
 } // namespace secondvoice
--- a/src/api/ClaudeClient.h
+++ b/src/api/ClaudeClient.h
@ -0,0 +1,30 @@
 #pragma once
 #include <string>
 #include <optional>
 namespace secondvoice {
 struct ClaudeResponse {
    std::string text;
 };
 class ClaudeClient {
 public:
    ClaudeClient(const std::string& api_key);
    std::optional<ClaudeResponse> translate(
        const std::string& chinese_text,
        const std::string& system_prompt = "",
        int max_tokens = 1024,
        float temperature = 0.3f
    );
 private:
    std::string api_key_;
    static constexpr const char* API_URL = "https://api.anthropic.com/v1/messages";
    static constexpr const char* MODEL = "claude-haiku-4-20250514";
    static constexpr const char* API_VERSION = "2023-06-01";
 };
 } // namespace secondvoice
--- a/src/api/WhisperClient.cpp
+++ b/src/api/WhisperClient.cpp
@ -0,0 +1,85 @@
 #include "WhisperClient.h"
 #include "../audio/AudioBuffer.h"
 #include <httplib.h>
 #include <nlohmann/json.hpp>
 #include <iostream>
 #include <sstream>
 #include <fstream>
 using json = nlohmann::json;
 namespace secondvoice {
 WhisperClient::WhisperClient(const std::string& api_key)
    : api_key_(api_key) {
 }
 std::optional<WhisperResponse> WhisperClient::transcribe(
    const std::vector<float>& audio_data,
    int sample_rate,
    int channels,
    const std::string& language,
    float temperature) {
    // Save audio to temporary WAV file
    AudioBuffer buffer(sample_rate, channels);
    buffer.addSamples(audio_data);
    std::string temp_file = "/tmp/secondvoice_temp.wav";
    if (!buffer.saveToWav(temp_file)) {
        std::cerr << "Failed to save temporary WAV file" << std::endl;
        return std::nullopt;
    }
    // Read WAV file
    std::ifstream file(temp_file, std::ios::binary);
    if (!file.is_open()) {
        std::cerr << "Failed to open temporary WAV file" << std::endl;
        return std::nullopt;
    }
    std::ostringstream wav_stream;
    wav_stream << file.rdbuf();
    std::string wav_data = wav_stream.str();
    file.close();
    // Make HTTP request
    httplib::Client client("https://api.openai.com");
    client.set_read_timeout(30, 0);  // 30 seconds timeout
    httplib::MultipartFormDataItems items = {
        {"file", wav_data, "audio.wav", "audio/wav"},
        {"model", "whisper-1", "", ""},
        {"language", language, "", ""},
        {"temperature", std::to_string(temperature), "", ""}
    };
    httplib::Headers headers = {
        {"Authorization", "Bearer " + api_key_}
    };
    auto res = client.Post("/v1/audio/transcriptions", headers, items);
    if (!res) {
        std::cerr << "Whisper API request failed: " << httplib::to_string(res.error()) << std::endl;
        return std::nullopt;
    }
    if (res->status != 200) {
        std::cerr << "Whisper API error " << res->status << ": " << res->body << std::endl;
        return std::nullopt;
    }
    // Parse response
    try {
        json response_json = json::parse(res->body);
        WhisperResponse response;
        response.text = response_json["text"].get<std::string>();
        return response;
    } catch (const json::exception& e) {
        std::cerr << "Failed to parse Whisper response: " << e.what() << std::endl;
        return std::nullopt;
    }
 }
 } // namespace secondvoice
--- a/src/api/WhisperClient.h
+++ b/src/api/WhisperClient.h
@ -0,0 +1,30 @@
 #pragma once
 #include <string>
 #include <vector>
 #include <optional>
 namespace secondvoice {
 struct WhisperResponse {
    std::string text;
 };
 class WhisperClient {
 public:
    WhisperClient(const std::string& api_key);
    std::optional<WhisperResponse> transcribe(
        const std::vector<float>& audio_data,
        int sample_rate,
        int channels,
        const std::string& language = "zh",
        float temperature = 0.0f
    );
 private:
    std::string api_key_;
    static constexpr const char* API_URL = "https://api.openai.com/v1/audio/transcriptions";
 };
 } // namespace secondvoice
--- a/src/audio/AudioBuffer.cpp
+++ b/src/audio/AudioBuffer.cpp
@ -0,0 +1,76 @@
 #include "AudioBuffer.h"
 #include <fstream>
 #include <cstring>
 #include <algorithm>
 namespace secondvoice {
 AudioBuffer::AudioBuffer(int sample_rate, int channels)
    : sample_rate_(sample_rate)
    , channels_(channels) {
 }
 void AudioBuffer::addSamples(const std::vector<float>& samples) {
    samples_.insert(samples_.end(), samples.begin(), samples.end());
 }
 std::vector<float> AudioBuffer::getSamples() const {
    return samples_;
 }
 void AudioBuffer::clear() {
    samples_.clear();
 }
 bool AudioBuffer::saveToWav(const std::string& filename) const {
    std::ofstream file(filename, std::ios::binary);
    if (!file.is_open()) {
        return false;
    }
    // WAV header
    struct WavHeader {
        char riff[4] = {'R', 'I', 'F', 'F'};
        uint32_t file_size;
        char wave[4] = {'W', 'A', 'V', 'E'};
        char fmt[4] = {'f', 'm', 't', ' '};
        uint32_t fmt_size = 16;
        uint16_t audio_format = 1;  // PCM
        uint16_t num_channels;
        uint32_t sample_rate;
        uint32_t byte_rate;
        uint16_t block_align;
        uint16_t bits_per_sample = 16;
        char data[4] = {'d', 'a', 't', 'a'};
        uint32_t data_size;
    };
    WavHeader header;
    header.num_channels = channels_;
    header.sample_rate = sample_rate_;
    header.byte_rate = sample_rate_ * channels_ * 2;  // 16-bit = 2 bytes
    header.block_align = channels_ * 2;
    // Convert float samples to 16-bit PCM
    std::vector<int16_t> pcm_samples;
    pcm_samples.reserve(samples_.size());
    for (float sample : samples_) {
        // Clamp to [-1.0, 1.0] and convert to 16-bit
        float clamped = std::max(-1.0f, std::min(1.0f, sample));
        pcm_samples.push_back(static_cast<int16_t>(clamped * 32767.0f));
    }
    header.data_size = pcm_samples.size() * sizeof(int16_t);
    header.file_size = sizeof(WavHeader) - 8 + header.data_size;
    // Write header
    file.write(reinterpret_cast<const char*>(&header), sizeof(header));
    // Write PCM data
    file.write(reinterpret_cast<const char*>(pcm_samples.data()), header.data_size);
    file.close();
    return true;
 }
 } // namespace secondvoice
--- a/src/audio/AudioBuffer.h
+++ b/src/audio/AudioBuffer.h
@ -0,0 +1,27 @@
 #pragma once
 #include <vector>
 #include <string>
 namespace secondvoice {
 class AudioBuffer {
 public:
    AudioBuffer(int sample_rate, int channels);
    void addSamples(const std::vector<float>& samples);
    std::vector<float> getSamples() const;
    void clear();
    bool saveToWav(const std::string& filename) const;
    size_t size() const { return samples_.size(); }
    bool empty() const { return samples_.empty(); }
 private:
    int sample_rate_;
    int channels_;
    std::vector<float> samples_;
 };
 } // namespace secondvoice
--- a/src/audio/AudioCapture.cpp
+++ b/src/audio/AudioCapture.cpp
@ -0,0 +1,110 @@
 #include "AudioCapture.h"
 #include <iostream>
 namespace secondvoice {
 AudioCapture::AudioCapture(int sample_rate, int channels, int chunk_duration_seconds)
    : sample_rate_(sample_rate)
    , channels_(channels)
    , chunk_duration_seconds_(chunk_duration_seconds) {
 }
 AudioCapture::~AudioCapture() {
    stop();
    if (stream_) {
        Pa_CloseStream(stream_);
    }
    Pa_Terminate();
 }
 bool AudioCapture::initialize() {
    PaError err = Pa_Initialize();
    if (err != paNoError) {
        std::cerr << "PortAudio init error: " << Pa_GetErrorText(err) << std::endl;
        return false;
    }
    return true;
 }
 int AudioCapture::audioCallback(const void* input, void* output,
                                unsigned long frame_count,
                                const PaStreamCallbackTimeInfo* time_info,
                                PaStreamCallbackFlags status_flags,
                                void* user_data) {
    AudioCapture* self = static_cast<AudioCapture*>(user_data);
    const float* in = static_cast<const float*>(input);
    // Accumulate audio data
    for (unsigned long i = 0; i < frame_count * self->channels_; ++i) {
        self->buffer_.push_back(in[i]);
    }
    // Check if we have accumulated enough data for a chunk
    size_t chunk_samples = self->sample_rate_ * self->channels_ * self->chunk_duration_seconds_;
    if (self->buffer_.size() >= chunk_samples) {
        // Call the callback with the chunk
        if (self->callback_) {
            self->callback_(self->buffer_);
        }
        self->buffer_.clear();
    }
    return paContinue;
 }
 bool AudioCapture::start(AudioCallback callback) {
    if (is_recording_) {
        return false;
    }
    callback_ = callback;
    buffer_.clear();
    PaStreamParameters input_params;
    input_params.device = Pa_GetDefaultInputDevice();
    if (input_params.device == paNoDevice) {
        std::cerr << "No default input device" << std::endl;
        return false;
    }
    input_params.channelCount = channels_;
    input_params.sampleFormat = paFloat32;
    input_params.suggestedLatency = Pa_GetDeviceInfo(input_params.device)->defaultLowInputLatency;
    input_params.hostApiSpecificStreamInfo = nullptr;
    PaError err = Pa_OpenStream(
        &stream_,
        &input_params,
        nullptr,  // no output
        sample_rate_,
        paFramesPerBufferUnspecified,
        paClipOff,
        &AudioCapture::audioCallback,
        this
    );
    if (err != paNoError) {
        std::cerr << "PortAudio open stream error: " << Pa_GetErrorText(err) << std::endl;
        return false;
    }
    err = Pa_StartStream(stream_);
    if (err != paNoError) {
        std::cerr << "PortAudio start stream error: " << Pa_GetErrorText(err) << std::endl;
        return false;
    }
    is_recording_ = true;
    return true;
 }
 void AudioCapture::stop() {
    if (!is_recording_ || !stream_) {
        return;
    }
    Pa_StopStream(stream_);
    is_recording_ = false;
 }
 } // namespace secondvoice
--- a/src/audio/AudioCapture.h
+++ b/src/audio/AudioCapture.h
@ -0,0 +1,39 @@
 #pragma once
 #include <vector>
 #include <string>
 #include <functional>
 #include <portaudio.h>
 namespace secondvoice {
 class AudioCapture {
 public:
    using AudioCallback = std::function<void(const std::vector<float>&)>;
    AudioCapture(int sample_rate, int channels, int chunk_duration_seconds);
    ~AudioCapture();
    bool initialize();
    bool start(AudioCallback callback);
    void stop();
    bool isRecording() const { return is_recording_; }
 private:
    static int audioCallback(const void* input, void* output,
                            unsigned long frame_count,
                            const PaStreamCallbackTimeInfo* time_info,
                            PaStreamCallbackFlags status_flags,
                            void* user_data);
    int sample_rate_;
    int channels_;
    int chunk_duration_seconds_;
    bool is_recording_ = false;
    PaStream* stream_ = nullptr;
    AudioCallback callback_;
    std::vector<float> buffer_;
 };
 } // namespace secondvoice
--- a/src/core/Pipeline.cpp
+++ b/src/core/Pipeline.cpp
@ -0,0 +1,214 @@
 #include "Pipeline.h"
 #include "../audio/AudioCapture.h"
 #include "../audio/AudioBuffer.h"
 #include "../api/WhisperClient.h"
 #include "../api/ClaudeClient.h"
 #include "../ui/TranslationUI.h"
 #include "../utils/Config.h"
 #include <iostream>
 #include <chrono>
 #include <filesystem>
 namespace secondvoice {
 Pipeline::Pipeline() = default;
 Pipeline::~Pipeline() {
    stop();
 }
 bool Pipeline::initialize() {
    auto& config = Config::getInstance();
    // Initialize audio capture
    audio_capture_ = std::make_unique<AudioCapture>(
        config.getAudioConfig().sample_rate,
        config.getAudioConfig().channels,
        config.getAudioConfig().chunk_duration_seconds
    );
    if (!audio_capture_->initialize()) {
        std::cerr << "Failed to initialize audio capture" << std::endl;
        return false;
    }
    // Initialize API clients
    whisper_client_ = std::make_unique<WhisperClient>(config.getOpenAIKey());
    claude_client_ = std::make_unique<ClaudeClient>(config.getAnthropicKey());
    // Initialize UI
    ui_ = std::make_unique<TranslationUI>(
        config.getUIConfig().window_width,
        config.getUIConfig().window_height
    );
    if (!ui_->initialize()) {
        std::cerr << "Failed to initialize UI" << std::endl;
        return false;
    }
    // Initialize full recording buffer
    full_recording_ = std::make_unique<AudioBuffer>(
        config.getAudioConfig().sample_rate,
        config.getAudioConfig().channels
    );
    // Create recordings directory if it doesn't exist
    std::filesystem::create_directories(config.getRecordingConfig().output_directory);
    return true;
 }
 bool Pipeline::start() {
    if (running_) {
        return false;
    }
    running_ = true;
    // Start threads
    audio_thread_ = std::thread(&Pipeline::audioThread, this);
    processing_thread_ = std::thread(&Pipeline::processingThread, this);
    ui_thread_ = std::thread(&Pipeline::uiThread, this);
    return true;
 }
 void Pipeline::stop() {
    if (!running_) {
        return;
    }
    running_ = false;
    // Stop audio capture
    if (audio_capture_) {
        audio_capture_->stop();
    }
    // Shutdown queues
    audio_queue_.shutdown();
    transcription_queue_.shutdown();
    // Wait for threads
    if (audio_thread_.joinable()) {
        audio_thread_.join();
    }
    if (processing_thread_.joinable()) {
        processing_thread_.join();
    }
    if (ui_thread_.joinable()) {
        ui_thread_.join();
    }
    // Save full recording
    auto& config = Config::getInstance();
    if (config.getRecordingConfig().save_audio && full_recording_) {
        auto now = std::chrono::system_clock::now();
        auto time_t = std::chrono::system_clock::to_time_t(now);
        std::stringstream ss;
        ss << config.getRecordingConfig().output_directory << "/"
           << "recording_" << std::put_time(std::localtime(&time_t), "%Y%m%d_%H%M%S")
           << ".wav";
        if (full_recording_->saveToWav(ss.str())) {
            std::cout << "Recording saved to: " << ss.str() << std::endl;
        } else {
            std::cerr << "Failed to save recording" << std::endl;
        }
    }
 }
 void Pipeline::audioThread() {
    auto& config = Config::getInstance();
    audio_capture_->start([this, &config](const std::vector<float>& audio_data) {
        if (!running_) return;
        // Add to full recording
        full_recording_->addSamples(audio_data);
        // Push to processing queue
        AudioChunk chunk;
        chunk.data = audio_data;
        chunk.sample_rate = config.getAudioConfig().sample_rate;
        chunk.channels = config.getAudioConfig().channels;
        audio_queue_.push(std::move(chunk));
    });
    // Keep thread alive while recording
    auto start_time = std::chrono::steady_clock::now();
    while (running_ && audio_capture_->isRecording()) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        // Update duration
        auto now = std::chrono::steady_clock::now();
        recording_duration_ = std::chrono::duration_cast<std::chrono::seconds>(now - start_time).count();
    }
 }
 void Pipeline::processingThread() {
    auto& config = Config::getInstance();
    while (running_) {
        auto chunk_opt = audio_queue_.wait_and_pop();
        if (!chunk_opt.has_value()) {
            break;  // Queue shutdown
        }
        auto& chunk = chunk_opt.value();
        // Transcribe with Whisper
        auto whisper_result = whisper_client_->transcribe(
            chunk.data,
            chunk.sample_rate,
            chunk.channels,
            config.getWhisperConfig().language,
            config.getWhisperConfig().temperature
        );
        if (!whisper_result.has_value()) {
            std::cerr << "Whisper transcription failed" << std::endl;
            continue;
        }
        // Translate with Claude
        auto claude_result = claude_client_->translate(
            whisper_result->text,
            config.getClaudeConfig().system_prompt,
            config.getClaudeConfig().max_tokens,
            config.getClaudeConfig().temperature
        );
        if (!claude_result.has_value()) {
            std::cerr << "Claude translation failed" << std::endl;
            continue;
        }
        // Add to UI
        ui_->addTranslation(whisper_result->text, claude_result->text);
        std::cout << "CN: " << whisper_result->text << std::endl;
        std::cout << "FR: " << claude_result->text << std::endl;
        std::cout << "---" << std::endl;
    }
 }
 void Pipeline::uiThread() {
    while (running_ && !ui_->shouldClose()) {
        ui_->setRecordingDuration(recording_duration_);
        ui_->setProcessingStatus("Processing...");
        ui_->render();
        // Check if stop was requested
        if (ui_->isStopRequested()) {
            running_ = false;
            break;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(16));  // ~60 FPS
    }
 }
 } // namespace secondvoice
--- a/src/core/Pipeline.h
+++ b/src/core/Pipeline.h
@ -0,0 +1,59 @@
 #pragma once
 #include <memory>
 #include <thread>
 #include <atomic>
 #include "../utils/ThreadSafeQueue.h"
 namespace secondvoice {
 class AudioCapture;
 class WhisperClient;
 class ClaudeClient;
 class TranslationUI;
 class AudioBuffer;
 struct AudioChunk {
    std::vector<float> data;
    int sample_rate;
    int channels;
 };
 struct TranscriptionResult {
    std::string chinese_text;
 };
 class Pipeline {
 public:
    Pipeline();
    ~Pipeline();
    bool initialize();
    bool start();
    void stop();
    bool isRunning() const { return running_; }
 private:
    void audioThread();
    void processingThread();
    void uiThread();
    std::unique_ptr<AudioCapture> audio_capture_;
    std::unique_ptr<WhisperClient> whisper_client_;
    std::unique_ptr<ClaudeClient> claude_client_;
    std::unique_ptr<TranslationUI> ui_;
    std::unique_ptr<AudioBuffer> full_recording_;
    ThreadSafeQueue<AudioChunk> audio_queue_;
    ThreadSafeQueue<TranscriptionResult> transcription_queue_;
    std::thread audio_thread_;
    std::thread processing_thread_;
    std::thread ui_thread_;
    std::atomic<bool> running_{false};
    std::atomic<int> recording_duration_{0};
 };
 } // namespace secondvoice
--- a/src/main.cpp
+++ b/src/main.cpp
@ -0,0 +1,56 @@
 #include <iostream>
 #include "utils/Config.h"
 #include "core/Pipeline.h"
 int main(int argc, char** argv) {
    std::cout << "SecondVoice - Real-time Translation System" << std::endl;
    std::cout << "===========================================" << std::endl;
    // Load configuration
    secondvoice::Config& config = secondvoice::Config::getInstance();
    if (!config.load("config.json", ".env")) {
        std::cerr << "Failed to load configuration" << std::endl;
        return 1;
    }
    std::cout << "Configuration loaded successfully" << std::endl;
    std::cout << "Audio: " << config.getAudioConfig().sample_rate << "Hz, "
              << config.getAudioConfig().channels << " channel(s), "
              << config.getAudioConfig().chunk_duration_seconds << "s chunks" << std::endl;
    std::cout << "Whisper: " << config.getWhisperConfig().model
              << " (language: " << config.getWhisperConfig().language << ")" << std::endl;
    std::cout << "Claude: " << config.getClaudeConfig().model << std::endl;
    std::cout << std::endl;
    // Create and initialize pipeline
    secondvoice::Pipeline pipeline;
    if (!pipeline.initialize()) {
        std::cerr << "Failed to initialize pipeline" << std::endl;
        return 1;
    }
    std::cout << "Pipeline initialized successfully" << std::endl;
    std::cout << "Starting recording and translation..." << std::endl;
    std::cout << std::endl;
    // Start pipeline
    if (!pipeline.start()) {
        std::cerr << "Failed to start pipeline" << std::endl;
        return 1;
    }
    // Wait for pipeline to finish (user clicks Stop button)
    while (pipeline.isRunning()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    std::cout << std::endl;
    std::cout << "Recording stopped" << std::endl;
    std::cout << "Saving audio..." << std::endl;
    pipeline.stop();
    std::cout << "Done!" << std::endl;
    return 0;
 }
--- a/src/ui/TranslationUI.cpp
+++ b/src/ui/TranslationUI.cpp
@ -0,0 +1,160 @@
 #include "TranslationUI.h"
 #include <imgui.h>
 #include <imgui_impl_glfw.h>
 #include <imgui_impl_opengl3.h>
 #include <iostream>
 namespace secondvoice {
 TranslationUI::TranslationUI(int width, int height)
    : width_(width)
    , height_(height) {
 }
 TranslationUI::~TranslationUI() {
    if (window_) {
        ImGui_ImplOpenGL3_Shutdown();
        ImGui_ImplGlfw_Shutdown();
        ImGui::DestroyContext();
        glfwDestroyWindow(window_);
        glfwTerminate();
    }
 }
 bool TranslationUI::initialize() {
    // Initialize GLFW
    if (!glfwInit()) {
        std::cerr << "Failed to initialize GLFW" << std::endl;
        return false;
    }
    // OpenGL 3.3 + GLSL 330
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    // Create window
    window_ = glfwCreateWindow(width_, height_, "SecondVoice - Live Translation", nullptr, nullptr);
    if (!window_) {
        std::cerr << "Failed to create GLFW window" << std::endl;
        glfwTerminate();
        return false;
    }
    glfwMakeContextCurrent(window_);
    glfwSwapInterval(1);  // Enable vsync
    // Initialize ImGui
    IMGUI_CHECKVERSION();
    ImGui::CreateContext();
    ImGuiIO& io = ImGui::GetIO();
    io.ConfigFlags |= ImGuiConfigFlags_NavEnableKeyboard;
    ImGui::StyleColorsDark();
    ImGui_ImplGlfw_InitForOpenGL(window_, true);
    ImGui_ImplOpenGL3_Init("#version 330");
    return true;
 }
 void TranslationUI::render() {
    glfwPollEvents();
    // Start ImGui frame
    ImGui_ImplOpenGL3_NewFrame();
    ImGui_ImplGlfw_NewFrame();
    ImGui::NewFrame();
    // Main window (full viewport)
    ImGui::SetNextWindowPos(ImVec2(0, 0));
    ImGui::SetNextWindowSize(ImGui::GetIO().DisplaySize);
    ImGui::Begin("SecondVoice", nullptr,
        ImGuiWindowFlags_NoTitleBar |
        ImGuiWindowFlags_NoResize |
        ImGuiWindowFlags_NoMove |
        ImGuiWindowFlags_NoCollapse);
    renderTranslations();
    renderControls();
    renderStatus();
    ImGui::End();
    // Rendering
    ImGui::Render();
    int display_w, display_h;
    glfwGetFramebufferSize(window_, &display_w, &display_h);
    glViewport(0, 0, display_w, display_h);
    glClearColor(0.1f, 0.1f, 0.1f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
    glfwSwapBuffers(window_);
 }
 bool TranslationUI::shouldClose() const {
    return glfwWindowShouldClose(window_);
 }
 void TranslationUI::addTranslation(const std::string& chinese, const std::string& french) {
    messages_.push_back({chinese, french});
 }
 void TranslationUI::renderTranslations() {
    ImGui::Text("SecondVoice - Live Translation");
    ImGui::Separator();
    ImGui::BeginChild("Translations", ImVec2(0, -120), true);
    for (const auto& msg : messages_) {
        ImGui::PushStyleColor(ImGuiCol_Text, ImVec4(0.5f, 0.8f, 1.0f, 1.0f));
        ImGui::TextWrapped("中文: %s", msg.chinese.c_str());
        ImGui::PopStyleColor();
        ImGui::PushStyleColor(ImGuiCol_Text, ImVec4(0.5f, 1.0f, 0.5f, 1.0f));
        ImGui::TextWrapped("FR: %s", msg.french.c_str());
        ImGui::PopStyleColor();
        ImGui::Spacing();
        ImGui::Separator();
        ImGui::Spacing();
    }
    if (auto_scroll_ && ImGui::GetScrollY() >= ImGui::GetScrollMaxY()) {
        ImGui::SetScrollHereY(1.0f);
    }
    ImGui::EndChild();
 }
 void TranslationUI::renderControls() {
    ImGui::Spacing();
    // Center the stop button
    float button_width = 200.0f;
    float window_width = ImGui::GetWindowWidth();
    ImGui::SetCursorPosX((window_width - button_width) * 0.5f);
    if (ImGui::Button("STOP RECORDING", ImVec2(button_width, 40))) {
        stop_requested_ = true;
    }
    ImGui::Spacing();
 }
 void TranslationUI::renderStatus() {
    ImGui::Separator();
    // Format duration as MM:SS
    int minutes = recording_duration_ / 60;
    int seconds = recording_duration_ % 60;
    ImGui::Text("Recording... Duration: %02d:%02d", minutes, seconds);
    if (!processing_status_.empty()) {
        ImGui::SameLine();
        ImGui::Text(" | Status: %s", processing_status_.c_str());
    }
 }
 } // namespace secondvoice
--- a/src/ui/TranslationUI.h
+++ b/src/ui/TranslationUI.h
@ -0,0 +1,47 @@
 #pragma once
 #include <string>
 #include <vector>
 #include <GLFW/glfw3.h>
 namespace secondvoice {
 struct TranslationMessage {
    std::string chinese;
    std::string french;
 };
 class TranslationUI {
 public:
    TranslationUI(int width, int height);
    ~TranslationUI();
    bool initialize();
    void render();
    bool shouldClose() const;
    void addTranslation(const std::string& chinese, const std::string& french);
    bool isStopRequested() const { return stop_requested_; }
    void resetStopRequest() { stop_requested_ = false; }
    void setRecordingDuration(int seconds) { recording_duration_ = seconds; }
    void setProcessingStatus(const std::string& status) { processing_status_ = status; }
 private:
    void renderTranslations();
    void renderControls();
    void renderStatus();
    int width_;
    int height_;
    GLFWwindow* window_ = nullptr;
    std::vector<TranslationMessage> messages_;
    bool stop_requested_ = false;
    bool auto_scroll_ = true;
    int recording_duration_ = 0;
    std::string processing_status_;
 };
 } // namespace secondvoice
--- a/src/utils/Config.cpp
+++ b/src/utils/Config.cpp
@ -0,0 +1,106 @@
 #include "Config.h"
 #include <nlohmann/json.hpp>
 #include <fstream>
 #include <iostream>
 #include <cstdlib>
 using json = nlohmann::json;
 namespace secondvoice {
 Config& Config::getInstance() {
    static Config instance;
    return instance;
 }
 bool Config::load(const std::string& config_path, const std::string& env_path) {
    // Load .env file
    std::ifstream env_file(env_path);
    if (env_file.is_open()) {
        std::string line;
        while (std::getline(env_file, line)) {
            if (line.empty() || line[0] == '#') continue;
            auto pos = line.find('=');
            if (pos != std::string::npos) {
                std::string key = line.substr(0, pos);
                std::string value = line.substr(pos + 1);
                // Remove quotes if present
                if (!value.empty() && value.front() == '"' && value.back() == '"') {
                    value = value.substr(1, value.length() - 2);
                }
                if (key == "OPENAI_API_KEY") {
                    openai_key_ = value;
                } else if (key == "ANTHROPIC_API_KEY") {
                    anthropic_key_ = value;
                }
            }
        }
        env_file.close();
    } else {
        std::cerr << "Warning: Could not open .env file: " << env_path << std::endl;
    }
    // Load config.json
    std::ifstream config_file(config_path);
    if (!config_file.is_open()) {
        std::cerr << "Error: Could not open config file: " << config_path << std::endl;
        return false;
    }
    json config_json;
    try {
        config_file >> config_json;
    } catch (const json::parse_error& e) {
        std::cerr << "Error parsing config.json: " << e.what() << std::endl;
        return false;
    }
    // Parse audio config
    if (config_json.contains("audio")) {
        auto& audio = config_json["audio"];
        audio_config_.sample_rate = audio.value("sample_rate", 16000);
        audio_config_.channels = audio.value("channels", 1);
        audio_config_.chunk_duration_seconds = audio.value("chunk_duration_seconds", 10);
        audio_config_.format = audio.value("format", "wav");
    }
    // Parse whisper config
    if (config_json.contains("whisper")) {
        auto& whisper = config_json["whisper"];
        whisper_config_.model = whisper.value("model", "whisper-1");
        whisper_config_.language = whisper.value("language", "zh");
        whisper_config_.temperature = whisper.value("temperature", 0.0f);
    }
    // Parse claude config
    if (config_json.contains("claude")) {
        auto& claude = config_json["claude"];
        claude_config_.model = claude.value("model", "claude-haiku-4-20250514");
        claude_config_.max_tokens = claude.value("max_tokens", 1024);
        claude_config_.temperature = claude.value("temperature", 0.3f);
        claude_config_.system_prompt = claude.value("system_prompt", "");
    }
    // Parse UI config
    if (config_json.contains("ui")) {
        auto& ui = config_json["ui"];
        ui_config_.window_width = ui.value("window_width", 800);
        ui_config_.window_height = ui.value("window_height", 600);
        ui_config_.font_size = ui.value("font_size", 16);
        ui_config_.max_display_lines = ui.value("max_display_lines", 50);
    }
    // Parse recording config
    if (config_json.contains("recording")) {
        auto& recording = config_json["recording"];
        recording_config_.save_audio = recording.value("save_audio", true);
        recording_config_.output_directory = recording.value("output_directory", "./recordings");
    }
    return true;
 }
 } // namespace secondvoice
--- a/src/utils/Config.h
+++ b/src/utils/Config.h
@ -0,0 +1,69 @@
 #pragma once
 #include <string>
 namespace secondvoice {
 struct AudioConfig {
    int sample_rate;
    int channels;
    int chunk_duration_seconds;
    std::string format;
 };
 struct WhisperConfig {
    std::string model;
    std::string language;
    float temperature;
 };
 struct ClaudeConfig {
    std::string model;
    int max_tokens;
    float temperature;
    std::string system_prompt;
 };
 struct UIConfig {
    int window_width;
    int window_height;
    int font_size;
    int max_display_lines;
 };
 struct RecordingConfig {
    bool save_audio;
    std::string output_directory;
 };
 class Config {
 public:
    static Config& getInstance();
    bool load(const std::string& config_path, const std::string& env_path);
    const AudioConfig& getAudioConfig() const { return audio_config_; }
    const WhisperConfig& getWhisperConfig() const { return whisper_config_; }
    const ClaudeConfig& getClaudeConfig() const { return claude_config_; }
    const UIConfig& getUIConfig() const { return ui_config_; }
    const RecordingConfig& getRecordingConfig() const { return recording_config_; }
    const std::string& getOpenAIKey() const { return openai_key_; }
    const std::string& getAnthropicKey() const { return anthropic_key_; }
 private:
    Config() = default;
    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;
    AudioConfig audio_config_;
    WhisperConfig whisper_config_;
    ClaudeConfig claude_config_;
    UIConfig ui_config_;
    RecordingConfig recording_config_;
    std::string openai_key_;
    std::string anthropic_key_;
 };
 } // namespace secondvoice
--- a/src/utils/ThreadSafeQueue.h
+++ b/src/utils/ThreadSafeQueue.h
@ -0,0 +1,65 @@
 #pragma once
 #include <queue>
 #include <mutex>
 #include <condition_variable>
 #include <optional>
 namespace secondvoice {
 template<typename T>
 class ThreadSafeQueue {
 public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(value));
        cv_.notify_one();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (queue_.empty()) {
            return std::nullopt;
        }
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
    std::optional<T> wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || shutdown_; });
        if (shutdown_ && queue_.empty()) {
            return std::nullopt;
        }
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
    bool empty() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.empty();
    }
    size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }
    void shutdown() {
        std::lock_guard<std::mutex> lock(mutex_);
        shutdown_ = true;
        cv_.notify_all();
    }
 private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
    bool shutdown_ = false;
 };
 } // namespace secondvoice
--- a/vcpkg.json
+++ b/vcpkg.json
@ -0,0 +1,15 @@
 {
  "name": "secondvoice",
  "version": "0.1.0",
  "dependencies": [
    "portaudio",
    "cpp-httplib",
    "nlohmann-json",
    {
      "name": "imgui",
      "features": ["glfw-binding", "opengl3-binding"]
    },
    "glfw3",
    "opengl"
  ]
 }