# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SecondVoice is a real-time Chinese-to-French translation system for live meetings. It captures audio, transcribes Chinese speech using OpenAI's Whisper API (gpt-4o-mini-transcribe), and translates it to French using Claude AI.

## Build Commands

### Windows (MinGW) - Primary Build

```batch
# First-time setup
.\setup_mingw.bat

# Build (Release)
.\build_mingw.bat

# Build (Debug)
.\build_mingw.bat --debug

# Clean rebuild
.\build_mingw.bat --clean
```

### Running the Application

```batch
cd build\mingw-Release
SecondVoice.exe
```

Requires:
- `.env` file with `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`
- `config.json` (copied automatically during build)
- A microphone

## Architecture

### Threading Model (3 threads)

1. **Audio Thread** (`Pipeline::audioThread`) - PortAudio callback captures audio, applies VAD (Voice Activity Detection), pushes chunks to a queue
2. **Processing Thread** (`Pipeline::processingThread`) - Consumes audio chunks, calls the Whisper API for transcription, then the Claude API for translation
3. **UI Thread** (main) - GLFW/ImGui rendering loop; must run on the main thread

### Core Components

```
src/
├── main.cpp                # Entry point, forces NVIDIA GPU
├── core/Pipeline.cpp       # Orchestrates audio→transcription→translation flow
├── audio/
│   ├── AudioCapture.cpp    # PortAudio wrapper with VAD-based segmentation
│   ├── AudioBuffer.cpp     # Accumulates samples, exports WAV/Opus
│   └── NoiseReducer.cpp    # RNNoise denoising (16kHz→48kHz→16kHz resampling)
├── api/
│   ├── WhisperClient.cpp   # OpenAI Whisper API (multipart/form-data)
│   ├── ClaudeClient.cpp    # Anthropic Claude API (JSON)
│   └── WinHttpClient.cpp   # Native Windows HTTP client (replaced libcurl)
├── ui/TranslationUI.cpp    # ImGui interface with VAD threshold controls
└── utils/
    ├── Config.cpp          # Loads config.json + .env
    └── ThreadSafeQueue.h   # Lock-free queue for audio chunks
```

### Key Data Flow

1.
`AudioCapture` detects speech via VAD thresholds (RMS + Peak)
2. Speech segments are sent to `NoiseReducer` (RNNoise) for denoising
3. Denoised audio is encoded to Opus/OGG for bandwidth efficiency (46x reduction)
4. `WhisperClient` sends the audio to gpt-4o-mini-transcribe
5. `Pipeline` filters Whisper hallucinations (known garbage phrases)
6. `ClaudeClient` translates the Chinese text to French
7. `TranslationUI` displays the accumulated transcription/translation

### External Dependencies (fetched via CMake FetchContent)

- **ImGui** v1.90.1 - UI framework
- **Opus** v1.5.2 - Audio encoding
- **Ogg** v1.3.6 - Container format
- **RNNoise** v0.1.1 - Neural network noise reduction

### vcpkg Dependencies (x64-mingw-static triplet)

- portaudio, nlohmann_json, glfw3, glad

## Configuration

### config.json

- `audio.sample_rate`: 16000 Hz (required for Whisper)
- `whisper.model`: "gpt-4o-mini-transcribe"
- `whisper.language`: "zh" (Chinese)
- `claude.model`: "claude-3-5-haiku-20241022"

### VAD Tuning

VAD thresholds are adjustable in the UI at runtime:
- RMS threshold: speech detection sensitivity
- Peak threshold: transient/click rejection

## Important Implementation Details

### Whisper Hallucination Filtering

`Pipeline.cpp` contains an extensive list of known Whisper hallucinations (lines ~195-260) that are filtered out:
- "Thank you for watching", "Subscribe", and similar YouTube phrases
- Chinese video endings: "谢谢观看" ("thanks for watching"), "再见" ("goodbye"), "订阅" ("subscribe")
- Music symbols, silence markers
- Single-word interjections

### GPU Forcing (Optimus/PowerXpress)

`main.cpp` exports the `NvOptimusEnablement` and `AmdPowerXpressRequestHighPerformance` symbols to force dedicated GPU usage on hybrid graphics systems.

### Audio Processing Pipeline

1. 16kHz mono input → upsampled to 48kHz for RNNoise
2. RNNoise denoising (480-sample frames at 48kHz)
3. Transient suppression (claps, clicks, pops)
4. Downsampled back to 16kHz
5.
Opus encoding at 24kbps for API transmission

## Console-Only Build

A `SecondVoice_Console` target exists for testing without the UI:
- Uses `main_console.cpp`
- No ImGui/GLFW dependencies
- Outputs transcriptions to stdout
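The producer/consumer hand-off in the threading model above (audio thread pushes chunks, processing thread pops them) can be sketched as follows. Note this is an illustrative sketch only: the repo's `ThreadSafeQueue.h` is described as lock-free, whereas this version uses a mutex and condition variable for brevity, and the class/method names here are assumptions, not the repo's actual API.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical blocking queue for audio chunks: the audio thread calls
// push(), the processing thread blocks in pop() until a chunk arrives.
template <typename T>
class ChunkQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(item));
        }
        cv_.notify_one();  // wake the processing thread
    }

    // Blocks until an item is available, then returns it (FIFO order).
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
};
```

A lock-free single-producer/single-consumer ring buffer would avoid blocking inside the PortAudio callback, which is presumably why the real implementation takes that route.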
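The two VAD thresholds described under "VAD Tuning" can be illustrated with a minimal gating check: a chunk counts as speech when its RMS energy exceeds the RMS threshold while its peak stays below the transient-rejection ceiling. The struct, function names, and default values below are assumptions for illustration; `AudioCapture.cpp` may combine the thresholds differently.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical VAD parameters; defaults are placeholders, not the
// repo's tuned values.
struct VadThresholds {
    float rms  = 0.01f;  // speech detection sensitivity
    float peak = 0.9f;   // transient/click rejection ceiling
};

// Returns true if the chunk looks like speech: enough sustained energy
// (RMS above threshold) without a clipping transient (peak below ceiling).
inline bool isSpeech(const std::vector<float>& samples, const VadThresholds& t) {
    if (samples.empty()) return false;
    double sumSquares = 0.0;
    float peak = 0.0f;
    for (float s : samples) {
        sumSquares += static_cast<double>(s) * s;
        peak = std::max(peak, std::fabs(s));
    }
    float rms = static_cast<float>(std::sqrt(sumSquares / samples.size()));
    return rms > t.rms && peak < t.peak;
}
```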
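The hallucination filtering step could be sketched as an exact-match lookup against a blocklist, as below. This is a simplified assumption: the phrases shown are only a sample, the function name is hypothetical, and the real filter in `Pipeline.cpp` (lines ~195-260) may use substring or normalized matching rather than exact matches.

```cpp
#include <string>
#include <unordered_set>

// Illustrative hallucination filter: drop transcriptions that exactly
// match known Whisper garbage phrases. The real list in Pipeline.cpp
// is far longer.
inline bool isHallucination(const std::string& text) {
    static const std::unordered_set<std::string> blocklist = {
        "Thank you for watching",
        "Subscribe",
        "谢谢观看",  // "thanks for watching"
        "再见",      // "goodbye"
        "订阅",      // "subscribe"
    };
    return blocklist.count(text) > 0;
}
```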