- Add CLAUDE.md with project documentation for AI assistance
- Add PLAN_DEBUG.md with debugging hypotheses and logging plan
- Update Pipeline and TranslationUI with transcript export functionality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
SecondVoice is a real-time Chinese-to-French translation system for live meetings. It captures audio, transcribes Chinese speech using OpenAI's Whisper API (gpt-4o-mini-transcribe), and translates it to French using Claude AI.
Build Commands
Windows (MinGW) - Primary Build
# First-time setup
.\setup_mingw.bat
# Build (Release)
.\build_mingw.bat
# Build (Debug)
.\build_mingw.bat --debug
# Clean rebuild
.\build_mingw.bat --clean
Running the Application
cd build\mingw-Release
SecondVoice.exe
Requires:
- .env file with OPENAI_API_KEY and ANTHROPIC_API_KEY
- config.json (copied automatically during build)
- A microphone
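As a rough illustration of the .env requirement above, the sketch below parses KEY=VALUE lines and checks that both API keys are present. The helper names (`trim`, `parseEnv`, `hasRequiredKeys`) are hypothetical and are not taken from Config.cpp, whose actual parsing rules may differ; quoting and escape handling are deliberately left out.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: strip leading and trailing ASCII whitespace.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: parse KEY=VALUE lines from a .env-style stream.
// Blank lines and lines starting with '#' are skipped.
inline std::map<std::string, std::string> parseEnv(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a KEY=VALUE line
        vars[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return vars;
}

// True only when both keys the application requires are present and non-empty.
inline bool hasRequiredKeys(const std::map<std::string, std::string>& vars) {
    for (const char* key : {"OPENAI_API_KEY", "ANTHROPIC_API_KEY"}) {
        const auto it = vars.find(key);
        if (it == vars.end() || it->second.empty()) return false;
    }
    return true;
}
```

A caller could open the file with `std::ifstream`, pass it to `parseEnv`, and refuse to start when `hasRequiredKeys` returns false.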
Architecture
Threading Model (3 threads)
- Audio Thread (Pipeline::audioThread) - PortAudio callback captures audio, applies VAD (Voice Activity Detection), pushes chunks to queue
- Processing Thread (Pipeline::processingThread) - Consumes audio chunks, calls Whisper API for transcription, then Claude API for translation
- UI Thread (main) - GLFW/ImGui rendering loop, must run on main thread
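The hand-off between the audio thread and the processing thread can be pictured as a producer/consumer pair around a shared queue. The sketch below is a minimal illustration under stated assumptions: `BlockingQueue`, `AudioChunk`, and `runPipelineSketch` are hypothetical names, and the queue here uses a mutex and condition variable, which may not match how ThreadSafeQueue.h is actually implemented. It requires C++17.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's queue (assumption, not the real class).
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Marks the queue closed so a waiting consumer can drain it and exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; returns nullopt once closed and empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

using AudioChunk = std::vector<std::int16_t>;

// One producer stands in for the audio thread, one consumer for the
// processing thread. Returns the total number of samples consumed.
inline std::size_t runPipelineSketch(int chunkCount, std::size_t chunkSize) {
    BlockingQueue<AudioChunk> queue;
    std::size_t consumed = 0;

    std::thread producer([&] {
        for (int i = 0; i < chunkCount; ++i) queue.push(AudioChunk(chunkSize, 0));
        queue.close();
    });
    std::thread consumer([&] {
        while (auto chunk = queue.pop()) consumed += chunk->size();
    });

    producer.join();
    consumer.join();
    return consumed;
}
```

In the real application the consumer loop would call the transcription and translation clients instead of counting samples, and the UI thread would keep rendering on the main thread while both workers run.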
Core Components
src/
├── main.cpp # Entry point, forces NVIDIA GPU
├── core/Pipeline.cpp # Orchestrates audio→transcription→translation flow
├── audio/
│ ├── AudioCapture.cpp # PortAudio wrapper with VAD-based segmentation
│ ├── AudioBuffer.cpp # Accumulates samples, exports WAV/Opus
│ └── NoiseReducer.cpp # RNNoise denoising (16kHz→48kHz→16kHz resampling)
├── api/
│ ├── WhisperClient.cpp # OpenAI Whisper API (multipart/form-data)
│ ├── ClaudeClient.cpp # Anthropic Claude API (JSON)
│ └── WinHttpClient.cpp # Native Windows HTTP client (replaced libcurl)
├── ui/TranslationUI.cpp # ImGui interface with VAD threshold controls
└── utils/
├── Config.cpp # Loads config.json + .env
└── ThreadSafeQueue.h # Lock-free queue for audio chunks
Key Data Flow
- AudioCapture detects speech via VAD thresholds (RMS + Peak)
- Speech segments sent to NoiseReducer (RNNoise) for denoising
- Denoised audio encoded to Opus/OGG for bandwidth efficiency (46x reduction)
- WhisperClient sends audio to gpt-4o-mini-transcribe
- Pipeline filters Whisper hallucinations (known garbage phrases)
- ClaudeClient translates Chinese text to French
- TranslationUI displays accumulated transcription/translation
External Dependencies (fetched via CMake FetchContent)
- ImGui v1.90.1 - UI framework
- Opus v1.5.2 - Audio encoding
- Ogg v1.3.6 - Container format
- RNNoise v0.1.1 - Neural network noise reduction
vcpkg Dependencies (x64-mingw-static triplet)
- portaudio, nlohmann_json, glfw3, glad
Configuration
config.json
- audio.sample_rate: 16000 Hz (required for Whisper)
- whisper.model: "gpt-4o-mini-transcribe"
- whisper.language: "zh" (Chinese)
- claude.model: "claude-3-5-haiku-20241022"
VAD Tuning
VAD thresholds are adjustable in the UI at runtime:
- RMS threshold: speech detection sensitivity
- Peak threshold: transient/click rejection
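To make the two thresholds concrete, the sketch below computes RMS and peak levels for one chunk of normalized float samples and applies a simple decision rule. The names `LevelStats`, `measureLevels`, and `isSpeech` are hypothetical, and the rule (both levels must clear their thresholds) is an assumption; the logic in AudioCapture.cpp may combine the values differently.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct LevelStats {
    float rms;   // root-mean-square level of the chunk
    float peak;  // largest absolute sample value in the chunk
};

// Computes RMS and peak over one chunk of samples normalized to [-1, 1].
inline LevelStats measureLevels(const std::vector<float>& samples) {
    if (samples.empty()) return {0.0f, 0.0f};
    double sumSquares = 0.0;
    float peak = 0.0f;
    for (float s : samples) {
        sumSquares += static_cast<double>(s) * s;
        peak = std::max(peak, std::fabs(s));
    }
    const double meanSquare = sumSquares / static_cast<double>(samples.size());
    return {static_cast<float>(std::sqrt(meanSquare)), peak};
}

// Hypothetical rule: treat the chunk as speech only when both levels
// clear their thresholds.
inline bool isSpeech(const LevelStats& stats, float rmsThreshold, float peakThreshold) {
    return stats.rms >= rmsThreshold && stats.peak >= peakThreshold;
}
```

Under this rule a single isolated click raises the peak but barely moves the RMS over the chunk, so it is not classified as speech.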
Important Implementation Details
Whisper Hallucination Filtering
Pipeline.cpp contains an extensive list of known Whisper hallucinations (lines ~195-260) that are filtered out:
- "Thank you for watching", "Subscribe", YouTube phrases
- Chinese video endings: "谢谢观看", "再见", "订阅"
- Music symbols, silence markers
- Single-word interjections
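A filter of this kind might look like the sketch below. It is an illustration only: the phrase list is a small subset drawn from the examples above, the names `kKnownHallucinations`, `normalizeTranscript`, and `isKnownHallucination` are hypothetical, and matching the whole trimmed transcript exactly (rather than by substring) is an assumption about how Pipeline.cpp behaves.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>

// Illustrative subset; the list in Pipeline.cpp is described as much longer.
constexpr std::array<std::string_view, 6> kKnownHallucinations = {
    "thank you for watching", "subscribe", "谢谢观看", "再见", "订阅", "♪"};

// Trims ASCII whitespace and common punctuation from both ends and
// lowercases ASCII letters. Multi-byte UTF-8 sequences pass through unchanged.
inline std::string normalizeTranscript(std::string_view text) {
    constexpr std::string_view kStrip = " \t\r\n.,!?";
    const auto first = text.find_first_not_of(kStrip);
    if (first == std::string_view::npos) return "";
    const auto last = text.find_last_not_of(kStrip);
    std::string out(text.substr(first, last - first + 1));
    std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
        return static_cast<char>((c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c);
    });
    return out;
}

// Assumption: a transcript is dropped only when the whole normalized text
// equals a known phrase, so real sentences containing "再见" still pass.
inline bool isKnownHallucination(std::string_view transcript) {
    const std::string cleaned = normalizeTranscript(transcript);
    const std::string_view cleanedView = cleaned;
    return std::find(kKnownHallucinations.begin(), kKnownHallucinations.end(),
                     cleanedView) != kKnownHallucinations.end();
}
```

Exact matching keeps false positives low at the cost of missing hallucinations embedded in longer output; a substring or prefix match would trade in the other direction.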
GPU Forcing (Optimus/PowerXpress)
main.cpp exports NvOptimusEnablement and AmdPowerXpressRequestHighPerformance symbols to force dedicated GPU usage on hybrid graphics systems.
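The export pattern typically looks like the sketch below. The two symbol names are the ones the NVIDIA and AMD drivers look for; the `SV_EXPORT` macro is a hypothetical convenience added here so the snippet also compiles on non-Windows toolchains, and the exact form used in main.cpp may differ.

```cpp
#if defined(_WIN32)
#define SV_EXPORT __declspec(dllexport)
#else
// Fallback so the sketch compiles elsewhere; the symbols only matter on Windows.
#define SV_EXPORT __attribute__((visibility("default")))
#endif

extern "C" {
// NVIDIA Optimus: a value of 1 requests the high-performance GPU.
SV_EXPORT unsigned long NvOptimusEnablement = 0x00000001;
// AMD PowerXpress: a value of 1 requests the high-performance GPU.
SV_EXPORT int AmdPowerXpressRequestHighPerformance = 1;
}
```

The drivers read these symbols from the executable's export table at process start, so they need to be defined in the executable itself rather than in a dynamically loaded library.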
Audio Processing Pipeline
- 16kHz mono input → Upsampled to 48kHz for RNNoise
- RNNoise denoising (480-sample frames at 48kHz)
- Transient suppression (claps, clicks, pops)
- Downsampled back to 16kHz
- Opus encoding at 24kbps for API transmission
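The sample-rate bookkeeping in the steps above can be sketched as follows. This is a deliberately naive illustration: linear interpolation and plain decimation stand in for whatever resampler NoiseReducer.cpp actually uses, and the names `upsample3x`, `downsample3x`, and `completeFrames` are hypothetical. The point is only the 3x ratio between 16 kHz and 48 kHz and the 480-sample (10 ms) frame size.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kUpsampleFactor = 3;  // 16 kHz -> 48 kHz
constexpr std::size_t kFrameSize = 480;     // 10 ms at 48 kHz

// Naive 3x upsampling by linear interpolation between neighbouring samples.
inline std::vector<float> upsample3x(const std::vector<float>& in) {
    std::vector<float> out;
    out.reserve(in.size() * kUpsampleFactor);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float a = in[i];
        const float b = (i + 1 < in.size()) ? in[i + 1] : in[i];  // hold the last sample
        for (std::size_t k = 0; k < kUpsampleFactor; ++k) {
            out.push_back(a + (b - a) * static_cast<float>(k) /
                                  static_cast<float>(kUpsampleFactor));
        }
    }
    return out;
}

// Naive 3x downsampling: keep every third sample (no anti-alias filter).
inline std::vector<float> downsample3x(const std::vector<float>& in) {
    std::vector<float> out;
    out.reserve(in.size() / kUpsampleFactor + 1);
    for (std::size_t i = 0; i < in.size(); i += kUpsampleFactor) out.push_back(in[i]);
    return out;
}

// Number of complete 480-sample frames available for denoising; any
// trailing partial frame is left for the caller to pad or carry over.
inline std::size_t completeFrames(std::size_t sampleCount48k) {
    return sampleCount48k / kFrameSize;
}
```

With these ratios, 160 input samples (10 ms at 16 kHz) become exactly one 480-sample frame at 48 kHz. A production resampler would add band-limiting filters to avoid aliasing on the way back down.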
Console-Only Build
A SecondVoice_Console target exists for testing without UI:
- Uses main_console.cpp
- No ImGui/GLFW dependencies
- Outputs transcriptions to stdout