secondvoice/README.md
StillHammer 5b60acaa73 feat: Implement complete MVP architecture for SecondVoice
Complete implementation of the real-time Chinese-to-French translation system:

Architecture:
- 3-threaded pipeline: Audio capture → AI processing → UI rendering
- Thread-safe queues for inter-thread communication
- Configurable audio chunk sizes for latency tuning

Core Features:
- Audio capture with PortAudio (configurable sample rate/channels)
- Whisper API integration for Chinese speech-to-text
- Claude API integration for Chinese-to-French translation
- ImGui real-time display with stop button
- Full recording saved to WAV on stop

Modules Implemented:
- audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export)
- api/: WhisperClient + ClaudeClient (HTTP API wrappers)
- ui/: TranslationUI (ImGui interface)
- core/: Pipeline (orchestrates all threads)
- utils/: Config (JSON/.env loader) + ThreadSafeQueue (template)

Build System:
- CMake with vcpkg for dependency management
- vcpkg.json manifest for reproducible builds
- build.sh helper script

Configuration:
- config.json: Audio settings, API parameters, UI config
- .env: API keys (OpenAI + Anthropic)

Documentation:
- README.md: Setup instructions, usage, architecture
- docs/implementation_plan.md: Technical design document
- docs/SecondVoice.md: Project vision and motivation

Next Steps:
- Test build with vcpkg dependencies
- Test audio capture on real hardware
- Validate API integrations
- Tune chunk size for optimal latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:08:03 +08:00


SecondVoice

Real-time Chinese-to-French translation system for live meetings.

Overview

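SecondVoice captures microphone audio, transcribes Chinese speech with OpenAI's Whisper API, and translates the transcript into French with Anthropic's Claude API. Audio is processed in chunks (10 seconds by default), so translations appear with a short delay. This makes it suited to following Chinese-language meetings as they happen.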

Features

  • 🎤 Real-time audio capture
  • 🗣️ Chinese speech-to-text (Whisper API)
  • 🌐 Chinese-to-French translation (Claude API)
  • 🖥️ Clean ImGui interface
  • 💾 Full recording saved to disk
  • ⚙️ Configurable chunk sizes and settings

Requirements

System Dependencies (Linux)

# ALSA development headers (needed to build PortAudio)
sudo apt install libasound2-dev

# OpenGL
sudo apt install libgl1-mesa-dev libglu1-mesa-dev

vcpkg

Install vcpkg if not already installed:

git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT=$(pwd)

Setup

  1. Clone the repository
git clone <repository-url>
cd secondvoice
  2. Create .env file (copy from .env.example)
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
  3. Configure settings (optional)

Edit config.json to customize:

  • Audio chunk duration (default: 10 s)
  • Sample rate (default: 16 kHz)
  • UI window size
  • Output directory
  4. Build the project
# Configure with vcpkg
cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake

# Build
cmake --build build -j$(nproc)

Usage

cd build
./SecondVoice

When launched, the application:

  1. Opens an ImGui window
  2. Starts capturing audio from your microphone
  3. Displays Chinese transcriptions and French translations in real time
  4. Keeps running until you click the STOP RECORDING button
  5. Saves the full audio recording to recordings/recording_YYYYMMDD_HHMMSS.wav
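
As an illustration of the recording file name pattern above, here is a minimal sketch of how such a path could be built from a timestamp. The helper name make_recording_path is hypothetical and not taken from the SecondVoice sources; only the recordings/ directory and the recording_YYYYMMDD_HHMMSS.wav pattern come from this README.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper (not from the SecondVoice sources): formats a
// broken-down time as "recordings/recording_YYYYMMDD_HHMMSS.wav".
std::string make_recording_path(const std::tm& t) {
    char name[64];
    // strftime returns 0 when the buffer is too small for the result.
    const std::size_t written =
        std::strftime(name, sizeof(name), "recording_%Y%m%d_%H%M%S.wav", &t);
    assert(written > 0);
    (void)written;  // silence unused-variable warnings when NDEBUG is set
    return std::string("recordings/") + name;
}
```

In an application, the std::tm value would typically come from std::time and std::localtime at the moment recording stops.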

Architecture

Audio Capture (PortAudio)
    ↓
Whisper API (Speech-to-Text)
    ↓
Claude API (Translation)
    ↓
ImGui UI (Display)

Threading Model

  • Thread 1: Audio capture (PortAudio callback)
  • Thread 2: AI processing (Whisper + Claude API calls)
  • Thread 3: UI rendering (ImGui + OpenGL)
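
The three threads need a way to hand data to each other. The sketch below shows one common shape for a blocking thread-safe queue built on a mutex and a condition variable. It is an assumption about how such a queue might look, not the project's actual ThreadSafeQueue; the close() method in particular is a hypothetical addition that lets a consumer thread exit cleanly when recording stops. It requires C++17 for std::optional.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Illustrative blocking queue; names and API are assumptions.
template <typename T>
class ThreadSafeQueue {
public:
    // Called by the producer (e.g. the audio thread pushing a chunk).
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push() after close()");
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Called by the consumer. Blocks until an item is available.
    // Returns std::nullopt once the queue is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

    // Signals that no more items will arrive and wakes all waiters.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
    bool closed_ = false;
};
```

With a queue like this, the processing thread could loop on `while (auto chunk = queue.pop()) { ... }` and fall out of the loop after the capture side calls close().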

Configuration

config.json

{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh"
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024
  }
}
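
To make the effect of the audio settings concrete, the sketch below computes how much audio one chunk holds. The struct and function names are hypothetical, and the 16-bit PCM sample format is an assumption, since this README does not state the sample format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the "audio" section of config.json (defaults copied from above).
struct AudioSettings {
    int sample_rate = 16000;
    int channels = 1;
    int chunk_duration_seconds = 10;
};

// Total number of samples (across all channels) in one chunk.
std::size_t samples_per_chunk(const AudioSettings& s) {
    assert(s.sample_rate > 0 && s.channels > 0 && s.chunk_duration_seconds > 0);
    return static_cast<std::size_t>(s.sample_rate) *
           static_cast<std::size_t>(s.channels) *
           static_cast<std::size_t>(s.chunk_duration_seconds);
}

// Chunk size in bytes, ASSUMING 16-bit signed PCM samples.
std::size_t bytes_per_chunk_pcm16(const AudioSettings& s) {
    return samples_per_chunk(s) * sizeof(std::int16_t);
}
```

With the defaults, one chunk is 160,000 samples, or about 320 KB under the 16-bit assumption. A shorter chunk_duration_seconds lowers latency at the cost of more API requests per minute.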

.env

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
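
For readers curious how a .env file of this shape can be read, here is a minimal sketch of a KEY=VALUE parser. The function names are hypothetical and the project's own Config loader may work differently. Quoted values and "export " prefixes are deliberately not handled.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

using EnvMap = std::map<std::string, std::string>;

// Parses KEY=VALUE lines. Blank lines, '#' comments, and lines
// without a key or '=' are skipped.
EnvMap parse_env(std::istream& in) {
    EnvMap vars;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos || eq == 0) continue;
        vars[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return vars;
}

// Convenience overload for in-memory text.
EnvMap parse_env_text(const std::string& text) {
    std::istringstream in(text);
    return parse_env(in);
}

// Returns the value for `name`, or throws if it is missing or empty.
const std::string& require_key(const EnvMap& vars, const std::string& name) {
    assert(!name.empty());
    const auto it = vars.find(name);
    if (it == vars.end() || it->second.empty()) {
        throw std::runtime_error("missing required key: " + name);
    }
    return it->second;
}
```

Failing fast on a missing OPENAI_API_KEY or ANTHROPIC_API_KEY at startup is usually friendlier than discovering the problem on the first API call.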

Cost Estimation

  • Whisper: ~$0.006/minute (~$0.36/hour)
  • Claude Haiku: ~$0.03-0.05/hour
  • Total: ~$0.40/hour of recording
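
The arithmetic behind these figures can be sketched as follows. The rates are the approximate values listed above, not authoritative pricing, and the names are hypothetical.

```cpp
#include <cassert>

// Approximate rates copied from the list above; check current pricing
// before relying on them.
constexpr double kWhisperUsdPerMinute = 0.006;
constexpr double kClaudeUsdPerHourLow = 0.03;
constexpr double kClaudeUsdPerHourHigh = 0.05;

struct CostRange {
    double low_usd;
    double high_usd;
};

// Estimated total cost for a recording of the given length in minutes.
CostRange estimate_cost(double minutes) {
    assert(minutes >= 0.0);
    const double whisper = kWhisperUsdPerMinute * minutes;
    const double hours = minutes / 60.0;
    return {whisper + kClaudeUsdPerHourLow * hours,
            whisper + kClaudeUsdPerHourHigh * hours};
}
```

For a 60-minute meeting this gives roughly $0.39 to $0.41, which is where the ~$0.40/hour total comes from. Transcription accounts for most of it.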

Project Structure

secondvoice/
├── src/
│   ├── main.cpp                 # Entry point
│   ├── audio/                   # Audio capture & buffer
│   ├── api/                     # Whisper & Claude clients
│   ├── ui/                      # ImGui interface
│   ├── utils/                   # Config & thread-safe queue
│   └── core/                    # Pipeline orchestration
├── docs/                        # Documentation
├── recordings/                  # Output recordings
├── config.json                  # Runtime configuration
├── .env                         # API keys (not committed)
└── CMakeLists.txt               # Build configuration

Development

Building in Debug Mode

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
cmake --build build

Running Tests

# TODO: Add tests

Troubleshooting

No audio capture

  • Check microphone permissions
  • Verify that PortAudio can see your devices by running pa_devs, PortAudio's device-listing example program (if available)
  • Try a different audio device in the code

API errors

  • Verify API keys in .env are correct
  • Check internet connection
  • Monitor API rate limits

Build errors

  • Ensure vcpkg is properly set up
  • Check all system dependencies are installed
  • Try cmake --build build --clean-first

Roadmap

Phase 1 - MVP (Current)

  • Audio capture
  • Whisper integration
  • Claude integration
  • ImGui UI
  • Stop button

Phase 2 - Enhancement

  • Auto-summary post-meeting
  • Export transcripts
  • Search functionality
  • Speaker diarization
  • Replay mode

License

See LICENSE file.

Contributing

This is a personal project, but suggestions and bug reports are welcome via issues.

Contact

See docs/SecondVoice.md for project context and motivation.