StillHammer 5b60acaa73 feat: Implement complete MVP architecture for SecondVoice

Complete implementation of the real-time Chinese-to-French translation system:

Architecture:
- 3-threaded pipeline: Audio capture → AI processing → UI rendering
- Thread-safe queues for inter-thread communication
- Configurable audio chunk sizes for latency tuning

Core Features:
- Audio capture with PortAudio (configurable sample rate/channels)
- Whisper API integration for Chinese speech-to-text
- Claude API integration for Chinese-to-French translation
- ImGui real-time display with stop button
- Full recording saved to WAV on stop

Modules Implemented:
- audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export)
- api/: WhisperClient + ClaudeClient (HTTP API wrappers)
- ui/: TranslationUI (ImGui interface)
- core/: Pipeline (orchestrates all threads)
- utils/: Config (JSON/.env loader) + ThreadSafeQueue (template)

Build System:
- CMake with vcpkg for dependency management
- vcpkg.json manifest for reproducible builds
- build.sh helper script

Configuration:
- config.json: Audio settings, API parameters, UI config
- .env: API keys (OpenAI + Anthropic)

Documentation:
- README.md: Setup instructions, usage, architecture
- docs/implementation_plan.md: Technical design document
- docs/SecondVoice.md: Project vision and motivation

Next Steps:
- Test build with vcpkg dependencies
- Test audio capture on real hardware
- Validate API integrations
- Tune chunk size for optimal latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-20 03:08:03 +08:00


SecondVoice

Real-time Chinese to French translation system for live meetings.

Overview

SecondVoice captures audio, transcribes Chinese speech with OpenAI's Whisper API, and translates it to French with Claude in real time. It is intended for following Chinese-language meetings as they happen.

Features

🎤 Real-time audio capture
🗣️ Chinese speech-to-text (Whisper API)
🌐 Chinese to French translation (Claude API)
🖥️ Clean ImGui interface
💾 Full recording saved to disk
⚙️ Configurable chunk sizes and settings
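
For the recording feature, a minimal sketch of the 44-byte header that precedes the samples in an uncompressed WAV file may be useful. It is not taken from the SecondVoice sources: `make_wav_header`, `put_le`, and `put_tag` are hypothetical helpers, and 16-bit PCM samples are an assumption, since the README does not state the sample format.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using WavHeader = std::array<std::uint8_t, 44>;

// Write the low `bytes` bytes of `value` in little-endian order at `offset`.
inline void put_le(WavHeader& h, std::size_t offset, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        h[offset + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
    }
}

// Write a four-character chunk tag such as "RIFF" at `offset`.
inline void put_tag(WavHeader& h, std::size_t offset, const char* tag) {
    for (int i = 0; i < 4; ++i) {
        h[offset + i] = static_cast<std::uint8_t>(tag[i]);
    }
}

// Hypothetical helper: canonical RIFF/WAVE header for 16-bit PCM (assumed format).
inline WavHeader make_wav_header(std::uint32_t sample_rate,
                                 std::uint16_t channels,
                                 std::uint32_t num_frames) {
    assert(sample_rate > 0 && channels > 0);
    const std::uint32_t bits_per_sample = 16;
    const std::uint32_t block_align = channels * bits_per_sample / 8;  // bytes per frame
    const std::uint32_t byte_rate = sample_rate * block_align;
    const std::uint32_t data_bytes = num_frames * block_align;

    WavHeader h{};
    put_tag(h, 0, "RIFF");
    put_le(h, 4, 36 + data_bytes, 4);   // file size minus the first 8 bytes
    put_tag(h, 8, "WAVE");
    put_tag(h, 12, "fmt ");
    put_le(h, 16, 16, 4);               // size of the fmt chunk for PCM
    put_le(h, 20, 1, 2);                // audio format 1 = PCM
    put_le(h, 22, channels, 2);
    put_le(h, 24, sample_rate, 4);
    put_le(h, 28, byte_rate, 4);
    put_le(h, 32, block_align, 2);
    put_le(h, 34, bits_per_sample, 2);
    put_tag(h, 36, "data");
    put_le(h, 40, data_bytes, 4);       // size of the sample data that follows
    return h;
}
```

Under these assumptions, one second of mono audio at 16 kHz gives `make_wav_header(16000, 1, 16000)`, whose data size field is 32000 bytes. The header would be written first, followed by the raw samples.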

Requirements

System Dependencies (Linux)

# ALSA development headers (needed to build PortAudio)
sudo apt install libasound2-dev

# OpenGL
sudo apt install libgl1-mesa-dev libglu1-mesa-dev

vcpkg

Install vcpkg if not already installed:

git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT=$(pwd)

Setup

Clone the repository

git clone <repository-url>
cd secondvoice

Create a .env file (copy from .env.example)

cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...

Configure settings (optional)

Edit config.json to customize:

Audio chunk duration (default: 10 s)
Sample rate (default: 16 kHz)
UI window size
Output directory

Build the project

# Configure with vcpkg
cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake

# Build
cmake --build build -j$(nproc)

Usage

cd build
./SecondVoice

The application will:

Open an ImGui window
Start capturing audio from your microphone
Display Chinese transcriptions and French translations in real time
Keep recording until you click the STOP RECORDING button
Save the full audio recording to recordings/recording_YYYYMMDD_HHMMSS.wav
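
The output name pattern above can be produced with `std::strftime`. The following is a sketch only: `recording_path` is a hypothetical helper, not the project's actual function.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: formats a broken-down time as
// recordings/recording_YYYYMMDD_HHMMSS.wav.
inline std::string recording_path(const std::tm& t) {
    char stamp[16];  // "YYYYMMDD_HHMMSS" is 15 characters plus the terminator
    const std::size_t n = std::strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", &t);
    assert(n == 15);  // strftime returns 0 if the result does not fit
    return "recordings/recording_" + std::string(stamp, n) + ".wav";
}
```

At runtime the argument would typically come from `std::localtime` applied to the current `std::time_t`, taken when the recording is stopped.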

Architecture

Audio Capture (PortAudio)
    ↓
Whisper API (Speech-to-Text)
    ↓
Claude API (Translation)
    ↓
ImGui UI (Display)

Threading Model

Thread 1: Audio capture (PortAudio callback)
Thread 2: AI processing (Whisper + Claude API calls)
Thread 3: UI rendering (ImGui + OpenGL)
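
The threads hand work to each other through thread-safe queues. A minimal sketch of such a queue, assuming C++17 and a mutex plus condition variable design, is shown here. The `close()` method and the `std::optional` return type are illustrative choices and may differ from the project's own `ThreadSafeQueue`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Illustrative blocking queue; the interface is an assumption.
template <typename T>
class ThreadSafeQueue {
public:
    // Enqueue one item and wake a single waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Blocks until an item is available or close() has been called.
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) {
            return std::nullopt;
        }
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Signal that no more items will arrive and wake all waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};
```

In a pipeline like the one described, the capture side would `push` audio chunks, the processing thread would loop on `pop()` until it returns `std::nullopt`, and the stop button handler would call `close()` so the consumer can exit cleanly.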

Configuration

config.json

{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh"
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024
  }
}
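
To see what the audio settings imply, the chunk size can be worked out from the three values in the "audio" object. This is a sketch under assumptions: the `AudioConfig` struct and both functions are hypothetical, and 16-bit samples are assumed because the configuration does not specify a sample format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the "audio" object in config.json (struct name is illustrative).
struct AudioConfig {
    int sample_rate = 16000;
    int channels = 1;
    int chunk_duration_seconds = 10;
};

// Number of individual samples in one chunk, across all channels.
inline std::size_t samples_per_chunk(const AudioConfig& c) {
    assert(c.sample_rate > 0 && c.channels > 0 && c.chunk_duration_seconds > 0);
    return static_cast<std::size_t>(c.sample_rate) *
           static_cast<std::size_t>(c.channels) *
           static_cast<std::size_t>(c.chunk_duration_seconds);
}

// Chunk size in bytes, assuming 16-bit PCM samples.
inline std::size_t bytes_per_chunk(const AudioConfig& c) {
    return samples_per_chunk(c) * sizeof(std::int16_t);
}
```

With the defaults shown above, one chunk holds 160000 samples, or 320000 bytes under the 16-bit assumption. If a chunk is only sent for transcription once it is full, the chunk duration is also a lower bound on how far the displayed text lags behind the speaker, which is why a smaller value trades context for latency.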

.env

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
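
A file in this KEY=VALUE form can be read with a few lines of standard C++. The sketch below is illustrative and not the project's Config loader: `parse_env` and `trim` are hypothetical names, and quoted values or `export` prefixes are deliberately not handled.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";
    }
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: parse KEY=VALUE lines, skipping blanks,
// '#' comments, and lines without an '=' sign.
inline std::map<std::string, std::string> parse_env(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') {
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) {
            continue;
        }
        vars[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return vars;
}
```

The function takes any `std::istream`, so it can be fed a `std::ifstream` opened on `.env` or a `std::istringstream` in a test. A missing key can then be reported at startup rather than surfacing later as an API authentication error.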

Cost Estimation

Whisper: ~$0.006/minute (~$0.36/hour)
Claude Haiku: ~$0.03-0.05/hour
Total: ~$0.40/hour of recording
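
The total follows from converting the per-minute Whisper rate to an hourly one and adding the Claude range. A small sketch of that arithmetic, with hypothetical names and the rates above passed in as plain inputs:

```cpp
#include <cassert>

// Lower and upper bound of an estimate, in USD.
struct CostRange {
    double low;
    double high;
};

// Hypothetical helper: hourly total from a per-minute transcription rate
// and an hourly translation range.
inline CostRange hourly_cost(double whisper_per_minute, CostRange claude_per_hour) {
    assert(claude_per_hour.low <= claude_per_hour.high);
    const double whisper_per_hour = whisper_per_minute * 60.0;
    return {whisper_per_hour + claude_per_hour.low,
            whisper_per_hour + claude_per_hour.high};
}
```

With the figures listed, `hourly_cost(0.006, {0.03, 0.05})` gives roughly 0.39 to 0.41 USD per hour, consistent with the ~$0.40 total. The rates themselves are the README's estimates and should be checked against current provider pricing.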

Project Structure

secondvoice/
├── src/
│   ├── main.cpp                 # Entry point
│   ├── audio/                   # Audio capture & buffer
│   ├── api/                     # Whisper & Claude clients
│   ├── ui/                      # ImGui interface
│   ├── utils/                   # Config & thread-safe queue
│   └── core/                    # Pipeline orchestration
├── docs/                        # Documentation
├── recordings/                  # Output recordings
├── config.json                  # Runtime configuration
├── .env                         # API keys (not committed)
└── CMakeLists.txt              # Build configuration

Development

Building in Debug Mode

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
cmake --build build

Running Tests

# TODO: Add tests

Troubleshooting

No audio capture

Check microphone permissions
Verify that PortAudio can see your input devices, for example by running pa_devs (if available)
Try a different audio device in code

API errors

Verify API keys in .env are correct
Check internet connection
Monitor API rate limits

Build errors

Ensure vcpkg is properly set up
Check all system dependencies are installed
Try a clean rebuild: cmake --build build --clean-first

Roadmap

Phase 1 - MVP (Current)

✅ Audio capture
✅ Whisper integration
✅ Claude integration
✅ ImGui UI
✅ Stop button

Phase 2 - Enhancement

⬜ Auto-summary post-meeting
⬜ Export transcripts
⬜ Search functionality
⬜ Speaker diarization
⬜ Replay mode

License

See LICENSE file.

Contributing

This is a personal project, but suggestions and bug reports are welcome via issues.

Contact

See docs/SecondVoice.md for project context and motivation.
