StillHammer 67b1587047 docs: Add comprehensive next steps guide		2025-11-20 03:09:40 +08:00
docs	docs: Add comprehensive next steps guide	2025-11-20 03:09:40 +08:00
recordings	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
src	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
.env.example	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
.gitignore	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
build.sh	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
CMakeLists.txt	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
config.json	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
README.md	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00
vcpkg.json	feat: Implement complete MVP architecture for SecondVoice	2025-11-20 03:08:03 +08:00

README.md

SecondVoice

Real-time Chinese to French translation system for live meetings.

Overview

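SecondVoice captures audio, transcribes Chinese speech using OpenAI's Whisper API, and translates it to French using Anthropic's Claude API. Audio is processed in chunks (10 seconds by default), so translations appear in near real time, which makes it practical for following Chinese-language meetings as they happen.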

Features

🎤 Real-time audio capture
🗣️ Chinese speech-to-text (Whisper API)
🌐 Chinese to French translation (Claude API)
🖥️ Clean ImGui interface
💾 Full recording saved to disk
⚙️ Configurable chunk sizes and settings

Requirements

System Dependencies (Linux)

# ALSA development headers (needed to build PortAudio)
sudo apt install libasound2-dev

# OpenGL
sudo apt install libgl1-mesa-dev libglu1-mesa-dev

vcpkg

Install vcpkg if not already installed:

git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT=$(pwd)

Setup

Clone the repository

git clone <repository-url>
cd secondvoice

Create a .env file (copy from .env.example)

cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...

Configure settings (optional)

Edit config.json to customize:

Audio chunk duration (default: 10s)
Sample rate (default: 16kHz)
UI window size
Output directory

Build the project

# Configure with vcpkg
cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake

# Build
cmake --build build -j$(nproc)

Usage

cd build
./SecondVoice

Once started, the application will:

Open an ImGui window
Start capturing audio from your microphone
Display Chinese transcriptions and French translations as each chunk is processed
Keep recording until you click the STOP RECORDING button
Save the full audio recording to recordings/recording_YYYYMMDD_HHMMSS.wav
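
The recording_YYYYMMDD_HHMMSS.wav naming scheme can be produced with std::strftime. The following is a minimal sketch, not the project's actual code: the helper names make_tm and make_recording_path are hypothetical, and it assumes the timestamp is taken in local time.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: builds a std::tm from calendar fields (month is 1-12).
std::tm make_tm(int year, int month, int day, int hour, int minute, int second) {
    std::tm t{};
    t.tm_year = year - 1900;  // std::tm counts years from 1900
    t.tm_mon = month - 1;     // std::tm months are 0-11
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min = minute;
    t.tm_sec = second;
    return t;
}

// Hypothetical helper: formats "<dir>/recording_YYYYMMDD_HHMMSS.wav".
std::string make_recording_path(const std::tm& t, const std::string& dir = "recordings") {
    char stamp[32];
    const std::size_t written = std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &t);
    assert(written > 0 && "timestamp buffer too small");
    (void)written;  // silence unused-variable warnings when NDEBUG is set
    return dir + "/recording_" + stamp + ".wav";
}

// Same path, stamped with the current local time.
std::string current_recording_path() {
    const std::time_t now = std::time(nullptr);
    const std::tm* local = std::localtime(&now);  // note: not thread-safe
    return make_recording_path(local ? *local : std::tm{});
}
```

Taking the std::tm as a parameter keeps the formatting logic testable without depending on the wall clock.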

Architecture

Audio Capture (PortAudio)
    ↓
Whisper API (Speech-to-Text)
    ↓
Claude API (Translation)
    ↓
ImGui UI (Display)
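
The flow above can be sketched as three pluggable stages. This is an illustration only: the Pipeline struct, the TranslatedLine type, and the stub stages are hypothetical stand-ins, and the real Whisper and Claude clients would make network calls where the stubs return canned strings.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using AudioChunk = std::vector<std::int16_t>;  // assumes 16-bit PCM samples

struct TranslatedLine {
    std::string chinese;
    std::string french;
};

// Hypothetical pipeline: each stage is injected so it can be stubbed.
struct Pipeline {
    std::function<std::string(const AudioChunk&)> transcribe;   // stand-in for Whisper
    std::function<std::string(const std::string&)> translate;   // stand-in for Claude
    std::function<void(const TranslatedLine&)> display;         // stand-in for the UI

    void process(const AudioChunk& chunk) const {
        assert(transcribe && translate && display);
        TranslatedLine line;
        line.chinese = transcribe(chunk);
        if (line.chinese.empty()) return;  // nothing recognized: skip translation
        line.french = translate(line.chinese);
        display(line);
    }
};

// Runs the pipeline with stub stages and returns what the UI would receive.
std::vector<TranslatedLine> run_stub_pipeline(const std::vector<AudioChunk>& chunks) {
    std::vector<TranslatedLine> shown;
    Pipeline p;
    p.transcribe = [](const AudioChunk& c) {
        return c.empty() ? std::string{} : std::string{"你好"};
    };
    p.translate = [](const std::string& zh) {
        return zh == "你好" ? std::string{"Bonjour"} : std::string{"?"};
    };
    p.display = [&shown](const TranslatedLine& l) { shown.push_back(l); };
    for (const auto& c : chunks) p.process(c);
    return shown;
}
```

Skipping translation for empty transcriptions avoids paying for a Claude call on silent chunks; whether the project does this is not stated in this README.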

Threading Model

Thread 1: Audio capture (PortAudio callback)
Thread 2: AI processing (Whisper + Claude API calls)
Thread 3: UI rendering (ImGui + OpenGL)
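
The hand-off between the capture thread and the processing thread can be sketched with a blocking queue. This is a minimal sketch under assumptions: BlockingQueue and run_demo are illustrative names, and the project's own thread-safe queue in src/utils/ may be implemented differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative blocking queue (not the project's actual class).
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Blocks until an item arrives or close() is called.
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Demo: a "capture" thread pushes chunk ids, a "processing" thread drains them.
std::vector<int> run_demo(int chunk_count) {
    assert(chunk_count >= 0);
    BlockingQueue<int> chunks;
    std::vector<int> processed;

    std::thread processing([&] {
        while (auto chunk = chunks.pop()) processed.push_back(*chunk);
    });
    std::thread capture([&] {
        for (int i = 0; i < chunk_count; ++i) chunks.push(i);
        chunks.close();
    });

    capture.join();
    processing.join();
    return processed;
}
```

One caveat: PortAudio's documentation advises against blocking calls, including mutex locks, inside the audio callback. In a real capture path the callback would typically write into a lock-free ring buffer, with a queue like this one sitting between the buffer reader and the API thread.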

Configuration

config.json

{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh"
  },
  "claude": {
    "model": "claude-3-5-haiku-20241022",
    "max_tokens": 1024
  }
}
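
To see what the audio settings imply for buffer sizes, the arithmetic can be written out as below. This is a sketch: the AudioConfig struct is an illustrative mirror of the "audio" section, and the 16-bit PCM sample format is an assumption that config.json does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative mirror of the "audio" section of config.json.
struct AudioConfig {
    int sample_rate = 16000;          // Hz
    int channels = 1;                 // mono
    int chunk_duration_seconds = 10;
};

// Total samples (across all channels) in one chunk.
constexpr std::size_t samples_per_chunk(const AudioConfig& c) {
    return static_cast<std::size_t>(c.sample_rate) *
           static_cast<std::size_t>(c.channels) *
           static_cast<std::size_t>(c.chunk_duration_seconds);
}

// Bytes in one chunk, ASSUMING 16-bit PCM samples.
constexpr std::size_t bytes_per_chunk(const AudioConfig& c) {
    return samples_per_chunk(c) * sizeof(std::int16_t);
}
```

With the defaults, one chunk holds 16000 x 1 x 10 = 160,000 samples, or 320,000 bytes of raw 16-bit audio.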

.env

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
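
A file in this KEY=VALUE format can be read with a few lines of standard C++. The sketch below is an assumption about how loading might work, not the project's actual loader; parse_env and parse_env_text are hypothetical names, and quoting, `export` prefixes, and inline comments are deliberately not handled.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical loader: parses KEY=VALUE lines, skipping blanks and '#' comments.
std::map<std::string, std::string> parse_env(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        const auto first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#') continue;
        const auto eq = line.find('=', first);
        if (eq == std::string::npos || eq == first) continue;  // no key: ignore line
        std::string key = line.substr(first, eq - first);
        key.erase(key.find_last_not_of(" \t") + 1);  // trim whitespace before '='
        assert(!key.empty());
        vars[key] = line.substr(eq + 1);
    }
    return vars;
}

// Convenience wrapper for parsing text that is already in memory.
std::map<std::string, std::string> parse_env_text(const std::string& text) {
    std::istringstream in(text);
    return parse_env(in);
}
```

For a file on disk, pass a std::ifstream opened on .env to parse_env. Treat a missing OPENAI_API_KEY or ANTHROPIC_API_KEY as a startup error rather than failing on the first API call.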

Cost Estimation

Whisper: ~$0.006/minute (~$0.36/hour)
Claude Haiku: ~$0.03-0.05/hour
Total: ~$0.40/hour of recording
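
The totals above follow from simple arithmetic, sketched here for arbitrary meeting lengths. The rates are the estimates quoted in this README, not authoritative pricing, and the names CostRange and estimate_cost are illustrative.

```cpp
#include <cassert>

// Rates quoted in this README; actual API pricing may differ or change.
constexpr double kWhisperUsdPerMinute = 0.006;
constexpr double kClaudeUsdPerHourLow = 0.03;
constexpr double kClaudeUsdPerHourHigh = 0.05;

struct CostRange {
    double low_usd;
    double high_usd;
};

// Estimated total cost for a recording of the given length in minutes.
CostRange estimate_cost(double minutes) {
    assert(minutes >= 0.0);
    const double hours = minutes / 60.0;
    const double whisper = minutes * kWhisperUsdPerMinute;
    return {whisper + hours * kClaudeUsdPerHourLow,
            whisper + hours * kClaudeUsdPerHourHigh};
}
```

For a one-hour meeting this gives 60 x $0.006 = $0.36 for Whisper plus $0.03 to $0.05 for Claude, i.e. roughly $0.39 to $0.41, consistent with the ~$0.40/hour figure.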

Project Structure

secondvoice/
├── src/
│   ├── main.cpp                 # Entry point
│   ├── audio/                   # Audio capture & buffer
│   ├── api/                     # Whisper & Claude clients
│   ├── ui/                      # ImGui interface
│   ├── utils/                   # Config & thread-safe queue
│   └── core/                    # Pipeline orchestration
├── docs/                        # Documentation
├── recordings/                  # Output recordings
├── config.json                  # Runtime configuration
├── .env                         # API keys (not committed)
└── CMakeLists.txt              # Build configuration

Development

Building in Debug Mode

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
cmake --build build

Running Tests

# TODO: Add tests

Troubleshooting

No audio capture

Check microphone permissions
Verify that PortAudio can see your input device by running pa_devs (if available)
Try selecting a different audio device in the code

API errors

Verify API keys in .env are correct
Check internet connection
Monitor API rate limits

Build errors

Ensure vcpkg is set up and VCPKG_ROOT points to it
Check that all system dependencies are installed
Try a clean rebuild: cmake --build build --clean-first

Roadmap

Phase 1 - MVP (Current)

✅ Audio capture
✅ Whisper integration
✅ Claude integration
✅ ImGui UI
✅ Stop button

Phase 2 - Enhancement

⬜ Auto-summary post-meeting
⬜ Export transcripts
⬜ Search functionality
⬜ Speaker diarization
⬜ Replay mode

License

See the LICENSE file.

Contributing

This is a personal project, but suggestions and bug reports are welcome via issues.

Contact

See docs/SecondVoice.md for project context and motivation.