
# SecondVoice

Real-time Chinese-to-French translation for live meetings.

## Overview

SecondVoice captures audio, transcribes Chinese speech with OpenAI's Whisper API, and translates it into French with Claude in real time. It is intended for following Chinese-language meetings as they happen.

## Features

- 🎤 Real-time audio capture
- 🗣️ Chinese speech-to-text (Whisper API)
- 🌐 Chinese-to-French translation (Claude API)
- 🖥️ Clean ImGui interface
- 💾 Full recording saved to disk
- ⚙️ Configurable chunk sizes and settings

## Requirements

### System Dependencies (Linux)

```bash
# ALSA development headers (needed to build PortAudio)
sudo apt install libasound2-dev

# OpenGL
sudo apt install libgl1-mesa-dev libglu1-mesa-dev
```

### vcpkg

Install vcpkg if not already installed:

```bash
git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT="$(pwd)"
```

## Setup

1. **Clone the repository**

```bash
git clone <repository-url>
cd secondvoice
```

2. **Create `.env` file** (copy from `.env.example`)

```bash
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
```

3. **Configure settings** (optional)

Edit `config.json` to customize:
- Audio chunk duration (default: 10 s)
- Sample rate (default: 16 kHz)
- UI window size
- Output directory

4. **Build the project**

```bash
# Configure with vcpkg
cmake -B build -DCMAKE_TOOLCHAIN_FILE="$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake"

# Build
cmake --build build -j$(nproc)
```

## Usage

```bash
cd build
./SecondVoice
```

Once launched, the application:
1. Opens an ImGui window
2. Starts capturing audio from your microphone
3. Displays Chinese transcriptions and French translations in real time
4. Runs until you click the **STOP RECORDING** button
5. Saves the full audio recording to `recordings/recording_YYYYMMDD_HHMMSS.wav`
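
The timestamped file name can be produced with `std::strftime`. The following is a minimal sketch, not the project's actual code: `recording_path` is a hypothetical helper, and the default `recordings` directory is taken from the path shown above.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper (not from the SecondVoice sources): builds
// <dir>/recording_YYYYMMDD_HHMMSS.wav from a broken-down time.
std::string recording_path(const std::tm& t, const std::string& dir = "recordings") {
    char stamp[32];
    // "%Y%m%d_%H%M%S" yields 15 characters, so 32 bytes is ample.
    std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &t);
    return dir + "/recording_" + stamp + ".wav";
}
```

In an application, the `std::tm` value would typically come from `std::localtime` applied to the time at which recording started.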

## Architecture

```
Audio Capture (PortAudio)
    ↓
Whisper API (Speech-to-Text)
    ↓
Claude API (Translation)
    ↓
ImGui UI (Display)
```

### Threading Model

- **Thread 1**: Audio capture (PortAudio callback)
- **Thread 2**: AI processing (Whisper + Claude API calls)
- **Thread 3**: UI rendering (ImGui + OpenGL)
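
The three threads need a safe way to hand data to each other: audio chunks from capture to AI processing, and results from AI processing to the UI. The sketch below shows one common way to do this with a mutex and a condition variable. It is an illustration only and assumes C++17; the class name `ThreadSafeQueue` and its `push`/`pop`/`close` interface are hypothetical and may differ from the queue in `src/utils/`.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical blocking queue for passing items between threads.
template <typename T>
class ThreadSafeQueue {
public:
    // Producer side: enqueue an item and wake one waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Consumer side: blocks until an item is available or close() is called.
    // Returns std::nullopt once the queue is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Signals that no more items will arrive (e.g. after STOP RECORDING).
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};
```

With this shape, the processing thread can loop on `pop()` and exit cleanly when it returns `std::nullopt`. PortAudio's documentation advises against blocking calls inside the stream callback, so a real implementation may prefer a lock-free ring buffer on the capture side and use a queue like this only between the other threads.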

## Configuration

### config.json

```json
{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh"
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024
  }
}
```
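
The audio settings determine how much data each chunk contains. As a rough sketch, the sizes can be derived as follows. The `AudioConfig` struct and both functions are hypothetical names that mirror the JSON keys above, and the 2 bytes per sample assumes 16-bit PCM, which this README does not confirm.

```cpp
#include <cstddef>

// Hypothetical mirror of the "audio" section in config.json.
struct AudioConfig {
    int sample_rate = 16000;
    int channels = 1;
    int chunk_duration_seconds = 10;
};

// Number of samples in one chunk across all channels.
constexpr std::size_t samples_per_chunk(const AudioConfig& c) {
    return static_cast<std::size_t>(c.sample_rate) * c.channels * c.chunk_duration_seconds;
}

// Chunk size in bytes; the default of 2 bytes per sample assumes 16-bit PCM.
constexpr std::size_t bytes_per_chunk(const AudioConfig& c, std::size_t bytes_per_sample = 2) {
    return samples_per_chunk(c) * bytes_per_sample;
}
```

With the defaults above, one chunk holds 16000 × 1 × 10 = 160,000 samples, or about 320 KB of 16-bit audio.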

### .env

```env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```

## Cost Estimation

- **Whisper**: ~$0.006/minute (~$0.36/hour)
- **Claude Haiku**: ~$0.03-0.05/hour
- **Total**: ~$0.40/hour of recording
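
The total follows directly from the per-service figures: 60 minutes × $0.006 = $0.36 for Whisper, plus $0.03 to $0.05 for Claude. The sketch below scales that estimate to a meeting of any length. The constants are copied from the list above and may not match current pricing; `CostRange` and `estimate_cost_usd` are hypothetical names used only for illustration.

```cpp
// Rates taken from this README; check current provider pricing before relying on them.
constexpr double kWhisperUsdPerMinute = 0.006;
constexpr double kClaudeUsdPerHourLow = 0.03;
constexpr double kClaudeUsdPerHourHigh = 0.05;

// Hypothetical result type: lower and upper cost estimates in USD.
struct CostRange {
    double low;
    double high;
};

// Estimated cost of recording for the given number of hours.
constexpr CostRange estimate_cost_usd(double hours) {
    return {hours * (60.0 * kWhisperUsdPerMinute + kClaudeUsdPerHourLow),
            hours * (60.0 * kWhisperUsdPerMinute + kClaudeUsdPerHourHigh)};
}
```

For a one-hour meeting this gives a range of roughly $0.39 to $0.41, consistent with the ~$0.40/hour figure.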

## Project Structure

```
secondvoice/
├── src/
│   ├── main.cpp                 # Entry point
│   ├── audio/                   # Audio capture & buffer
│   ├── api/                     # Whisper & Claude clients
│   ├── ui/                      # ImGui interface
│   ├── utils/                   # Config & thread-safe queue
│   └── core/                    # Pipeline orchestration
├── docs/                        # Documentation
├── recordings/                  # Output recordings
├── config.json                  # Runtime configuration
├── .env                         # API keys (not committed)
└── CMakeLists.txt               # Build configuration
```

## Development

### Building in Debug Mode

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE="$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake"
cmake --build build
```

### Running Tests

```bash
# TODO: Add tests
```

## Troubleshooting

### No audio capture

- Check microphone permissions
- Verify that PortAudio is installed and can see your devices by running `pa_devs` (if available)
- Try selecting a different audio device in the code

### API errors

- Verify API keys in `.env` are correct
- Check internet connection
- Monitor API rate limits

### Build errors

- Ensure vcpkg is properly set up
- Check all system dependencies are installed
- Try a clean rebuild: `cmake --build build --clean-first`

## Roadmap

### Phase 1 - MVP (Current)
- ✅ Audio capture
- ✅ Whisper integration
- ✅ Claude integration
- ✅ ImGui UI
- ✅ Stop button

### Phase 2 - Enhancement
- ⬜ Auto-summary post-meeting
- ⬜ Export transcripts
- ⬜ Search functionality
- ⬜ Speaker diarization
- ⬜ Replay mode

## License

See LICENSE file.

## Contributing

This is a personal project, but suggestions and bug reports are welcome via issues.

## Contact

See `docs/SecondVoice.md` for project context and motivation.