secondvoice/README.md
StillHammer 99a9cc22d7 feat: Add Windows support with .exe build
2025-11-20 03:38:18 +08:00


# SecondVoice
Real-time Chinese-to-French translation system for live meetings.
## Overview
SecondVoice captures audio, transcribes Chinese speech with OpenAI's Whisper API, and translates it into French with Claude AI in real time. It is meant for following Chinese-language meetings as they happen.
## Features
- 🎤 Real-time audio capture
- 🗣️ Chinese speech-to-text (Whisper API)
- 🌐 Chinese to French translation (Claude API)
- 🖥️ Clean ImGui interface
- 💾 Full recording saved to disk
- ⚙️ Configurable chunk sizes and settings
## Requirements
### Cross-Platform Support
SecondVoice works on **Windows** and **Linux**.
#### Windows
- Visual Studio 2019 or later (with C++ tools)
- vcpkg package manager
- See detailed guide: [docs/build_windows.md](docs/build_windows.md)
#### Linux
- GCC/Clang with C++17 support
- System dependencies: `libasound2-dev`, `libgl1-mesa-dev`, `libglu1-mesa-dev`
- vcpkg package manager
### vcpkg Installation
**Linux**:
```bash
git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT="$(pwd)"  # applies to the current shell only; add it to your shell profile to persist
```
**Windows**:
```powershell
git clone https://github.com/microsoft/vcpkg.git C:\vcpkg
cd C:\vcpkg
.\bootstrap-vcpkg.bat
setx VCPKG_ROOT "C:\vcpkg"
```
## Setup
1. **Clone the repository**
```bash
git clone <repository-url>
cd secondvoice
```
2. **Create a `.env` file** (copy from `.env.example`)
**Linux**:
```bash
cp .env.example .env
nano .env
# Add your API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
```
**Windows**:
```powershell
copy .env.example .env
notepad .env
# Add your API keys
```
3. **Build the project**
**Linux**:
```bash
./build.sh
# Or manually:
# cmake -B build -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
# cmake --build build -j$(nproc)
```
**Windows**:
```batch
build.bat --release
REM Or see detailed guide: docs/build_windows.md
```
## Usage
**Linux**:
```bash
cd build
./SecondVoice
```
**Windows**:
```batch
cd build\windows-release\Release
SecondVoice.exe
```
The application will:
1. Open an ImGui window
2. Start capturing audio from your microphone
3. Display Chinese transcriptions and French translations in real time
4. Keep recording until you click the **STOP RECORDING** button
5. Save the full audio recording to `recordings/recording_YYYYMMDD_HHMMSS.wav`
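The `recording_YYYYMMDD_HHMMSS.wav` naming pattern can be produced with `std::strftime`. The following is a minimal sketch, not the project's actual code: `recording_path` and `make_tm` are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper (not taken from the SecondVoice sources): formats a
// broken-down time as recordings/recording_YYYYMMDD_HHMMSS.wav.
std::string recording_path(const std::tm& t) {
    char buf[64];
    // strftime returns 0 when the buffer is too small; 64 bytes is ample here.
    const std::size_t n = std::strftime(
        buf, sizeof(buf), "recordings/recording_%Y%m%d_%H%M%S.wav", &t);
    assert(n > 0);
    return std::string(buf, n);
}

// Hypothetical convenience builder for a std::tm from calendar fields.
std::tm make_tm(int year, int month, int day, int hour, int minute, int second) {
    std::tm t{};
    t.tm_year = year - 1900;  // tm_year counts years since 1900
    t.tm_mon = month - 1;     // tm_mon is zero-based
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min = minute;
    t.tm_sec = second;
    return t;
}
```

In a running application the `std::tm` would typically come from `std::localtime` applied to the time at which recording started, so that each session gets a distinct file name.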
## Architecture
```
Audio Capture (PortAudio)
    ↓
Whisper API (Speech-to-Text)
    ↓
Claude API (Translation)
    ↓
ImGui UI (Display)
```
### Threading Model
- **Thread 1**: Audio capture (PortAudio callback)
- **Thread 2**: AI processing (Whisper + Claude API calls)
- **Thread 3**: UI rendering (ImGui + OpenGL)
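Threads arranged like this usually hand data to each other through a queue (the project structure mentions a thread-safe queue in `utils/`). The sketch below shows one common way to build such a queue with a mutex and a condition variable. The class name `ThreadSafeQueue` and its interface are assumptions made for illustration; the real implementation may differ. Note also that a PortAudio callback should generally avoid blocking calls, so a production design might use a lock-free ring buffer on the capture side instead.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical blocking queue; the class in src/utils may look different.
template <typename T>
class ThreadSafeQueue {
public:
    // Called by a producer thread to hand over one item.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available or close() has been called.
    // Returns std::nullopt once the queue is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

    // Tells consumers that no further items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
    bool closed_ = false;
};
```

With this shape, the processing thread can loop on `pop()` until it returns `std::nullopt`, and the stop button only needs to call `close()` to let the pending chunks drain before shutdown.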
## Configuration
### config.json
```json
{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_duration_seconds": 10
  },
  "whisper": {
    "model": "whisper-1",
    "language": "zh"
  },
  "claude": {
    "model": "claude-haiku-4-20250514",
    "max_tokens": 1024
  }
}
```
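To get a feel for what the `audio` settings imply, the size of one chunk follows directly from the three values. The sketch below is illustrative only: the `AudioConfig` struct is a hypothetical mirror of the JSON section, and the 16-bit sample size is an assumption, since the sample format is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the "audio" section of config.json.
struct AudioConfig {
    int sample_rate = 16000;          // samples per second, per channel
    int channels = 1;
    int chunk_duration_seconds = 10;
};

// Number of samples (across all channels) in one chunk.
constexpr std::size_t samples_per_chunk(const AudioConfig& c) {
    return static_cast<std::size_t>(c.sample_rate) * c.channels *
           c.chunk_duration_seconds;
}

// Chunk size in bytes. The default of 2 bytes per sample assumes 16-bit PCM,
// which is an assumption of this sketch, not something the config states.
constexpr std::size_t bytes_per_chunk(const AudioConfig& c,
                                      std::size_t bytes_per_sample = 2) {
    return samples_per_chunk(c) * bytes_per_sample;
}

// With the defaults above: 16000 * 1 * 10 = 160000 samples per chunk.
static_assert(samples_per_chunk(AudioConfig{}) == 160000,
              "default config yields 160000 samples per chunk");
```

Under the 16-bit assumption, each 10-second chunk is 320000 bytes of raw audio. A shorter `chunk_duration_seconds` lowers latency but means more API requests per minute.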
### .env
```env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
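A `.env` file in this form is a list of `KEY=VALUE` lines. As a rough sketch of how such a file can be read, the code below parses the lines into a map. The names `trim` and `parse_env` are hypothetical, and the project's actual loader is not shown here. This version skips blank lines and `#` comments and does not handle quoted values or `export` prefixes.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: removes leading and trailing whitespace (including
// the '\r' left behind by Windows line endings).
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical parser: reads KEY=VALUE lines into a map, ignoring blank
// lines, comment lines starting with '#', and lines without a key.
std::map<std::string, std::string> parse_env(std::istream& in) {
    std::map<std::string, std::string> vars;
    std::string line;
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.empty() || t[0] == '#') continue;
        const auto eq = t.find('=');
        if (eq == std::string::npos || eq == 0) continue;  // not KEY=VALUE
        vars[trim(t.substr(0, eq))] = trim(t.substr(eq + 1));
    }
    return vars;
}
```

To use it on a real file, pass a `std::ifstream` opened on `.env`. Because carriage returns are trimmed, the same file parses identically on Windows and Linux.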
## Cost Estimation
- **Whisper**: ~$0.006/minute (~$0.36/hour)
- **Claude Haiku**: ~$0.03-0.05/hour
- **Total**: ~$0.40/hour of recording
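The total follows from simple arithmetic: $0.006 per minute of Whisper is $0.36 per hour, and adding $0.03 to $0.05 for Claude gives $0.39 to $0.41 per hour. The sketch below turns that into a small estimator for meetings of any length. The constant and function names are hypothetical, and the rates are the approximate figures listed above, not authoritative pricing.

```cpp
#include <cassert>

// Approximate rates copied from the estimates above (USD).
constexpr double kWhisperUsdPerMinute = 0.006;
constexpr double kClaudeUsdPerHourLow = 0.03;
constexpr double kClaudeUsdPerHourHigh = 0.05;

// Lower and upper bound of the estimated cost in USD.
struct CostRange {
    double low;
    double high;
};

// Hypothetical estimator: total cost for a recording of the given length.
constexpr CostRange estimate_cost(double hours) {
    return CostRange{
        hours * (kWhisperUsdPerMinute * 60.0 + kClaudeUsdPerHourLow),
        hours * (kWhisperUsdPerMinute * 60.0 + kClaudeUsdPerHourHigh)};
}
```

For example, `estimate_cost(2.0)` gives a range of roughly $0.78 to $0.82 for a two-hour meeting.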
## Project Structure
```
secondvoice/
├── src/
│   ├── main.cpp           # Entry point
│   ├── audio/             # Audio capture & buffer
│   ├── api/               # Whisper & Claude clients
│   ├── ui/                # ImGui interface
│   ├── utils/             # Config & thread-safe queue
│   └── core/              # Pipeline orchestration
├── docs/                  # Documentation
├── recordings/            # Output recordings
├── config.json            # Runtime configuration
├── .env                   # API keys (not committed)
└── CMakeLists.txt         # Build configuration
```
## Development
### Building in Debug Mode
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake
cmake --build build
```
### Running Tests
```bash
# TODO: Add tests
```
## Troubleshooting
### No audio capture
- Check microphone permissions
- Verify that PortAudio is installed correctly, for example by running `pa_devs` (if available)
- Try a different audio device in the code
### API errors
- Verify API keys in `.env` are correct
- Check internet connection
- Monitor API rate limits
### Build errors
- Ensure vcpkg is properly set up
- Check all system dependencies are installed
- Try `cmake --build build --clean-first`
## Roadmap
### Phase 1 - MVP (Current)
- ✅ Audio capture
- ✅ Whisper integration
- ✅ Claude integration
- ✅ ImGui UI
- ✅ Stop button
### Phase 2 - Enhancement
- ⬜ Auto-summary post-meeting
- ⬜ Export transcripts
- ⬜ Search functionality
- ⬜ Speaker diarization
- ⬜ Replay mode
## License
See LICENSE file.
## Contributing
This is a personal project, but suggestions and bug reports are welcome via issues.
## Contact
See docs/SecondVoice.md for project context and motivation.