Commit Graph

4 Commits

Author SHA1 Message Date
741ca09663 feat: Add RNNoise denoising + transient suppressor + VAD improvements
- Add RNNoise neural network audio denoising (16kHz↔48kHz resampling)
- Add transient suppressor to filter claps/clicks/pops before RNNoise
- VAD now works on FILTERED audio (not raw) to avoid false triggers
- Real-time denoised audio level display in UI
- Save denoised audio previews in Opus format (.ogg)
- Add extensive Whisper hallucination filter (Tingting, music, etc.)
- Add "Clear" button to reset accumulated translations
- Double VAD thresholds (0.02/0.08) for less sensitivity
- Update Claude prompt to handle offensive content gracefully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:46:38 +08:00
fa8ea2907b feat: Major improvements - WinHTTP, gpt-4o-mini, Opus, sliding window
- Replace cpp-httplib with native WinHTTP for HTTPS support
- Switch from whisper-1 to gpt-4o-mini-transcribe model
- Use Opus/OGG encoding instead of WAV (~10x smaller files)
- Implement sliding window audio capture with overlap
- Add transcription deduplication for overlapping segments
- Add Voice Activity Detection (VAD) to filter silence/noise
- Filter Whisper hallucinations (Amara.org, etc.)
- Add UTF-8 console support for Chinese characters
- Add Chinese font loading in ImGui
- Make Claude responses concise (translation only, no explanations)
- Configurable window size, font size, chunk duration/step

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 12:17:41 +08:00
fa882fc2d6 fix: Resolve compilation errors and build successfully
Build fixes:
- Add missing includes (<cstdint>, <iomanip>, <sstream>, <string>, <vector>)
- Fix unused parameter warnings with (void) casts
- Fix cpp-httplib API: Use UploadFormDataItems instead of MultipartFormDataItems
- Fix portaudio linking: Use portaudio_static target instead of portaudio

All modules now compile without errors. Executable built successfully (13MB).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:27:18 +08:00
5b60acaa73 feat: Implement complete MVP architecture for SecondVoice
Complete implementation of the real-time Chinese-to-French translation system:

Architecture:
- 3-threaded pipeline: Audio capture → AI processing → UI rendering
- Thread-safe queues for inter-thread communication
- Configurable audio chunk sizes for latency tuning

Core Features:
- Audio capture with PortAudio (configurable sample rate/channels)
- Whisper API integration for Chinese speech-to-text
- Claude API integration for Chinese-to-French translation
- ImGui real-time display with stop button
- Full recording saved to WAV on stop

Modules Implemented:
- audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export)
- api/: WhisperClient + ClaudeClient (HTTP API wrappers)
- ui/: TranslationUI (ImGui interface)
- core/: Pipeline (orchestrates all threads)
- utils/: Config (JSON/.env loader) + ThreadSafeQueue (template)

Build System:
- CMake with vcpkg for dependency management
- vcpkg.json manifest for reproducible builds
- build.sh helper script

Configuration:
- config.json: Audio settings, API parameters, UI config
- .env: API keys (OpenAI + Anthropic)

Documentation:
- README.md: Setup instructions, usage, architecture
- docs/implementation_plan.md: Technical design document
- docs/SecondVoice.md: Project vision and motivation

Next Steps:
- Test build with vcpkg dependencies
- Test audio capture on real hardware
- Validate API integrations
- Tune chunk size for optimal latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:08:03 +08:00