- Add RNNoise neural network audio denoising (16kHz↔48kHz resampling)
- Add transient suppressor to filter claps/clicks/pops before RNNoise
- Run VAD on filtered audio instead of raw audio to avoid false triggers
- Show real-time denoised audio level in UI
- Save denoised audio previews in Opus format (.ogg)
- Add extensive Whisper hallucination filter (Tingting, music, etc.)
- Add "Clear" button to reset accumulated translations
- Double VAD thresholds (0.02/0.08) to reduce sensitivity
- Update Claude prompt to handle offensive content gracefully
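
The two VAD thresholds listed above could be applied with hysteresis on the RMS level of each denoised frame. The sketch below is only an illustration: the class name `HysteresisVad`, the choice of RMS as the level measure, and the reading of 0.02 as a release threshold and 0.08 as an attack threshold are all assumptions, not taken from the project source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square level of one frame of samples in [-1, 1].
inline float frame_rms(const std::vector<float>& frame) {
    if (frame.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : frame) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / static_cast<double>(frame.size())));
}

// Hypothetical two-threshold detector. The roles assigned to 0.02 and 0.08
// (release and attack) are an assumption made for this example.
class HysteresisVad {
public:
    explicit HysteresisVad(float release = 0.02f, float attack = 0.08f)
        : release_(release), attack_(attack) {
        assert(release <= attack);
    }

    // Feed one filtered frame; returns true while speech is considered active.
    bool update(const std::vector<float>& frame) {
        const float level = frame_rms(frame);
        if (!active_ && level >= attack_) {
            active_ = true;   // loud enough to start a speech segment
        } else if (active_ && level < release_) {
            active_ = false;  // quiet enough to end the segment
        }
        return active_;
    }

private:
    float release_;
    float attack_;
    bool active_ = false;
};
```

Using a higher threshold to start a segment than to end it keeps the detector from flickering when the level hovers near a single cutoff.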
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
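
A hallucination filter of the kind mentioned above can be as simple as a case-insensitive phrase blocklist applied to each transcript. The following is a minimal sketch under that assumption. The function names are hypothetical, and the real filter's phrase list and matching rules are not shown in the commit message.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercase ASCII letters only; bytes of multi-byte UTF-8 sequences are left untouched.
inline std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) -> char {
        return c < 0x80 ? static_cast<char>(std::tolower(c)) : static_cast<char>(c);
    });
    return s;
}

// Returns true if the transcript contains any blocked phrase.
// Matching is case-insensitive for ASCII and exact for other bytes.
inline bool looks_like_hallucination(const std::string& transcript,
                                     const std::vector<std::string>& blocklist) {
    const std::string haystack = to_lower_ascii(transcript);
    for (const std::string& phrase : blocklist) {
        if (phrase.empty()) continue;  // an empty phrase would match everything
        if (haystack.find(to_lower_ascii(phrase)) != std::string::npos) return true;
    }
    return false;
}
```

A caller would drop any segment for which `looks_like_hallucination` returns true before passing it on for translation. Substring matching is deliberately blunt, so short or common phrases in the blocklist risk discarding legitimate speech.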
- Replace cpp-httplib with native WinHTTP for HTTPS support
- Switch from whisper-1 to gpt-4o-mini-transcribe model
- Use Opus/OGG encoding instead of WAV (~10x smaller files)
- Implement sliding window audio capture with overlap
- Add transcription deduplication for overlapping segments
- Add Voice Activity Detection (VAD) to filter silence/noise
- Filter Whisper hallucinations (Amara.org, etc.)
- Add UTF-8 console support for Chinese characters
- Add Chinese font loading in ImGui
- Make Claude responses concise (translation only, no explanations)
- Make window size, font size, and chunk duration/step configurable
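
To illustrate how chunk duration and step relate in a sliding window with overlap, the sketch below computes window boundaries over an already-buffered signal. It is an offline simplification of live capture, and `Window` and `sliding_windows` are hypothetical names, not the project's.

```cpp
#include <cstddef>
#include <vector>

// Half-open sample range [begin, end).
struct Window {
    std::size_t begin;
    std::size_t end;
};

// Split `total` samples into windows of `chunk` samples that advance by `step`.
// With step < chunk, consecutive windows overlap by (chunk - step) samples.
// A trailing remainder shorter than `chunk` is not emitted.
inline std::vector<Window> sliding_windows(std::size_t total,
                                           std::size_t chunk,
                                           std::size_t step) {
    std::vector<Window> out;
    if (chunk == 0 || step == 0 || total < chunk) return out;
    for (std::size_t begin = 0; begin + chunk <= total; begin += step) {
        out.push_back({begin, begin + chunk});
    }
    return out;
}
```

For example, 10 samples with a chunk of 4 and a step of 2 yield windows starting at 0, 2, 4, and 6, each sharing 2 samples with its neighbor.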
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
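
Because overlapping windows can transcribe the same words twice, a deduplication step has to remove the repeated portion when segments are joined. One possible approach, sketched here as an assumption rather than the project's actual method, is to drop the longest run of words at the start of the new segment that repeats the end of the accumulated text. Whitespace tokenization only suits space-delimited languages. Chinese text would need comparison at the code-point level instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split on whitespace.
inline std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Append `next` to `accumulated`, skipping the longest prefix of `next`
// that exactly repeats a suffix of `accumulated` (compared word by word).
inline std::string merge_overlapping(const std::string& accumulated,
                                     const std::string& next) {
    const std::vector<std::string> a = split_words(accumulated);
    const std::vector<std::string> b = split_words(next);

    std::size_t overlap = 0;
    for (std::size_t n = std::min(a.size(), b.size()); n > 0; --n) {
        const auto tail = a.end() - static_cast<std::ptrdiff_t>(n);
        if (std::equal(tail, a.end(), b.begin())) {
            overlap = n;  // longest match found first, so stop here
            break;
        }
    }

    std::string out = accumulated;
    for (std::size_t i = overlap; i < b.size(); ++i) {
        if (!out.empty()) out += ' ';
        out += b[i];
    }
    return out;
}
```

Exact word matching is fragile when the two transcriptions differ in punctuation or casing at the seam, so a production version would likely normalize words or use fuzzy matching.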
Major improvements to Whisper API integration:
New Features:
- Support for gpt-4o-mini-transcribe and gpt-4o-transcribe models
- Prompting support for better name recognition and context
- Response format configuration (text, json, verbose_json)
- Add stream flag in preparation for future streaming support
Configuration Updates:
- Updated config.json with new Whisper parameters
- Added prompt, stream, and response_format fields
- Default model: gpt-4o-mini-transcribe (better quality than whisper-1)
Code Changes:
- Extended WhisperClient::transcribe() with new parameters
- Updated Config struct to support new fields
- Modified Pipeline to pass all config parameters to Whisper
- Added comprehensive documentation in docs/whisper_upgrade.md
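
As a rough picture of the new parameters, the options could be grouped as below. This is a hypothetical sketch: the struct name `WhisperOptions`, the helper `is_valid_response_format`, and the `"text"` default for `response_format` are assumptions, and the actual `Config` struct and `WhisperClient::transcribe()` signature may differ.

```cpp
#include <string>

// Hypothetical options bundle. Field names follow the commit message,
// not the actual source.
struct WhisperOptions {
    std::string model = "gpt-4o-mini-transcribe";  // default named in the commit
    std::string prompt;                            // optional context, e.g. expected names
    std::string response_format = "text";          // assumed default
    bool stream = false;                           // reserved for future streaming
};

// Accepts only the three formats listed in the commit message.
inline bool is_valid_response_format(const std::string& fmt) {
    return fmt == "text" || fmt == "json" || fmt == "verbose_json";
}
```

Validating `response_format` when the configuration is loaded would surface typos in `config.json` before any audio is sent to the API.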
Benefits:
- Better transcription accuracy (~33% improvement)
- Improved name recognition (Tingting, Alexis)
- Context-aware transcription with prompting
- Ready for future streaming and diarization
Documentation:
- Complete guide in docs/whisper_upgrade.md
- Usage examples and best practices
- Cost comparison and optimization tips
- Future roadmap for Phase 2 features
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>