secondvoice

Author	SHA1	Message	Date
StillHammer	db0f8e5990	refactor: Improve VAD trailing silence detection and update docs - Replace hang time logic with consecutive silence frame counter for more precise speech end detection - Update Whisper prompt to utilize previous context for better transcription coherence - Expand README with comprehensive feature list, architecture details, debugging status, and session logging structure - Add troubleshooting section for real-world testing conditions and known issues	2025-12-02 09:44:06 +08:00
Trouve Alexis	a28bb89913	tune: Adjust VAD parameters for longer segments - min_speech_duration: 300ms → 1000ms (avoid tiny segments) - silence_duration: 400ms → 700ms (wait longer before cutting) - hang_frames_threshold: 20 → 35 (~350ms pause tolerance) This should reduce mid-sentence cuts and give Whisper more context. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 22:08:01 +08:00
Trouve Alexis	741ca09663	feat: Add RNNoise denoising + transient suppressor + VAD improvements - Add RNNoise neural network audio denoising (16kHz↔48kHz resampling) - Add transient suppressor to filter claps/clicks/pops before RNNoise - VAD now works on FILTERED audio (not raw) to avoid false triggers - Real-time denoised audio level display in UI - Save denoised audio previews in Opus format (.ogg) - Add extensive Whisper hallucination filter (Tingting, music, etc.) - Add "Clear" button to reset accumulated translations - Double VAD thresholds (0.02/0.08) for less sensitivity - Update Claude prompt to handle offensive content gracefully 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 16:46:38 +08:00
Trouve Alexis	fa8ea2907b	feat: Major improvements - WinHTTP, gpt-4o-mini, Opus, sliding window - Replace cpp-httplib with native WinHTTP for HTTPS support - Switch from whisper-1 to gpt-4o-mini-transcribe model - Use Opus/OGG encoding instead of WAV (~10x smaller files) - Implement sliding window audio capture with overlap - Add transcription deduplication for overlapping segments - Add Voice Activity Detection (VAD) to filter silence/noise - Filter Whisper hallucinations (Amara.org, etc.) - Add UTF-8 console support for Chinese characters - Add Chinese font loading in ImGui - Make Claude responses concise (translation only, no explanations) - Configurable window size, font size, chunk duration/step 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 12:17:41 +08:00
StillHammer	5b60acaa73	feat: Implement complete MVP architecture for SecondVoice Complete implementation of the real-time Chinese-to-French translation system: Architecture: - 3-threaded pipeline: Audio capture → AI processing → UI rendering - Thread-safe queues for inter-thread communication - Configurable audio chunk sizes for latency tuning Core Features: - Audio capture with PortAudio (configurable sample rate/channels) - Whisper API integration for Chinese speech-to-text - Claude API integration for Chinese-to-French translation - ImGui real-time display with stop button - Full recording saved to WAV on stop Modules Implemented: - audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export) - api/: WhisperClient + ClaudeClient (HTTP API wrappers) - ui/: TranslationUI (ImGui interface) - core/: Pipeline (orchestrates all threads) - utils/: Config (JSON/.env loader) + ThreadSafeQueue (template) Build System: - CMake with vcpkg for dependency management - vcpkg.json manifest for reproducible builds - build.sh helper script Configuration: - config.json: Audio settings, API parameters, UI config - .env: API keys (OpenAI + Anthropic) Documentation: - README.md: Setup instructions, usage, architecture - docs/implementation_plan.md: Technical design document - docs/SecondVoice.md: Project vision and motivation Next Steps: - Test build with vcpkg dependencies - Test audio capture on real hardware - Validate API integrations - Tune chunk size for optimal latency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 03:08:03 +08:00
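
Note on commit 5b60acaa73: the log names a `ThreadSafeQueue` template in `utils/` that connects the capture, processing and UI threads, but does not show its interface. The sketch below is one common way to build such a queue with a mutex and a condition variable. The method names `push`, `pop` and `close`, and the use of `std::optional` (C++17), are assumptions for illustration and not the project's actual API.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical sketch of a blocking multi-producer/multi-consumer queue.
// The real utils/ThreadSafeQueue in the repository may differ.
template <typename T>
class ThreadSafeQueue {
public:
    // Enqueue one item and wake a single waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Block until an item is available or the queue has been closed.
    // Returns std::nullopt only when the queue is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) {
            return std::nullopt;
        }
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

    // Signal shutdown so that blocked consumers can exit their loops.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
    bool closed_ = false;
};
```

In a three-stage pipeline like the one described, the capture thread would call `push` for each audio chunk, the processing thread would loop on `pop` until it returns `std::nullopt`, and the stop button handler would call `close`.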

5 Commits
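
Note on commit db0f8e5990: the message describes replacing the hang time logic with a consecutive silence frame counter to detect the end of speech. The code itself is not shown in the log, so the following is only a minimal sketch of that idea. The class name `TrailingSilenceDetector`, the RMS energy test and the parameter names are assumptions. The 10 ms frame length is inferred from the earlier note that 35 frames is about 350 ms.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: RMS energy of one frame of float samples in [-1, 1].
inline float frameRms(const std::vector<float>& frame) {
    if (frame.empty()) {
        return 0.0f;
    }
    double sum = 0.0;
    for (float s : frame) {
        sum += static_cast<double>(s) * s;
    }
    return static_cast<float>(std::sqrt(sum / frame.size()));
}

// Hypothetical sketch: ends a speech segment after N consecutive silent frames.
// Any voiced frame resets the counter, so short pauses do not cut the segment.
class TrailingSilenceDetector {
public:
    TrailingSilenceDetector(float rmsThreshold, std::size_t silenceFramesToEnd)
        : rmsThreshold_(rmsThreshold), silenceFramesToEnd_(silenceFramesToEnd) {
        assert(silenceFramesToEnd_ > 0);
    }

    // Feed one frame. Returns true exactly once, on the frame that
    // completes the required run of silence after speech was seen.
    bool push(const std::vector<float>& frame) {
        const bool voiced = frameRms(frame) >= rmsThreshold_;
        if (voiced) {
            inSpeech_ = true;
            silentRun_ = 0;
            return false;
        }
        if (!inSpeech_) {
            return false;  // leading silence: nothing to end yet
        }
        if (++silentRun_ >= silenceFramesToEnd_) {
            inSpeech_ = false;
            silentRun_ = 0;
            return true;
        }
        return false;
    }

    bool inSpeech() const { return inSpeech_; }

private:
    float rmsThreshold_;
    std::size_t silenceFramesToEnd_;
    std::size_t silentRun_ = 0;
    bool inSpeech_ = false;
};
```

With 10 ms frames, constructing the detector with `silenceFramesToEnd = 70` would correspond to the 700 ms silence duration mentioned in commit a28bb89913. Counting frames rather than comparing timestamps makes the cut point depend only on the audio actually processed, which may be what the commit means by "more precise".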