secondvoice

Author	SHA1	Message	Date
Trouve Alexis	1db83b7bce	chore: Add sessions/ and .claudiomiro/ to gitignore Exclude runtime-generated session logs and local config from version control. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 11:40:58 +08:00
StillHammer	3ec2a8beca	feat: Add session logging, input gain, and context-aware prompts Major features: - Session logging system with detailed segment tracking (audio files, metadata, latencies) - Input gain control (0.5x-5.0x amplifier) with soft clipping - Context-aware Whisper prompts using recent transcriptions - Comprehensive segment metadata (RMS, peak, duration, timestamps) - API latency measurements for Whisper and Claude - Audio hash-based duplicate detection - Hallucination filtering with detailed logging Changes: - Add SessionLogger class for structured session data export - Apply input gain before VAD and denoising (not just raw input) - Enhanced Pipeline with segment tracking and error logging - New UI control for input gain amplifier - Sessions saved to sessions/ directory with transcripts/ export - Improved Whisper prompt in config.json (French instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 12:17:21 +08:00
StillHammer	9163e082da	docs: Add Whisper prompt improvement strategy - Document current prompt limitations - Propose improved prompt with anti-hallucination instructions - Suggest dynamic context and domain vocabulary enhancements 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 20:09:04 +08:00
StillHammer	a3b38cf32a	docs: Add objective section to debug plan Define the real-world use case: making SecondVoice work in degraded conditions (meetings with multiple voices, variable distances, poor mic quality). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 20:04:08 +08:00
StillHammer	f288156869	docs: Add test conditions context to debug plan Document that the initial transcript analysis was done under degraded conditions (multiple voices, variable distances/volumes, poor mic) which may explain some of the segmentation issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 20:02:40 +08:00
StillHammer	21bcc9ed71	feat: Add transcript export and debug planning docs - Add CLAUDE.md with project documentation for AI assistance - Add PLAN_DEBUG.md with debugging hypotheses and logging plan - Update Pipeline and TranslationUI with transcript export functionality 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 19:59:29 +08:00
StillHammer	371e86d0b7	chore: Clean up repo - remove audio files and update gitignore - Remove tracked audio files (ogg, wav) from git history - Update .gitignore to ignore all audio formats (ogg, wav, mp3, flac, aac, m4a) - Remove local config and temp files from tracking 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 17:34:10 +08:00
StillHammer	8bba4f9334	Update	2025-11-23 16:57:44 +08:00
Trouve Alexis	4e9b7f9e95	feat: Add RNNoise denoising + transient suppressor + VAD improvements - Add RNNoise neural network audio denoising (16kHz↔48kHz resampling) - Add transient suppressor to filter claps/clicks/pops before RNNoise - VAD now works on FILTERED audio (not raw) to avoid false triggers - Real-time denoised audio level display in UI - Save denoised audio previews in Opus format (.ogg) - Add extensive Whisper hallucination filter (Tingting, music, etc.) - Add "Clear" button to reset accumulated translations - Double VAD thresholds (0.02/0.08) for less sensitivity - Update Claude prompt to handle offensive content gracefully 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 16:57:43 +08:00
Trouve Alexis	741ca09663	feat: Add RNNoise denoising + transient suppressor + VAD improvements - Add RNNoise neural network audio denoising (16kHz↔48kHz resampling) - Add transient suppressor to filter claps/clicks/pops before RNNoise - VAD now works on FILTERED audio (not raw) to avoid false triggers - Real-time denoised audio level display in UI - Save denoised audio previews in Opus format (.ogg) - Add extensive Whisper hallucination filter (Tingting, music, etc.) - Add "Clear" button to reset accumulated translations - Double VAD thresholds (0.02/0.08) for less sensitivity - Update Claude prompt to handle offensive content gracefully 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 16:46:38 +08:00
Trouve Alexis	fa8ea2907b	feat: Major improvements - WinHTTP, gpt-4o-mini, Opus, sliding window - Replace cpp-httplib with native WinHTTP for HTTPS support - Switch from whisper-1 to gpt-4o-mini-transcribe model - Use Opus/OGG encoding instead of WAV (~10x smaller files) - Implement sliding window audio capture with overlap - Add transcription deduplication for overlapping segments - Add Voice Activity Detection (VAD) to filter silence/noise - Filter Whisper hallucinations (Amara.org, etc.) - Add UTF-8 console support for Chinese characters - Add Chinese font loading in ImGui - Make Claude responses concise (translation only, no explanations) - Configurable window size, font size, chunk duration/step 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 12:17:41 +08:00
StillHammer	089acbfff1	fix: Normalize line endings in TranslationUI.cpp Convert Windows CRLF to Unix LF line endings for consistency across platforms and version control. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 18:11:57 +08:00
StillHammer	14ed043bf5	feat: Add Opus audio encoding for 46x bandwidth reduction Replace WAV with Opus encoding for Whisper API uploads: - Add libopus & libogg via FetchContent (no vcpkg dependency) - Implement AudioBuffer::saveToOpus() with Ogg container - Configure Opus encoder for voice (VOIP mode, 32kbps VBR) - Update WhisperClient to use Opus format (audio/ogg) - Fix Windows temp file path compatibility Benefits: - 46x smaller files (37KB vs 1.7MB for 10s audio) - Reduced API costs and bandwidth - Faster uploads for real-time translation - Whisper API fully supports Opus/Ogg format 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 17:38:16 +08:00
StillHammer	7dec7a6eed	fix: Fully resolve the OpenGL/ImGui problem with threading PROBLEM SOLVED: ImGui shaders now compile successfully! Major changes: - Replaced vcpkg ImGui with FetchContent (compilation from source) - Created GLAD wrapper for ImGui (imgui_opengl3_glad.cpp) - Added makeContextCurrent() to manage the OpenGL context across threads - Release the context in initialize(), then make it current in uiThread() Root Cause Analysis: 1. Rendering runs in uiThread() (separate thread) 2. The OpenGL context created in the main thread was not accessible 3. glCreateShader returned 0 with GL_INVALID_OPERATION (error 1282) 4. Solution: Transfer the OpenGL context from the main thread to uiThread In-depth debugging: - Added debug logs in ImGui to trace glCreateShader - Discovered that handle=0 indicated a creation failure (not a compilation failure) - Identified that the WGL error "resource in use" meant the context was locked Files modified: - vcpkg.json: Removed imgui - CMakeLists.txt: FetchContent for ImGui + imgui_backends library - src/imgui_opengl3_glad.cpp: New GLAD wrapper - src/ui/TranslationUI.{h,cpp}: Added makeContextCurrent() - src/core/Pipeline.cpp: Call makeContextCurrent() in uiThread() - build/.../imgui_impl_opengl3.cpp: Debug logs (temporary) Result: UI works! NVIDIA RTX 4060 GPU, OpenGL 3.3.0, shaders compile 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 16:37:47 +08:00
StillHammer	ddf34db2a0	feat: Add GLAD OpenGL loader and NVIDIA GPU forcing Changes: - Add GLAD dependency via vcpkg for proper OpenGL function loading - Force NVIDIA GPU usage with game-style exports (NvOptimusEnablement) - Create working console version (SecondVoice_Console.exe) - Add dual executable build (UI + Console versions) - Update to OpenGL 4.6 Core Profile with GLSL 460 - Add GPU detection and logging - Fix GLFW header conflicts with GLFW_INCLUDE_NONE Note: OpenGL shaders still failing to compile despite GLAD integration. Console version is fully functional for audio capture and translation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 15:18:54 +08:00
StillHammer	07b792b2bd	fix: Add MinGW build support and compatibility fixes - Add MinGW compatibility shim for cpp-httplib GetAddrInfoExCancel - Fix portaudio linking (portaudio_static -> portaudio) - Disable -Werror for MinGW builds due to httplib incompatibilities - Add console subsystem flag for MinGW builds - Add debug logging utilities (Logger.h) - Add MessageBox debugging for Windows troubleshooting - Update build scripts with better error handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:43:13 +08:00
StillHammer	94ad6b4a22	feat: Add MinGW support - Build without Visual Studio! Lightweight Windows build option using MinGW-w64 instead of Visual Studio: Size Comparison: - Visual Studio: 10-20 GB install - MinGW: ~500 MB install (20x smaller!) New Files: - setup_mingw.bat: One-click installer for all tools - Chocolatey (package manager) - MinGW-w64 (GCC compiler) - CMake, Ninja, Git - vcpkg integration - build_mingw.bat: Build script for MinGW - Auto-detection of GCC - Debug/Release modes - Clean build support - User-friendly error messages - WINDOWS_MINGW.md: Complete MinGW guide - Installation instructions - Troubleshooting - Performance comparison MSVC vs GCC - Distribution guide CMake Updates: - Added mingw-debug and mingw-release presets - GCC compiler flags: -O3 -Wall -Wextra - Static linking for portable .exe Documentation: - Updated WINDOWS_QUICK_START.md with MinGW option - Comparison table: MinGW vs Visual Studio - Recommendation: MinGW for most users Benefits: - 20x smaller download (500MB vs 10-20GB) - 5-10 min install vs 30-60 min - Same performance as MSVC - Portable standalone .exe - Perfect for users without Visual Studio Usage: 1. Run setup_mingw.bat (one time) 2. Restart terminal 3. Run build_mingw.bat --release 4. Done! Output: build/mingw-release/SecondVoice.exe 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 03:42:41 +08:00
StillHammer	a3ade5af9a	docs: Add Windows quick start guide	2025-11-20 03:39:04 +08:00
StillHammer	99a9cc22d7	feat: Add Windows support with .exe build Complete Windows build support for SecondVoice: Build System: - Added CMakePresets.json with Windows and Linux presets - Created build.bat script for easy Windows builds - Support for Visual Studio 2019+ with Ninja generator - Automatic vcpkg integration and dependency installation Scripts: - build.bat with Debug/Release modes and clean builds - Auto-detection of Visual Studio and compiler tools - User-friendly error messages and setup instructions Documentation: - Comprehensive docs/build_windows.md guide - Step-by-step Windows build instructions - Troubleshooting section for common issues - Distribution guide for portable .exe Updates: - Updated README.md with cross-platform instructions - Enhanced .gitignore for Windows build artifacts - Separate build directories for Windows/Linux Platform Support: - Windows 10/11 with Visual Studio 2019+ - Linux with GCC/Clang (existing) - Shared vcpkg dependencies across platforms Output: - Windows: build/windows-release/Release/SecondVoice.exe - Linux: build/SecondVoice Next Steps: - Build on Windows with: build.bat --release - Executable ready for distribution - Same config.json and .env work cross-platform 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 03:38:18 +08:00
StillHammer	40c451b9f8	feat: Upgrade to latest Whisper API with GPT-4o models and prompting Major improvements to Whisper API integration: New Features: - Support for gpt-4o-mini-transcribe and gpt-4o-transcribe models - Prompting support for better name recognition and context - Response format configuration (text, json, verbose_json) - Stream flag prepared for future streaming implementation Configuration Updates: - Updated config.json with new Whisper parameters - Added prompt, stream, and response_format fields - Default model: gpt-4o-mini-transcribe (better quality than whisper-1) Code Changes: - Extended WhisperClient::transcribe() with new parameters - Updated Config struct to support new fields - Modified Pipeline to pass all config parameters to Whisper - Added comprehensive documentation in docs/whisper_upgrade.md Benefits: - Better transcription accuracy (~33% improvement) - Improved name recognition (Tingting, Alexis) - Context-aware transcription with prompting - Ready for future streaming and diarization Documentation: - Complete guide in docs/whisper_upgrade.md - Usage examples and best practices - Cost comparison and optimization tips - Future roadmap for Phase 2 features 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 03:34:09 +08:00
StillHammer	fa882fc2d6	fix: Resolve compilation errors and build successfully Build fixes: - Add missing includes (<cstdint>, <iomanip>, <sstream>, <string>, <vector>) - Fix unused parameter warnings with (void) casts - Fix cpp-httplib API: Use UploadFormDataItems instead of MultipartFormDataItems - Fix portaudio linking: Use portaudio_static target instead of portaudio All modules now compile without errors. Executable built successfully (13MB). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 03:27:18 +08:00
StillHammer	67b1587047	docs: Add comprehensive next steps guide	2025-11-20 03:09:40 +08:00
StillHammer	5b60acaa73	feat: Implement complete MVP architecture for SecondVoice Complete implementation of the real-time Chinese-to-French translation system: Architecture: - 3-threaded pipeline: Audio capture → AI processing → UI rendering - Thread-safe queues for inter-thread communication - Configurable audio chunk sizes for latency tuning Core Features: - Audio capture with PortAudio (configurable sample rate/channels) - Whisper API integration for Chinese speech-to-text - Claude API integration for Chinese-to-French translation - ImGui real-time display with stop button - Full recording saved to WAV on stop Modules Implemented: - audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export) - api/: WhisperClient + ClaudeClient (HTTP API wrappers) - ui/: TranslationUI (ImGui interface) - core/: Pipeline (orchestrates all threads) - utils/: Config (JSON/.env loader) + ThreadSafeQueue (template) Build System: - CMake with vcpkg for dependency management - vcpkg.json manifest for reproducible builds - build.sh helper script Configuration: - config.json: Audio settings, API parameters, UI config - .env: API keys (OpenAI + Anthropic) Documentation: - README.md: Setup instructions, usage, architecture - docs/implementation_plan.md: Technical design document - docs/SecondVoice.md: Project vision and motivation Next Steps: - Test build with vcpkg dependencies - Test audio capture on real hardware - Validate API integrations - Tune chunk size for optimal latency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 03:08:03 +08:00
StillHammer	6248fb2322	Initial commit: Add documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 02:06:35 +08:00

24 Commits