Exclude runtime-generated session logs and local config from version control.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major features:
- Session logging system with detailed segment tracking (audio files, metadata, latencies)
- Input gain control (0.5x-5.0x amplifier) with soft clipping
- Context-aware Whisper prompts using recent transcriptions
- Comprehensive segment metadata (RMS, peak, duration, timestamps)
- API latency measurements for Whisper and Claude
- Audio hash-based duplicate detection
- Hallucination filtering with detailed logging
Changes:
- Add SessionLogger class for structured session data export
- Apply input gain before VAD and denoising (not just raw input)
- Enhanced Pipeline with segment tracking and error logging
- New UI control for input gain amplifier
- Sessions saved to sessions/ directory with transcripts/ export
- Improved Whisper prompt in config.json (French instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Define the real-world use case: making SecondVoice work in degraded conditions (meetings with multiple voices, variable distances, poor mic quality).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Document that the initial transcript analysis was done under degraded conditions (multiple voices, variable distances/volumes, poor mic) which may explain some of the segmentation issues.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add CLAUDE.md with project documentation for AI assistance
- Add PLAN_DEBUG.md with debugging hypotheses and logging plan
- Update Pipeline and TranslationUI with transcript export functionality
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove tracked audio files (ogg, wav) from git history
- Update .gitignore to ignore all audio formats (ogg, wav, mp3, flac, aac, m4a)
- Remove local config and temp files from tracking
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add RNNoise neural network audio denoising (16kHz↔48kHz resampling)
- Add transient suppressor to filter claps/clicks/pops before RNNoise
- VAD now works on FILTERED audio (not raw) to avoid false triggers
- Real-time denoised audio level display in UI
- Save denoised audio previews in Opus format (.ogg)
- Add extensive Whisper hallucination filter (Tingting, music, etc.)
- Add "Clear" button to reset accumulated translations
- Double VAD thresholds (0.02/0.08) for less sensitivity
- Update Claude prompt to handle offensive content gracefully
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add RNNoise neural network audio denoising (16kHz↔48kHz resampling)
- Add transient suppressor to filter claps/clicks/pops before RNNoise
- VAD now works on FILTERED audio (not raw) to avoid false triggers
- Real-time denoised audio level display in UI
- Save denoised audio previews in Opus format (.ogg)
- Add extensive Whisper hallucination filter (Tingting, music, etc.)
- Add "Clear" button to reset accumulated translations
- Double VAD thresholds (0.02/0.08) for less sensitivity
- Update Claude prompt to handle offensive content gracefully
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace cpp-httplib with native WinHTTP for HTTPS support
- Switch from whisper-1 to gpt-4o-mini-transcribe model
- Use Opus/OGG encoding instead of WAV (~10x smaller files)
- Implement sliding window audio capture with overlap
- Add transcription deduplication for overlapping segments
- Add Voice Activity Detection (VAD) to filter silence/noise
- Filter Whisper hallucinations (Amara.org, etc.)
- Add UTF-8 console support for Chinese characters
- Add Chinese font loading in ImGui
- Make Claude responses concise (translation only, no explanations)
- Configurable window size, font size, chunk duration/step
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Convert Windows CRLF to Unix LF line endings for consistency across platforms and version control.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replace WAV with Opus encoding for Whisper API uploads:
- Add libopus & libogg via FetchContent (no vcpkg dependency)
- Implement AudioBuffer::saveToOpus() with Ogg container
- Configure Opus encoder for voice (VOIP mode, 32kbps VBR)
- Update WhisperClient to use Opus format (audio/ogg)
- Fix Windows temp file path compatibility
Benefits:
- 46x smaller files (37KB vs 1.7MB for 10s audio)
- Reduced API costs and bandwidth
- Faster uploads for real-time translation
- Whisper API fully supports Opus/Ogg format
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PROBLÈME RÉSOLU: Les shaders ImGui compilent maintenant avec succès!
Changements majeurs:
- Remplacé vcpkg ImGui par FetchContent (compilation from source)
- Créé wrapper GLAD pour ImGui (imgui_opengl3_glad.cpp)
- Ajout de makeContextCurrent() pour gérer le contexte OpenGL multi-thread
- Release du contexte dans initialize(), puis rendu current dans uiThread()
Root Cause Analysis:
1. Rendering s'exécute dans uiThread() (thread séparé)
2. Contexte OpenGL créé dans thread principal n'était pas accessible
3. glCreateShader retournait 0 avec GL_INVALID_OPERATION (erreur 1282)
4. Solution: Transfer du contexte OpenGL du thread principal vers uiThread
Debugging profond:
- Ajout de logs debug dans ImGui pour tracer glCreateShader
- Découvert que handle=0 indiquait échec de création (pas compilation)
- Identifié erreur WGL \"ressource en cours d'utilisation\" = contexte locked
Fichiers modifiés:
- vcpkg.json: Supprimé imgui
- CMakeLists.txt: FetchContent pour ImGui + imgui_backends library
- src/imgui_opengl3_glad.cpp: Nouveau wrapper GLAD
- src/ui/TranslationUI.{h,cpp}: Ajout makeContextCurrent()
- src/core/Pipeline.cpp: Appel makeContextCurrent() dans uiThread()
- build/.../imgui_impl_opengl3.cpp: Debug logs (temporaire)
Résultat: UI fonctionne! NVIDIA RTX 4060 GPU, OpenGL 3.3.0, shaders compilent
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Add GLAD dependency via vcpkg for proper OpenGL function loading
- Force NVIDIA GPU usage with game-style exports (NvOptimusEnablement)
- Create working console version (SecondVoice_Console.exe)
- Add dual executable build (UI + Console versions)
- Update to OpenGL 4.6 Core Profile with GLSL 460
- Add GPU detection and logging
- Fix GLFW header conflicts with GLFW_INCLUDE_NONE
Note: OpenGL shaders still failing to compile despite GLAD integration.
Console version is fully functional for audio capture and translation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Complete Windows build support for SecondVoice:
Build System:
- Added CMakePresets.json with Windows and Linux presets
- Created build.bat script for easy Windows builds
- Support for Visual Studio 2019+ with Ninja generator
- Automatic vcpkg integration and dependency installation
Scripts:
- build.bat with Debug/Release modes and clean builds
- Auto-detection of Visual Studio and compiler tools
- User-friendly error messages and setup instructions
Documentation:
- Comprehensive docs/build_windows.md guide
- Step-by-step Windows build instructions
- Troubleshooting section for common issues
- Distribution guide for portable .exe
Updates:
- Updated README.md with cross-platform instructions
- Enhanced .gitignore for Windows build artifacts
- Separate build directories for Windows/Linux
Platform Support:
- Windows 10/11 with Visual Studio 2019+
- Linux with GCC/Clang (existing)
- Shared vcpkg dependencies across platforms
Output:
- Windows: build/windows-release/Release/SecondVoice.exe
- Linux: build/SecondVoice
Next Steps:
- Build on Windows with: build.bat --release
- Executable ready for distribution
- Same config.json and .env work cross-platform
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major improvements to Whisper API integration:
New Features:
- Support for gpt-4o-mini-transcribe and gpt-4o-transcribe models
- Prompting support for better name recognition and context
- Response format configuration (text, json, verbose_json)
- Stream flag prepared for future streaming implementation
Configuration Updates:
- Updated config.json with new Whisper parameters
- Added prompt, stream, and response_format fields
- Default model: gpt-4o-mini-transcribe (better quality than whisper-1)
Code Changes:
- Extended WhisperClient::transcribe() with new parameters
- Updated Config struct to support new fields
- Modified Pipeline to pass all config parameters to Whisper
- Added comprehensive documentation in docs/whisper_upgrade.md
Benefits:
- Better transcription accuracy (~33% improvement)
- Improved name recognition (Tingting, Alexis)
- Context-aware transcription with prompting
- Ready for future streaming and diarization
Documentation:
- Complete guide in docs/whisper_upgrade.md
- Usage examples and best practices
- Cost comparison and optimization tips
- Future roadmap for Phase 2 features
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Build fixes:
- Add missing includes (<cstdint>, <iomanip>, <sstream>, <string>, <vector>)
- Fix unused parameter warnings with (void) casts
- Fix cpp-httplib API: Use UploadFormDataItems instead of MultipartFormDataItems
- Fix portaudio linking: Use portaudio_static target instead of portaudio
All modules now compile without errors. Executable built successfully (13MB).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>