Commit Graph

24 Commits

Author SHA1 Message Date
1db83b7bce chore: Add sessions/ and .claudiomiro/ to gitignore
Exclude runtime-generated session logs and local config from version control.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:40:58 +08:00
3ec2a8beca feat: Add session logging, input gain, and context-aware prompts
Major features:
- Session logging system with detailed segment tracking (audio files, metadata, latencies)
- Input gain control (0.5x-5.0x amplifier) with soft clipping
- Context-aware Whisper prompts using recent transcriptions
- Comprehensive segment metadata (RMS, peak, duration, timestamps)
- API latency measurements for Whisper and Claude
- Audio hash-based duplicate detection
- Hallucination filtering with detailed logging

Changes:
- Add SessionLogger class for structured session data export
- Apply input gain before VAD and denoising (not just raw input)
- Enhanced Pipeline with segment tracking and error logging
- New UI control for input gain amplifier
- Sessions saved to sessions/ directory with transcripts/ export
- Improved Whisper prompt in config.json (French instructions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:17:21 +08:00
9163e082da docs: Add Whisper prompt improvement strategy
- Document current prompt limitations
- Propose improved prompt with anti-hallucination instructions
- Suggest dynamic context and domain vocabulary enhancements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 20:09:04 +08:00
a3b38cf32a docs: Add objective section to debug plan
Define the real-world use case: making SecondVoice work in degraded conditions (meetings with multiple voices, variable distances, poor mic quality).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 20:04:08 +08:00
f288156869 docs: Add test conditions context to debug plan
Document that the initial transcript analysis was done under degraded conditions (multiple voices, variable distances/volumes, poor mic) which may explain some of the segmentation issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 20:02:40 +08:00
21bcc9ed71 feat: Add transcript export and debug planning docs
- Add CLAUDE.md with project documentation for AI assistance
- Add PLAN_DEBUG.md with debugging hypotheses and logging plan
- Update Pipeline and TranslationUI with transcript export functionality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 19:59:29 +08:00
371e86d0b7 chore: Clean up repo - remove audio files and update gitignore
- Remove tracked audio files (ogg, wav) from git history
- Update .gitignore to ignore all audio formats (ogg, wav, mp3, flac, aac, m4a)
- Remove local config and temp files from tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 17:34:10 +08:00
8bba4f9334 Update 2025-11-23 16:57:44 +08:00
4e9b7f9e95 feat: Add RNNoise denoising + transient suppressor + VAD improvements
- Add RNNoise neural network audio denoising (16kHz↔48kHz resampling)
- Add transient suppressor to filter claps/clicks/pops before RNNoise
- VAD now works on FILTERED audio (not raw) to avoid false triggers
- Real-time denoised audio level display in UI
- Save denoised audio previews in Opus format (.ogg)
- Add extensive Whisper hallucination filter (Tingting, music, etc.)
- Add "Clear" button to reset accumulated translations
- Double VAD thresholds (0.02/0.08) for less sensitivity
- Update Claude prompt to handle offensive content gracefully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:57:43 +08:00
741ca09663 feat: Add RNNoise denoising + transient suppressor + VAD improvements
- Add RNNoise neural network audio denoising (16kHz↔48kHz resampling)
- Add transient suppressor to filter claps/clicks/pops before RNNoise
- VAD now works on FILTERED audio (not raw) to avoid false triggers
- Real-time denoised audio level display in UI
- Save denoised audio previews in Opus format (.ogg)
- Add extensive Whisper hallucination filter (Tingting, music, etc.)
- Add "Clear" button to reset accumulated translations
- Double VAD thresholds (0.02/0.08) for less sensitivity
- Update Claude prompt to handle offensive content gracefully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 16:46:38 +08:00
fa8ea2907b feat: Major improvements - WinHTTP, gpt-4o-mini, Opus, sliding window
- Replace cpp-httplib with native WinHTTP for HTTPS support
- Switch from whisper-1 to gpt-4o-mini-transcribe model
- Use Opus/OGG encoding instead of WAV (~10x smaller files)
- Implement sliding window audio capture with overlap
- Add transcription deduplication for overlapping segments
- Add Voice Activity Detection (VAD) to filter silence/noise
- Filter Whisper hallucinations (Amara.org, etc.)
- Add UTF-8 console support for Chinese characters
- Add Chinese font loading in ImGui
- Make Claude responses concise (translation only, no explanations)
- Configurable window size, font size, chunk duration/step

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 12:17:41 +08:00
089acbfff1 fix: Normalize line endings in TranslationUI.cpp
Convert Windows CRLF to Unix LF line endings for consistency across platforms and version control.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 18:11:57 +08:00
14ed043bf5 feat: Add Opus audio encoding for 46x bandwidth reduction
Replace WAV with Opus encoding for Whisper API uploads:
- Add libopus & libogg via FetchContent (no vcpkg dependency)
- Implement AudioBuffer::saveToOpus() with Ogg container
- Configure Opus encoder for voice (VOIP mode, 32kbps VBR)
- Update WhisperClient to use Opus format (audio/ogg)
- Fix Windows temp file path compatibility

Benefits:
- 46x smaller files (37KB vs 1.7MB for 10s audio)
- Reduced API costs and bandwidth
- Faster uploads for real-time translation
- Whisper API fully supports Opus/Ogg format

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 17:38:16 +08:00
7dec7a6eed fix: Résolution complète du problème OpenGL/ImGui avec threading
PROBLÈME RÉSOLU: Les shaders ImGui compilent maintenant avec succès!

Changements majeurs:
- Remplacé vcpkg ImGui par FetchContent (compilation from source)
- Créé wrapper GLAD pour ImGui (imgui_opengl3_glad.cpp)
- Ajout de makeContextCurrent() pour gérer le contexte OpenGL multi-thread
- Release du contexte dans initialize(), puis rendu current dans uiThread()

Root Cause Analysis:
1. Rendering s'exécute dans uiThread() (thread séparé)
2. Contexte OpenGL créé dans thread principal n'était pas accessible
3. glCreateShader retournait 0 avec GL_INVALID_OPERATION (erreur 1282)
4. Solution: Transfer du contexte OpenGL du thread principal vers uiThread

Debugging profond:
- Ajout de logs debug dans ImGui pour tracer glCreateShader
- Découvert que handle=0 indiquait échec de création (pas compilation)
- Identifié erreur WGL \"ressource en cours d'utilisation\" = contexte locked

Fichiers modifiés:
- vcpkg.json: Supprimé imgui
- CMakeLists.txt: FetchContent pour ImGui + imgui_backends library
- src/imgui_opengl3_glad.cpp: Nouveau wrapper GLAD
- src/ui/TranslationUI.{h,cpp}: Ajout makeContextCurrent()
- src/core/Pipeline.cpp: Appel makeContextCurrent() dans uiThread()
- build/.../imgui_impl_opengl3.cpp: Debug logs (temporaire)

Résultat: UI fonctionne! NVIDIA RTX 4060 GPU, OpenGL 3.3.0, shaders compilent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:37:47 +08:00
ddf34db2a0 feat: Add GLAD OpenGL loader and NVIDIA GPU forcing
Changes:
- Add GLAD dependency via vcpkg for proper OpenGL function loading
- Force NVIDIA GPU usage with game-style exports (NvOptimusEnablement)
- Create working console version (SecondVoice_Console.exe)
- Add dual executable build (UI + Console versions)
- Update to OpenGL 4.6 Core Profile with GLSL 460
- Add GPU detection and logging
- Fix GLFW header conflicts with GLFW_INCLUDE_NONE

Note: OpenGL shaders still failing to compile despite GLAD integration.
Console version is fully functional for audio capture and translation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:18:54 +08:00
07b792b2bd fix: Add MinGW build support and compatibility fixes
- Add MinGW compatibility shim for cpp-httplib GetAddrInfoExCancel
- Fix portaudio linking (portaudio_static -> portaudio)
- Disable -Werror for MinGW builds due to httplib incompatibilities
- Add console subsystem flag for MinGW builds
- Add debug logging utilities (Logger.h)
- Add MessageBox debugging for Windows troubleshooting
- Update build scripts with better error handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:43:13 +08:00
94ad6b4a22 feat: Add MinGW support - Build without Visual Studio!
Lightweight Windows build option using MinGW-w64 instead of Visual Studio:

Size Comparison:
- Visual Studio: 10-20 GB install
- MinGW: ~500 MB install (20x smaller!)

New Files:
- setup_mingw.bat: One-click installer for all tools
  - Chocolatey (package manager)
  - MinGW-w64 (GCC compiler)
  - CMake, Ninja, Git
  - vcpkg integration

- build_mingw.bat: Build script for MinGW
  - Auto-detection of GCC
  - Debug/Release modes
  - Clean build support
  - User-friendly error messages

- WINDOWS_MINGW.md: Complete MinGW guide
  - Installation instructions
  - Troubleshooting
  - Performance comparison MSVC vs GCC
  - Distribution guide

CMake Updates:
- Added mingw-debug and mingw-release presets
- GCC compiler flags: -O3 -Wall -Wextra
- Static linking for portable .exe

Documentation:
- Updated WINDOWS_QUICK_START.md with MinGW option
- Comparison table: MinGW vs Visual Studio
- Recommendation: MinGW for most users

Benefits:
- 20x smaller download (500MB vs 10-20GB)
- 5-10 min install vs 30-60 min
- Same performance as MSVC
- Portable standalone .exe
- Perfect for users without Visual Studio

Usage:
1. Run setup_mingw.bat (one time)
2. Restart terminal
3. Run build_mingw.bat --release
4. Done!

Output: build/mingw-release/SecondVoice.exe

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:42:41 +08:00
a3ade5af9a docs: Add Windows quick start guide 2025-11-20 03:39:04 +08:00
99a9cc22d7 feat: Add Windows support with .exe build
Complete Windows build support for SecondVoice:

Build System:
- Added CMakePresets.json with Windows and Linux presets
- Created build.bat script for easy Windows builds
- Support for Visual Studio 2019+ with Ninja generator
- Automatic vcpkg integration and dependency installation

Scripts:
- build.bat with Debug/Release modes and clean builds
- Auto-detection of Visual Studio and compiler tools
- User-friendly error messages and setup instructions

Documentation:
- Comprehensive docs/build_windows.md guide
- Step-by-step Windows build instructions
- Troubleshooting section for common issues
- Distribution guide for portable .exe

Updates:
- Updated README.md with cross-platform instructions
- Enhanced .gitignore for Windows build artifacts
- Separate build directories for Windows/Linux

Platform Support:
- Windows 10/11 with Visual Studio 2019+
- Linux with GCC/Clang (existing)
- Shared vcpkg dependencies across platforms

Output:
- Windows: build/windows-release/Release/SecondVoice.exe
- Linux: build/SecondVoice

Next Steps:
- Build on Windows with: build.bat --release
- Executable ready for distribution
- Same config.json and .env work cross-platform

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:38:18 +08:00
40c451b9f8 feat: Upgrade to latest Whisper API with GPT-4o models and prompting
Major improvements to Whisper API integration:

New Features:
- Support for gpt-4o-mini-transcribe and gpt-4o-transcribe models
- Prompting support for better name recognition and context
- Response format configuration (text, json, verbose_json)
- Stream flag prepared for future streaming implementation

Configuration Updates:
- Updated config.json with new Whisper parameters
- Added prompt, stream, and response_format fields
- Default model: gpt-4o-mini-transcribe (better quality than whisper-1)

Code Changes:
- Extended WhisperClient::transcribe() with new parameters
- Updated Config struct to support new fields
- Modified Pipeline to pass all config parameters to Whisper
- Added comprehensive documentation in docs/whisper_upgrade.md

Benefits:
- Better transcription accuracy (~33% improvement)
- Improved name recognition (Tingting, Alexis)
- Context-aware transcription with prompting
- Ready for future streaming and diarization

Documentation:
- Complete guide in docs/whisper_upgrade.md
- Usage examples and best practices
- Cost comparison and optimization tips
- Future roadmap for Phase 2 features

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:34:09 +08:00
fa882fc2d6 fix: Resolve compilation errors and build successfully
Build fixes:
- Add missing includes (<cstdint>, <iomanip>, <sstream>, <string>, <vector>)
- Fix unused parameter warnings with (void) casts
- Fix cpp-httplib API: Use UploadFormDataItems instead of MultipartFormDataItems
- Fix portaudio linking: Use portaudio_static target instead of portaudio

All modules now compile without errors. Executable built successfully (13MB).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:27:18 +08:00
67b1587047 docs: Add comprehensive next steps guide 2025-11-20 03:09:40 +08:00
5b60acaa73 feat: Implement complete MVP architecture for SecondVoice
Complete implementation of the real-time Chinese-to-French translation system:

Architecture:
- 3-threaded pipeline: Audio capture → AI processing → UI rendering
- Thread-safe queues for inter-thread communication
- Configurable audio chunk sizes for latency tuning

Core Features:
- Audio capture with PortAudio (configurable sample rate/channels)
- Whisper API integration for Chinese speech-to-text
- Claude API integration for Chinese-to-French translation
- ImGui real-time display with stop button
- Full recording saved to WAV on stop

Modules Implemented:
- audio/: AudioCapture (PortAudio wrapper) + AudioBuffer (WAV export)
- api/: WhisperClient + ClaudeClient (HTTP API wrappers)
- ui/: TranslationUI (ImGui interface)
- core/: Pipeline (orchestrates all threads)
- utils/: Config (JSON/.env loader) + ThreadSafeQueue (template)

Build System:
- CMake with vcpkg for dependency management
- vcpkg.json manifest for reproducible builds
- build.sh helper script

Configuration:
- config.json: Audio settings, API parameters, UI config
- .env: API keys (OpenAI + Anthropic)

Documentation:
- README.md: Setup instructions, usage, architecture
- docs/implementation_plan.md: Technical design document
- docs/SecondVoice.md: Project vision and motivation

Next Steps:
- Test build with vcpkg dependencies
- Test audio capture on real hardware
- Validate API integrations
- Tune chunk size for optimal latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 03:08:03 +08:00
6248fb2322 Initial commit: Add documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 02:06:35 +08:00