aissia/create_test_audio.py
StillHammer d7971e0c34 feat: Phase 7 STT - Complete Windows setup with Whisper.cpp
Added Speech-to-Text configuration and testing infrastructure:

## STT Engines Configured
- Whisper.cpp (local, offline) - base model downloaded (142MB)
- OpenAI Whisper API - configured with existing API key
- Google Speech-to-Text - configured with existing API key
- ⚠️ Azure STT - optional (not configured)
- ⚠️ Deepgram - optional (not configured)

## New Files
- `docs/STT_SETUP.md` - Complete Windows STT setup guide
- `test_stt_live.cpp` - Test tool for all 5 STT engines
- `create_test_audio_simple.py` - Generate test audio (440Hz tone, 16kHz WAV)
- `create_test_audio.py` - Generate speech audio (requires gtts)
- `models/ggml-base.bin` - Whisper.cpp base model (gitignored)
- `test_audio.wav` - Generated test audio (gitignored)
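
The contents of `create_test_audio_simple.py` are not shown here, so the following is only a minimal sketch of how a 440 Hz, 16 kHz, mono, 16-bit WAV tone could be generated with the Python standard library. The function names `make_tone_frames` and `write_tone_wav`, the 2-second duration, and the 0.5 amplitude are assumptions for illustration, not values taken from the actual script.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz, matches the 16kHz WAV mentioned above
FREQUENCY = 440.0    # Hz, matches the 440Hz tone mentioned above


def make_tone_frames(duration=2.0, amplitude=0.5,
                     frequency=FREQUENCY, sample_rate=SAMPLE_RATE):
    """Return little-endian 16-bit PCM bytes for a sine tone.

    duration and amplitude defaults are assumed values, not from the source.
    """
    n_samples = int(duration * sample_rate)
    peak = int(amplitude * 32767)  # scale into the signed 16-bit range
    samples = (
        int(peak * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(n_samples)
    )
    return struct.pack(f"<{n_samples}h", *samples)


def write_tone_wav(path, duration=2.0, amplitude=0.5):
    """Write the tone to `path` as a mono, 16-bit, 16 kHz WAV file."""
    frames = make_tone_frames(duration=duration, amplitude=amplitude)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


if __name__ == "__main__":
    write_tone_wav("test_audio.wav")
```

Unlike the gTTS-based script, a sketch like this needs no third-party packages or ffmpeg, which makes it handy for checking that an engine accepts the file format. A pure tone contains no speech, so it cannot validate transcription quality.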

## Documentation
- Complete setup guide for all STT engines
- API key configuration instructions
- Model download links and recommendations
- Troubleshooting section
- Cost comparison for cloud APIs

## Next Steps
- Compile test_stt_live.cpp to validate all engines
- Test with real audio input
- Integrate into VoiceModule via pub/sub

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 17:12:37 +08:00


#!/usr/bin/env python3
"""Generate test audio WAV file for STT testing"""

import sys

try:
    from gtts import gTTS
    import os
    from pydub import AudioSegment

    # Generate French test audio
    text = "Bonjour, ceci est un test de reconnaissance vocale."
    print(f"Generating audio: '{text}'")

    # Create TTS
    tts = gTTS(text=text, lang='fr', slow=False)
    tts.save("test_audio_temp.mp3")
    print("✓ Generated MP3")

    # Convert to WAV (16kHz, mono, 16-bit PCM)
    audio = AudioSegment.from_mp3("test_audio_temp.mp3")
    audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    audio.export("test_audio.wav", format="wav")
    print("✓ Converted to WAV (16kHz, mono, 16-bit)")

    # Cleanup
    os.remove("test_audio_temp.mp3")

    print("✓ Saved as test_audio.wav")
    print(f"Duration: {len(audio)/1000:.1f}s")

except ImportError as e:
    print(f"Missing dependency: {e}")
    print("\nInstall with: pip install gtts pydub")
    print("Note: pydub also requires ffmpeg")
    sys.exit(1)