Added Speech-to-Text configuration and testing infrastructure:

## STT Engines Configured
- ✅ Whisper.cpp (local, offline) - base model downloaded (142MB)
- ✅ OpenAI Whisper API - configured with existing API key
- ✅ Google Speech-to-Text - configured with existing API key
- ⚠️ Azure STT - optional (not configured)
- ⚠️ Deepgram - optional (not configured)

## New Files
- `docs/STT_SETUP.md` - Complete Windows STT setup guide
- `test_stt_live.cpp` - Test tool for all 5 STT engines
- `create_test_audio_simple.py` - Generate test audio (440Hz tone, 16kHz WAV)
- `create_test_audio.py` - Generate speech audio (requires gtts)
- `models/ggml-base.bin` - Whisper.cpp base model (gitignored)
- `test_audio.wav` - Generated test audio (gitignored)

## Documentation
- Complete setup guide for all STT engines
- API key configuration instructions
- Model download links and recommendations
- Troubleshooting section
- Cost comparison for cloud APIs

## Next Steps
- Compile test_stt_live.cpp to validate all engines
- Test with real audio input
- Integrate into VoiceModule via pub/sub

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
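The commit lists `create_test_audio_simple.py` as generating a 440Hz tone in a 16kHz WAV, but its source is not shown here. The following is only a minimal sketch of how such a file could be produced with the standard library, assuming mono 16-bit PCM output. The `write_tone` helper, its parameters, and the output filename are hypothetical and not taken from the repository.

```python
import math
import struct
import wave


def write_tone(path, freq_hz=440.0, duration_s=2.0, sample_rate=16000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path` (hypothetical helper).

    Returns the number of samples written.
    """
    n_samples = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # scale to the signed 16-bit range
    frames = bytearray()
    for i in range(n_samples):
        sample = int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_samples


if __name__ == "__main__":
    # Assumed output name, chosen so it does not overwrite test_audio.wav
    count = write_tone("tone_440hz_sketch.wav")
    print(f"Wrote {count} samples")
```

A pure tone like this needs no network access or third-party packages, so it is useful for checking that an engine accepts the file format, though it will not produce a meaningful transcript.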
36 lines
1.0 KiB
Python
#!/usr/bin/env python3
"""Generate test audio WAV file for STT testing"""

import sys

try:
    from gtts import gTTS
    import os
    from pydub import AudioSegment

    # Generate French test audio
    text = "Bonjour, ceci est un test de reconnaissance vocale."
    print(f"Generating audio: '{text}'")

    # Create TTS
    tts = gTTS(text=text, lang='fr', slow=False)
    tts.save("test_audio_temp.mp3")
    print("✓ Generated MP3")

    # Convert to WAV (16kHz, mono, 16-bit PCM)
    audio = AudioSegment.from_mp3("test_audio_temp.mp3")
    audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    audio.export("test_audio.wav", format="wav")
    print("✓ Converted to WAV (16kHz, mono, 16-bit)")

    # Cleanup
    os.remove("test_audio_temp.mp3")
    print("✓ Saved as test_audio.wav")
    print(f"Duration: {len(audio)/1000:.1f}s")

except ImportError as e:
    print(f"Missing dependency: {e}")
    print("\nInstall with: pip install gtts pydub")
    print("Note: pydub also requires ffmpeg")
    sys.exit(1)
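The script above targets 16kHz, mono, 16-bit PCM output. As a rough sketch of how that result could be verified before feeding the file to an STT engine, the standard `wave` module can read back the header. `check_wav_format` is a hypothetical helper, not part of the repository, and the demo writes its own short silent file rather than assuming `test_audio.wav` exists.

```python
import os
import tempfile
import wave


def check_wav_format(path, sample_rate=16000, channels=1, sample_width=2):
    """Return True if the WAV at `path` has the expected PCM layout (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == sample_rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width  # width in bytes: 2 = 16-bit
        )


if __name__ == "__main__":
    # Self-contained demo: write 0.1 s of silence in the target format, then check it.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 1600)
    print("format ok:", check_wav_format(demo_path))
```

In practice the same check could be pointed at `test_audio.wav` once the generator has run.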