Added Speech-to-Text configuration and testing infrastructure

## STT Engines Configured
- ✅ Whisper.cpp (local, offline) - base model downloaded (142MB)
- ✅ OpenAI Whisper API - configured with existing API key
- ✅ Google Speech-to-Text - configured with existing API key
- ⚠️ Azure STT - optional (not configured)
- ⚠️ Deepgram - optional (not configured)

## New Files
- `docs/STT_SETUP.md` - Complete Windows STT setup guide
- `test_stt_live.cpp` - Test tool for all 5 STT engines
- `create_test_audio_simple.py` - Generate test audio (440Hz tone, 16kHz WAV)
- `create_test_audio.py` - Generate speech audio (requires gtts)
- `models/ggml-base.bin` - Whisper.cpp base model (gitignored)
- `test_audio.wav` - Generated test audio (gitignored)

## Documentation
- Complete setup guide for all STT engines
- API key configuration instructions
- Model download links and recommendations
- Troubleshooting section
- Cost comparison for cloud APIs

## Next Steps
- Compile `test_stt_live.cpp` to validate all engines
- Test with real audio input
- Integrate into VoiceModule via pub/sub

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
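Before pointing an engine at `test_audio.wav` (or any other recording), it can help to confirm the file really is 16 kHz, mono, 16-bit PCM, the format `create_test_audio_simple.py` writes. The sketch below uses only the standard library. `wav_format_problems` is a hypothetical helper, not a file in this commit, and it assumes every engine wants that same format, which is worth checking against each engine's own documentation.

```python
import wave

# Format written by create_test_audio_simple.py: 16 kHz, mono, 16-bit PCM.
# Treating this as the target for every STT engine is an assumption.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample (16-bit)


def wav_format_problems(path):
    """Return a list of mismatches between a WAV file and the expected format.

    An empty list means the file is 16 kHz, mono, 16-bit PCM with audio in it.
    """
    problems = []
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        width = wav_file.getsampwidth()
        if rate != EXPECTED_RATE:
            problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE} Hz")
        if channels != EXPECTED_CHANNELS:
            problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
        if width != EXPECTED_SAMPWIDTH:
            problems.append(
                f"{8 * width}-bit samples, expected {8 * EXPECTED_SAMPWIDTH}-bit"
            )
        if wav_file.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

For the generated tone, `wav_format_problems("test_audio.wav")` should return an empty list; a non-empty list names each mismatch, which is easier to act on than an engine rejecting the file with a generic error.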
39 lines
1.0 KiB
Python
#!/usr/bin/env python3
"""Generate simple test audio WAV file using only stdlib"""

import math
import struct
import wave

# WAV parameters
sample_rate = 16000
duration = 2  # seconds
frequency = 440  # Hz (A4 note)

# Generate sine wave samples
samples = []
for i in range(int(sample_rate * duration)):
    # Sine wave value (-1.0 to 1.0)
    value = math.sin(2.0 * math.pi * frequency * i / sample_rate)

    # Convert to 16-bit PCM (range -32768 to 32767); scaling by 32767 keeps
    # every sample in range
    sample = int(value * 32767)
    samples.append(sample)

# Write WAV file
with wave.open("test_audio.wav", "wb") as wav_file:
    # Set parameters (1 channel, 2 bytes per sample, 16kHz)
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(sample_rate)

    # Pack all samples as little-endian signed 16-bit ints and write them in
    # one call (per-sample writeframes calls are slow and re-patch the header)
    wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))

print("[OK] Generated test_audio.wav")
print(" - Format: 16kHz, mono, 16-bit PCM")
print(f" - Duration: {duration}s")
print(f" - Frequency: {frequency}Hz (A4 tone)")
print(f" - Samples: {len(samples)}")
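After running the script, a quick sanity check is to read `test_audio.wav` back and confirm it really holds a tone near 440 Hz. The sketch below estimates the frequency by counting zero crossings, since a pure sine crosses zero twice per cycle, so frequency ≈ crossings / (2 × duration). `estimate_tone_frequency` is a hypothetical helper, not part of this commit, and the method only suits a single clean tone, not the speech audio produced by `create_test_audio.py`.

```python
import struct
import wave


def estimate_tone_frequency(path):
    """Estimate the frequency (Hz) of a mono 16-bit tone via zero crossings."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())

    # Unpack little-endian signed 16-bit samples (the layout the script writes);
    # use the bytes actually read in case the header overstates the length
    count = len(raw) // 2
    if count == 0:
        raise ValueError("file contains no audio frames")
    samples = struct.unpack(f"<{count}h", raw[: 2 * count])

    # Count sign changes between neighbouring samples (zero counts as non-negative)
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev < 0) != (cur < 0)
    )
    duration = count / rate
    return crossings / (2 * duration)
```

For the file generated above, `estimate_tone_frequency("test_audio.wav")` should land within a fraction of a hertz of 440; the small shortfall comes from the last half cycle having no closing crossing inside the 2-second window.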