aissia/create_test_audio_simple.py
StillHammer d7971e0c34 feat: Phase 7 STT - Complete Windows setup with Whisper.cpp
Added Speech-to-Text configuration and testing infrastructure:

## STT Engines Configured
- Whisper.cpp (local, offline) - base model downloaded (142MB)
- OpenAI Whisper API - configured with existing API key
- Google Speech-to-Text - configured with existing API key
- ⚠️ Azure STT - optional (not configured)
- ⚠️ Deepgram - optional (not configured)

## New Files
- `docs/STT_SETUP.md` - Complete Windows STT setup guide
- `test_stt_live.cpp` - Test tool for all 5 STT engines
- `create_test_audio_simple.py` - Generate test audio (440Hz tone, 16kHz WAV)
- `create_test_audio.py` - Generate speech audio (requires gtts)
- `models/ggml-base.bin` - Whisper.cpp base model (gitignored)
- `test_audio.wav` - Generated test audio (gitignored)

## Documentation
- Complete setup guide for all STT engines
- API key configuration instructions
- Model download links and recommendations
- Troubleshooting section
- Cost comparison for cloud APIs

## Next Steps
- Compile test_stt_live.cpp to validate all engines
- Test with real audio input
- Integrate into VoiceModule via pub/sub

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 17:12:37 +08:00


#!/usr/bin/env python3
"""Generate simple test audio WAV file using only stdlib"""
import wave
import struct
import math

# WAV parameters
sample_rate = 16000
duration = 2  # seconds
frequency = 440  # Hz (A4 note)

# Generate sine wave samples
samples = []
for i in range(int(sample_rate * duration)):
    # Sine wave value (-1.0 to 1.0)
    value = math.sin(2.0 * math.pi * frequency * i / sample_rate)
    # Scale to 16-bit signed PCM (yields -32767 to 32767)
    sample = int(value * 32767)
    samples.append(sample)

# Write WAV file
with wave.open("test_audio.wav", "w") as wav_file:
    # Set parameters (1 channel, 2 bytes per sample, 16kHz)
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(sample_rate)
    # Write frames as little-endian signed 16-bit integers
    for sample in samples:
        wav_file.writeframes(struct.pack('<h', sample))

print("[OK] Generated test_audio.wav")
print(" - Format: 16kHz, mono, 16-bit PCM")
print(f" - Duration: {duration}s")
print(f" - Frequency: {frequency}Hz (A4 tone)")
print(f" - Samples: {len(samples)}")
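
One way to sanity-check the script's output before handing it to an STT engine is to read the WAV header back and compare it with the intended parameters. The sketch below is not part of the repository: the helper names `make_tone_wav` and `describe_wav` are hypothetical, and it writes to an in-memory buffer rather than `test_audio.wav` so it can run on its own. It uses only the stdlib modules the script already imports, plus `io`.

```python
import io
import math
import struct
import wave


def make_tone_wav(frequency=440, sample_rate=16000, duration=2.0):
    """Return the bytes of a mono 16-bit PCM WAV holding a sine tone.

    Hypothetical helper: mirrors the script's parameters, but writes to
    memory instead of test_audio.wav.
    """
    n_samples = int(sample_rate * duration)
    # Pack every sample as a little-endian signed 16-bit integer.
    pcm = b"".join(
        struct.pack(
            "<h",
            int(math.sin(2.0 * math.pi * frequency * i / sample_rate) * 32767),
        )
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)  # one call instead of one per sample
    return buf.getvalue()


def describe_wav(data):
    """Read WAV bytes back: (channels, sample width, frame rate, frame count)."""
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getsampwidth(),
            wav_file.getframerate(),
            wav_file.getnframes(),
        )


if __name__ == "__main__":
    print(describe_wav(make_tone_wav()))
```

With the default parameters this prints `(1, 2, 16000, 32000)`: mono, 2 bytes per sample, 16 kHz, and 2 s × 16000 samples/s = 32000 frames. To check the file the script actually writes, the same `wave.open(..., "rb")` calls can be pointed at `test_audio.wav` instead.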