Implemented complete STT (Speech-to-Text) system with 4 engines:
1. **PocketSphinxEngine** (new)
- Lightweight keyword spotting
- Perfect for passive wake word detection
- ~10MB model, very low CPU/RAM usage
- Keywords: "celuna", "hey celuna", etc.
2. **VoskSTTEngine** (existing)
- Balanced local STT for full transcription
- 50MB models, good accuracy
- Already working
3. **WhisperCppEngine** (new)
- High-quality offline STT using whisper.cpp
- 75MB-2.9GB models depending on quality
- Excellent accuracy, runs entirely local
4. **WhisperAPIEngine** (existing)
- Cloud STT via OpenAI Whisper API
- Best accuracy, requires internet + API key
- Already working
Features:
- Full JSON configuration via config/voice.json
- Auto-selection mode tries engines in order
- Dual mode support (passive + active)
- Fallback chain for reliability
- All engines use ISTTEngine interface
Updated:
- STTEngineFactory: Added support for all 4 engines
- CMakeLists.txt: Added new source files
- docs/STT_CONFIGURATION.md: Complete config guide
Config example (voice.json):
```json
{
  "passive_mode": { "engine": "pocketsphinx" },
  "active_mode": { "engine": "vosk", "fallback": "whisper-api" }
}
```
Architecture: ISTTService → STTEngineFactory → 4 engines
Build: ✅ Compiles successfully
Status: Phase 7 complete, ready for testing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
80 lines
2.3 KiB
C++
```cpp
#pragma once

#include "ISTTEngine.hpp"
#include <spdlog/spdlog.h>
#include <memory>
#include <vector>
#include <string>

// whisper.cpp forward declarations (to avoid including full headers)
struct whisper_context;
struct whisper_full_params;

namespace aissia {

/**
 * @brief Whisper.cpp Speech-to-Text engine
 *
 * Local high-quality STT using OpenAI's Whisper model via whisper.cpp.
 * Runs entirely offline with excellent accuracy.
 *
 * Features:
 * - High accuracy (OpenAI Whisper quality)
 * - Completely offline (no internet required)
 * - Multiple model sizes (tiny, base, small, medium, large)
 * - Multilingual support
 *
 * Model sizes:
 * - tiny: ~75MB, fastest, less accurate
 * - base: ~142MB, balanced
 * - small: ~466MB, good quality
 * - medium: ~1.5GB, very good
 * - large: ~2.9GB, best quality
 *
 * Recommended: base or small for most use cases
 */
class WhisperCppEngine : public ISTTEngine {
public:
    /**
     * @brief Construct Whisper.cpp engine
     * @param modelPath Path to Whisper GGML model file (e.g., "models/ggml-base.bin")
     */
    explicit WhisperCppEngine(const std::string& modelPath);

    ~WhisperCppEngine() override;

    // Disable copy
    WhisperCppEngine(const WhisperCppEngine&) = delete;
    WhisperCppEngine& operator=(const WhisperCppEngine&) = delete;

    std::string transcribe(const std::vector<float>& audioData) override;
    std::string transcribeFile(const std::string& filePath) override;
    void setLanguage(const std::string& language) override;
    bool isAvailable() const override;
    std::string getEngineName() const override;

    /**
     * @brief Set transcription parameters
     * @param threads Number of threads to use (default: 4)
     * @param translate Translate to English (default: false)
     */
    void setParameters(int threads = 4, bool translate = false);

private:
    bool initialize();
    void cleanup();
    std::string processAudioData(const float* audioData, size_t numSamples);

    std::shared_ptr<spdlog::logger> m_logger;
    std::string m_modelPath;
    std::string m_language = "auto";
    bool m_available = false;
    int m_threads = 4;
    bool m_translate = false;

    // whisper.cpp context (opaque pointer to avoid header dependency)
    whisper_context* m_ctx = nullptr;
};

} // namespace aissia
```