couple-repo/Projects/PAUSE/chinese_audio_tts_pipeline.md
StillHammer 7425f4af2e Reorganize Projects structure by status + update tracking files
2025-11-20 11:25:53 +08:00


# Chinese Audio to Text Extractor - Simple Transcription
## Goal
Extract the text from MP3 recordings of Chinese lessons using Whisper.
### Problem solved
- Need to recover the text content of audio lessons
- Simple, fast MP3 → text conversion
### Solution
Minimal pipeline: MP3 → Whisper → plain text

---
## Pipeline Architecture
```
┌─────────────────────────────────────────┐
│  INPUT: cours_chinois.mp3 (45 min)      │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│  Transcription (Whisper)                │
│  ├─ Model: whisper-1 (OpenAI API)       │
│  ├─ Language: zh (Mandarin)             │
│  └─ Output: transcript.txt              │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│  OUTPUT: cours_chinois.txt              │
│  你好。我叫Alexis。今天我们学习...      │
└─────────────────────────────────────────┘
```
---
## Python Implementation Plan
### Project structure
```
chinese-transcriber/
├── transcribe.py       # Main script
├── input/              # Source MP3 files
├── output/             # Generated .txt files
├── .env                # API key
└── requirements.txt
```
### Dependencies (requirements.txt)
```txt
openai>=1.0.0 # Whisper API
python-dotenv>=1.0.0 # Env variables
```
### Main Script (transcribe.py)
```python
"""
Simple MP3 → TXT transcription with Whisper
"""
import os
from pathlib import Path

import openai
from dotenv import load_dotenv


def transcribe_audio(audio_path: Path, api_key: str) -> str:
    """
    Transcribe a Chinese-language MP3 file.

    Args:
        audio_path: Path to the MP3 file
        api_key: OpenAI API key

    Returns:
        Transcribed text
    """
    client = openai.OpenAI(api_key=api_key)
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="zh",  # Force Mandarin
            response_format="text",  # Plain text
        )
    return transcript


def main():
    # Load API key
    load_dotenv()
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("Error: OPENAI_API_KEY not found in .env")
        return

    # Setup paths
    input_dir = Path("input")
    output_dir = Path("output")
    output_dir.mkdir(exist_ok=True)

    # Get MP3 files (sorted for a predictable processing order)
    mp3_files = sorted(input_dir.glob("*.mp3"))
    if not mp3_files:
        print(f"No MP3 files found in {input_dir}/")
        return
    print(f"Found {len(mp3_files)} MP3 files to transcribe\n")

    # Process each file
    for mp3_file in mp3_files:
        print(f"Processing: {mp3_file.name}...")
        try:
            # Transcribe
            text = transcribe_audio(mp3_file, api_key)
            # Save to TXT
            output_path = output_dir / f"{mp3_file.stem}.txt"
            with open(output_path, "w", encoding="utf-8") as f:
                f.write(text)
            print(f"✓ Saved to: {output_path}\n")
        except Exception as e:
            print(f"✗ Error: {e}\n")

    print("=== Transcription completed ===")


if __name__ == "__main__":
    main()
```
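One practical caveat for lesson-length recordings: the OpenAI audio endpoints document a 25 MB upload limit, and a 45-minute MP3 can exceed it depending on bitrate. The sketch below is a hypothetical pre-flight check that lists files likely to be rejected before any API call is made. The helper name `oversized_files` and the constant `MAX_UPLOAD_BYTES` are assumptions for illustration and are not part of the script above; calling `oversized_files(Path("input"))` before the main loop would be one way to use it.

```python
from pathlib import Path

# Assumption: the documented upload limit for the OpenAI audio API is 25 MB.
# The exact byte threshold is approximated here; adjust it if the limit changes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def oversized_files(input_dir: Path, limit: int = MAX_UPLOAD_BYTES) -> list[Path]:
    """Return the MP3 files in input_dir whose size exceeds limit bytes."""
    return sorted(
        p for p in input_dir.glob("*.mp3") if p.stat().st_size > limit
    )
```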
---
### Environment Variables (.env)
```bash
OPENAI_API_KEY=sk-...
```
---
## Cost Estimate
### For 10 hours of lesson audio
| Service | Cost | Calculation |
|---------|------|-------------|
| **Whisper API** | **$3.60** | 10 h × 60 min/h × $0.006/min |

**Very affordable** for simple text extraction.
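
The same arithmetic can be reused for other amounts of audio. This is a minimal sketch that assumes the $0.006 per minute rate from the table still applies; the function name `estimate_cost_usd` is hypothetical.

```python
# Assumption: $0.006 per audio minute, the rate used in the table above.
PRICE_PER_MINUTE_USD = 0.006


def estimate_cost_usd(hours: float, price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimate the transcription cost in USD for a given number of audio hours."""
    minutes = hours * 60
    return round(minutes * price_per_minute, 2)


print(estimate_cost_usd(10))    # 10 hours of lessons
print(estimate_cost_usd(0.75))  # a single 45-minute lesson
```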
---
## Usage
### Installation
```bash
mkdir chinese-transcriber
cd chinese-transcriber
# Create the folder structure
mkdir input output
# Install dependencies
pip install openai python-dotenv
# Create .env
echo "OPENAI_API_KEY=sk-..." > .env
# Copy the transcribe.py script into this folder
```
### Running
```bash
# 1. Put your MP3 files in input/
cp /path/to/cours*.mp3 input/
# 2. Run the script
python transcribe.py
# Output:
# Found 3 MP3 files to transcribe
#
# Processing: cours_1.mp3...
# ✓ Saved to: output/cours_1.txt
#
# Processing: cours_2.mp3...
# ✓ Saved to: output/cours_2.txt
# ...
```
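Because every API call is billed, a re-run after a partial failure would transcribe (and pay for) files that already succeeded. The sketch below shows one possible way to skip them. The helper `pending_files` is hypothetical and not part of the script above; it assumes a transcript counts as done when a non-empty `.txt` with the same stem exists in `output/`.

```python
from pathlib import Path


def pending_files(input_dir: Path, output_dir: Path) -> list[Path]:
    """Return MP3 files that have no matching non-empty .txt in output_dir."""
    pending = []
    for mp3 in sorted(input_dir.glob("*.mp3")):
        target = output_dir / f"{mp3.stem}.txt"
        # Re-run a file if its transcript is missing or empty.
        if not target.exists() or target.stat().st_size == 0:
            pending.append(mp3)
    return pending
```

In `main()`, the loop could then iterate over `pending_files(input_dir, output_dir)` instead of `mp3_files`.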
### Output
`.txt` files containing the raw Chinese text:
```
output/cours_1.txt:
你好。我叫Alexis。今天我们学习汉语。
第一课是关于问候的。你好吗?我很好,谢谢。
...
```
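When testing on real recordings, a quick sanity check can flag transcripts that came back mostly in another script. The sketch below is an assumption-laden heuristic, not part of the original plan: the function name `cjk_ratio` is hypothetical, it only counts the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and any cut-off (for example, reviewing files where the ratio falls below 0.5) would need tuning on actual lessons, since names such as "Alexis" and punctuation lower the ratio.

```python
def cjk_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Count characters in the basic CJK Unified Ideographs block only.
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)
```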
---
## Status
**SIMPLE PLAN - READY TO USE**

Minimal script for MP3 → TXT text extraction.

**Next steps if needed**:
1. Test on your Chinese MP3 files
2. If automatic splitting is needed, see the full TTS pipeline options (commented out in earlier versions)
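
If automatic splitting does become necessary, one possible first step is to plan the cut points before touching the audio. The sketch below only computes time windows; the actual cutting would be left to an audio tool such as ffmpeg, which is an assumption and not something this plan specifies. The function name `plan_chunks` and the default chunk length and overlap are hypothetical values for illustration.

```python
def plan_chunks(
    duration_s: float, chunk_s: float = 600.0, overlap_s: float = 2.0
) -> list[tuple[float, float]]:
    """Split [0, duration_s] into (start, end) windows of at most chunk_s seconds.

    Consecutive windows overlap by overlap_s seconds, which reduces the chance
    that a word cut at a boundary is lost from both neighbouring chunks.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so the next window re-covers the boundary.
        start = end - overlap_s
    return chunks
```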
---
*Created: 27 October 2025*

*Stack: Python 3.10+, Whisper API only*