
# Chinese Audio to Text Extractor - Simple Transcription

## Objective

Extract the text from MP3 recordings of Chinese lessons using Whisper.

### Problem solved
- Need to recover the text content of audio lessons
- Simple, fast MP3 → text conversion

### Solution
Minimal pipeline: MP3 → Whisper → plain text

---

## Pipeline Architecture

```
┌─────────────────────────────────────────┐
│  INPUT: cours_chinois.mp3 (45min)       │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Transcription (Whisper)                │
│  ├─ Model: whisper-1 (OpenAI API)      │
│  ├─ Language: zh (mandarin)            │
│  └─ Output: transcript.txt             │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  OUTPUT: cours_chinois.txt              │
│  你好。我叫Alexis。今天我们学习...      │
└─────────────────────────────────────────┘
```

---

## Python Implementation Plan

### Project Structure

```
chinese-transcriber/
├── transcribe.py           # Main script
├── input/                  # Source MP3 files
├── output/                 # Generated .txt files
├── .env                    # API key
└── requirements.txt
```

### Dependencies (requirements.txt)

```txt
openai>=1.0.0              # Whisper API
python-dotenv>=1.0.0       # Env variables
```

### Main Script (transcribe.py)

```python
"""
Simple MP3 → TXT transcription with Whisper
"""
import openai
from pathlib import Path
from dotenv import load_dotenv
import os

def transcribe_audio(audio_path: Path, api_key: str) -> str:
    """
    Transcribe a Chinese-language MP3 file.

    Args:
        audio_path: Path to the MP3 file
        api_key: OpenAI API key

    Returns:
        Transcribed text
    """
    client = openai.OpenAI(api_key=api_key)

    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="zh",  # Force Mandarin
            response_format="text"  # Plain text
        )

    return transcript

def main():
    # Load API key
    load_dotenv()
    api_key = os.getenv("OPENAI_API_KEY")

    if not api_key:
        print("Error: OPENAI_API_KEY not found in .env")
        return

    # Setup paths
    input_dir = Path("input")
    output_dir = Path("output")
    output_dir.mkdir(exist_ok=True)

    # Get MP3 files (sorted so the processing order is deterministic)
    mp3_files = sorted(input_dir.glob("*.mp3"))

    if not mp3_files:
        print(f"No MP3 files found in {input_dir}/")
        return

    print(f"Found {len(mp3_files)} MP3 files to transcribe\n")

    # Process each file
    for mp3_file in mp3_files:
        print(f"Processing: {mp3_file.name}...")

        try:
            # Transcribe
            text = transcribe_audio(mp3_file, api_key)

            # Save to TXT
            output_path = output_dir / f"{mp3_file.stem}.txt"
            with open(output_path, "w", encoding="utf-8") as f:
                f.write(text)

            print(f"✓ Saved to: {output_path}\n")

        except Exception as e:
            print(f"✗ Error: {e}\n")

    print("=== Transcription completed ===")

if __name__ == "__main__":
    main()
```
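One practical caveat for the script above: the OpenAI transcription endpoint documents an upload limit of 25 MB per file, and a 45-minute MP3 can exceed that depending on its bitrate. The following is a minimal sketch of a pre-flight check, not part of the original script. The names `MAX_UPLOAD_BYTES`, `exceeds_upload_limit`, and `split_by_size` are hypothetical, and the exact limit value is an assumption that should be checked against the current API documentation.

```python
from pathlib import Path

# Assumption: uploads above 25 MB are rejected by the transcription endpoint.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def exceeds_upload_limit(audio_path: Path, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when the file on disk is larger than the upload limit."""
    return audio_path.stat().st_size > limit


def split_by_size(paths, limit: int = MAX_UPLOAD_BYTES):
    """Partition paths into (uploadable, too_large) lists, preserving order."""
    uploadable, too_large = [], []
    for path in paths:
        if exceeds_upload_limit(path, limit):
            too_large.append(path)
        else:
            uploadable.append(path)
    return uploadable, too_large
```

In `main()`, calling `split_by_size(mp3_files)` before the loop would let the script report oversized files up front instead of failing on them one by one.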

---

### Environment Variables (.env)

```bash
OPENAI_API_KEY=sk-...
```

---

## Cost Estimate

### For 10 hours of audio lessons

| Service | Cost | Calculation |
|---------|------|-------------|
| **Whisper API** | **$3.60** | 10 h × 60 min/h × $0.006/min |

At about $0.36 per hour of audio, this is **very affordable** for simple text extraction.
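The same arithmetic can be wrapped in a small helper to estimate the cost of any batch. This is only a sketch: `estimate_cost` and `PRICE_PER_MINUTE_USD` are hypothetical names, and the rate is the $0.006/min figure from the table, which may change over time.

```python
# Rate taken from the cost table; treat it as an assumption, not a fixed price.
PRICE_PER_MINUTE_USD = 0.006


def estimate_cost(total_minutes: float,
                  price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Return the estimated transcription cost in USD, rounded to cents."""
    if total_minutes < 0:
        raise ValueError("total_minutes must be non-negative")
    return round(total_minutes * price_per_minute, 2)


if __name__ == "__main__":
    # 10 hours of audio, as in the table
    print(estimate_cost(10 * 60))
```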

---

## Usage

### Installation

```bash
mkdir chinese-transcriber
cd chinese-transcriber

# Create the directory structure
mkdir input output

# Install dependencies
pip install openai python-dotenv

# Create .env
echo "OPENAI_API_KEY=sk-..." > .env

# Copy the transcribe.py script into this directory
```

### Running

```bash
# 1. Put your MP3 files in input/
cp /path/to/cours*.mp3 input/

# 2. Run the script
python transcribe.py

# Output:
# Found 3 MP3 files to transcribe
#
# Processing: cours_1.mp3...
# ✓ Saved to: output/cours_1.txt
#
# Processing: cours_2.mp3...
# ✓ Saved to: output/cours_2.txt
# ...
```

### Output

`.txt` files containing the raw Chinese text:

```
output/cours_1.txt:
你好。我叫Alexis。今天我们学习汉语。
第一课是关于问候的。你好吗？我很好，谢谢。
...
```
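A quick sanity check on the generated files can catch transcripts that came back empty or in the wrong language. The sketch below measures the share of Chinese characters in a text. `cjk_ratio` is a hypothetical helper, it only counts the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and any acceptance threshold would be an assumption to tune on real lessons, since names such as "Alexis" and punctuation lower the ratio.

```python
def cjk_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Only the basic CJK Unified Ideographs block is counted here.
    cjk_count = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk_count / len(chars)
```

For example, `cjk_ratio(Path("output/cours_1.txt").read_text(encoding="utf-8"))` could be compared against a chosen threshold after each transcription.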

---

## Status

✅ **SIMPLE PLAN - READY TO USE**

Minimal script for MP3 → TXT text extraction.

**Next steps, if needed:**
1. Test on your Chinese MP3 files
2. If automatic splitting is needed, see the full TTS pipeline options (commented out in earlier versions)
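If automatic splitting does become necessary, one possible first step is to plan the chunk boundaries before cutting any audio. The sketch below only computes time offsets. `plan_chunks` is a hypothetical helper, the 10-minute default is an arbitrary assumption, and the actual cutting (for example with an external audio tool) is deliberately left out.

```python
def plan_chunks(total_seconds: float, chunk_seconds: float = 600.0):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shortened so it stops at the end of the recording.
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks
```

Each chunk would then be transcribed separately and the resulting texts concatenated in order. Cutting at fixed offsets can split a sentence in half, so boundaries near silences would be a natural refinement.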

---

*Created: October 27, 2025*
*Stack: Python 3.10+, Whisper API only*