Reorganize repository structure

- Move all Python scripts to tools/ directory - Move documentation files to docs/ directory - Create exams/ and homework/ directories for future use - Remove temporary test file (page1_preview.png) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-27 23:28:39 +08:00 · 2025-10-27 23:28:39 +08:00 · a61a32b57f
commit a61a32b57f
parent acbe1b4769
14 changed files with 229 additions and 0 deletions
--- a/docs/Projet-Outil-Apprentissage.md
+++ b/docs/Projet-Outil-Apprentissage.md
--- a/docs/README-OCR.md
+++ b/docs/README-OCR.md
--- a/docs/chinese_audio_tts_pipeline.md
+++ b/docs/chinese_audio_tts_pipeline.md
@ -0,0 +1,229 @@
 # Chinese Audio to Text Extractor - Simple Transcription
 ## Objectif
 Extraire le texte de fichiers MP3 de cours de chinois en utilisant Whisper.
 ### Problème résolu
 - Besoin de récupérer le contenu textuel des cours audio
 - Conversion MP3 → Texte simple et rapide
 ### Solution
 Pipeline minimaliste : MP3 → Whisper → Texte brut
 ---
 ## Architecture Pipeline
 ```
 ┌─────────────────────────────────────────┐
 │  INPUT: cours_chinois.mp3 (45min)       │
 └──────────────┬──────────────────────────┘
               │
               ▼
 ┌─────────────────────────────────────────┐
 │  Transcription (Whisper)                │
 │  ├─ Model: whisper-1 (OpenAI API)      │
 │  ├─ Language: zh (mandarin)            │
 │  └─ Output: transcript.txt             │
 └──────────────┬──────────────────────────┘
               │
               ▼
 ┌─────────────────────────────────────────┐
 │  OUTPUT: cours_chinois.txt              │
 │  你好。我叫Alexis。今天我们学习...      │
 └─────────────────────────────────────────┘
 ```
 ---
 ## Plan d'Implémentation Python
 ### Structure du projet
 ```
 chinese-transcriber/
 ├── transcribe.py           # Script principal
 ├── input/                  # MP3 source
 ├── output/                 # Fichiers .txt générés
 ├── .env                    # API key
 └── requirements.txt
 ```
 ### Dépendances (requirements.txt)
 ```txt
 openai>=1.0.0              # Whisper API
 python-dotenv>=1.0.0       # Env variables
 ```
 ### Script Principal (transcribe.py)
 ```python
 """
 Transcription simple MP3 → TXT avec Whisper
 """
 import openai
 from pathlib import Path
 from dotenv import load_dotenv
 import os
 def transcribe_audio(audio_path: Path, api_key: str) -> str:
    """
    Transcrit un fichier MP3 en chinois
    Args:
        audio_path: Chemin vers MP3
        api_key: Clé API OpenAI
    Returns:
        Texte transcrit
    """
    client = openai.OpenAI(api_key=api_key)
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="zh",  # Force mandarin
            response_format="text"  # Texte brut
        )
    return transcript
 def main():
    # Load API key
    load_dotenv()
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("Error: OPENAI_API_KEY not found in .env")
        return
    # Setup paths
    input_dir = Path("input")
    output_dir = Path("output")
    output_dir.mkdir(exist_ok=True)
    # Get MP3 files
    mp3_files = list(input_dir.glob("*.mp3"))
    if not mp3_files:
        print(f"No MP3 files found in {input_dir}/")
        return
    print(f"Found {len(mp3_files)} MP3 files to transcribe\n")
    # Process each file
    for mp3_file in mp3_files:
        print(f"Processing: {mp3_file.name}...")
        try:
            # Transcribe
            text = transcribe_audio(mp3_file, api_key)
            # Save to TXT
            output_path = output_dir / f"{mp3_file.stem}.txt"
            with open(output_path, "w", encoding="utf-8") as f:
                f.write(text)
            print(f"✓ Saved to: {output_path}\n")
        except Exception as e:
            print(f"✗ Error: {e}\n")
    print("=== Transcription completed ===")
 if __name__ == "__main__":
    main()
 ```
 ---
 ### Environment Variables (.env)
 ```bash
 OPENAI_API_KEY=sk-...
 ```
 ---
 ## Estimation Coûts
 ### Pour 10 heures de cours audio
 | Service | Coût | Calcul |
 |---------|------|--------|
 | **Whisper API** | **$3.60** | 10h × $0.006/min × 60min |
 **Ultra-abordable** pour extraction simple de texte.
 ---
 ## Usage
 ### Installation
 ```bash
 mkdir chinese-transcriber
 cd chinese-transcriber
 # Créer structure
 mkdir input output
 # Installer dépendances
 pip install openai python-dotenv
 # Créer .env
 echo "OPENAI_API_KEY=sk-..." > .env
 # Copier le script transcribe.py
 ```
 ### Exécution
 ```bash
 # 1. Placer tes MP3 dans input/
 cp /path/to/cours*.mp3 input/
 # 2. Run script
 python transcribe.py
 # Output:
 # Found 3 MP3 files to transcribe
 #
 # Processing: cours_1.mp3...
 # ✓ Saved to: output/cours_1.txt
 #
 # Processing: cours_2.mp3...
 # ✓ Saved to: output/cours_2.txt
 # ...
 ```
 ### Output
 Fichiers `.txt` avec texte chinois brut :
 ```
 output/cours_1.txt:
 你好。我叫Alexis。今天我们学习汉语。
 第一课是关于问候的。你好吗？我很好，谢谢。
 ...
 ```
 ---
 ## Statut
 ✅ **PLAN SIMPLE - PRÊT À UTILISER**
 Script minimaliste pour extraction texte MP3 → TXT.
 **Next steps si besoin** :
 1. Tester sur tes fichiers MP3 chinois
 2. Si besoin découpage automatique, voir options full TTS pipeline (commenté dans versions précédentes)
 ---
 *Créé : 27 octobre 2025*
 *Stack : Python 3.10+, Whisper API seulement*
--- a/tools/debug_pdf_placement.py
+++ b/tools/debug_pdf_placement.py
--- a/tools/diagnose_alignment.py
+++ b/tools/diagnose_alignment.py
--- a/tools/download_audio_resources.py
+++ b/tools/download_audio_resources.py
--- a/tools/json_to_pdf_QUADRILATERAL.py
+++ b/tools/json_to_pdf_QUADRILATERAL.py
--- a/tools/json_to_pdf_SIMPLE.py
+++ b/tools/json_to_pdf_SIMPLE.py
--- a/tools/json_to_searchable_pdf.py
+++ b/tools/json_to_searchable_pdf.py
--- a/tools/ocr_pipeline.py
+++ b/tools/ocr_pipeline.py
--- a/tools/requirements-ocr.txt
+++ b/tools/requirements-ocr.txt
--- a/tools/test_center_scaling.py
+++ b/tools/test_center_scaling.py
--- a/tools/test_y_offset.py
+++ b/tools/test_y_offset.py
--- a/tools/test_y_scaling.py
+++ b/tools/test_y_scaling.py