videotomp3transcriptor/README.md
StillHammer 849412c3bd Initial commit: Video to MP3 Transcriptor
- YouTube video/playlist download as MP3 (yt-dlp)
- Audio transcription with OpenAI (gpt-4o-transcribe, whisper-1)
- Translation with GPT-4o-mini (chunking for long texts)
- Web interface with progress bars and drag & drop
- CLI and REST API interfaces
- Linux shell scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 11:40:23 +08:00


# Video to MP3 Transcriptor
Download YouTube videos and playlists as MP3 and transcribe them with the OpenAI transcription API (`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, or `whisper-1`).
## Features
- Download single YouTube videos as MP3
- Download entire playlists as MP3
- Transcribe audio files using the OpenAI transcription API
- CLI interface for quick operations
- REST API for integration with other systems
## Prerequisites
- **Node.js** 18+
- **yt-dlp** installed on your system
- **ffmpeg** installed (for audio conversion)
- **OpenAI API key** (for transcription)
### Installing yt-dlp
```bash
# Windows (winget)
winget install yt-dlp
# macOS
brew install yt-dlp
# Linux
sudo apt install yt-dlp
# or
pip install yt-dlp
```
### Installing ffmpeg
```bash
# Windows (winget)
winget install ffmpeg
# macOS
brew install ffmpeg
# Linux
sudo apt install ffmpeg
```
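
Before installing the project, it can help to confirm that every prerequisite is in place. The script below is a minimal sketch, not part of the repository: the file name `check-prereqs.js` and the function names are hypothetical. It assumes `yt-dlp --version` and `ffmpeg -version` exit with status 0 when the tools are on your `PATH`.

```javascript
// check-prereqs.js: hypothetical helper, not shipped with this project.
const { spawnSync } = require('node:child_process');

// True when a Node.js version string such as "v18.17.0" meets the minimum major version.
function nodeMajorAtLeast(version, minMajor) {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// True when the command can be started and exits with status 0.
function commandWorks(command, args) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  return !result.error && result.status === 0;
}

// Collect one boolean per prerequisite listed above.
function checkPrerequisites(env = process.env) {
  return {
    node: nodeMajorAtLeast(process.version, 18),
    ytDlp: commandWorks('yt-dlp', ['--version']),
    ffmpeg: commandWorks('ffmpeg', ['-version']),
    openaiKey: Boolean(env.OPENAI_API_KEY),
  };
}
```

Calling `console.table(checkPrerequisites())` prints one row per prerequisite. The `openaiKey` check only looks at the current environment, so it reports `false` until the key from `.env` has been loaded.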
## Installation
```bash
# Enter the cloned repository and install dependencies
cd videotoMP3Transcriptor
npm install
# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```
## Usage
### CLI
```bash
# The `--` separator tells npm to pass everything after it to the CLI,
# including option flags such as -o, -l and -m
# Download a video as MP3
npm run cli -- download "https://youtube.com/watch?v=VIDEO_ID"
# Download a playlist
npm run cli -- download "https://youtube.com/playlist?list=PLAYLIST_ID"
# Download with custom output directory
npm run cli -- download "URL" -o ./my-folder
# Get info about a video/playlist
npm run cli -- info "URL"
# Transcribe an existing MP3
npm run cli -- transcribe ./output/video.mp3
# Transcribe with specific language
npm run cli -- transcribe ./output/video.mp3 -l fr
# Transcribe with specific model
npm run cli -- transcribe ./output/video.mp3 -m gpt-4o-mini-transcribe
# Download AND transcribe
npm run cli -- process "URL"
# Download and transcribe with options
npm run cli -- process "URL" -l en -m gpt-4o-transcribe
```
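
To run the CLI over several URLs from a Node.js script, the commands above can be wrapped as follows. This is a sketch under assumptions: `buildCliArgs` and `processAll` are hypothetical names, and the flags are the ones shown in the examples (`-o`, `-l`, `-m`). The arguments are placed after `--` so that npm forwards them to the CLI instead of parsing them itself.

```javascript
const { spawnSync } = require('node:child_process');

// Build the argument list for `npm run cli`, placing CLI flags after `--`.
// Which flags each command accepts follows the examples above.
function buildCliArgs(command, target, options = {}) {
  const args = ['run', 'cli', '--', command, target];
  if (options.outputDir) args.push('-o', options.outputDir);
  if (options.language) args.push('-l', options.language);
  if (options.model) args.push('-m', options.model);
  return args;
}

// Run `process` for each URL in turn and stop at the first failure.
// On Windows, spawning `npm` may additionally need `shell: true`.
function processAll(urls, options = {}) {
  for (const url of urls) {
    const result = spawnSync('npm', buildCliArgs('process', url, options), {
      stdio: 'inherit',
    });
    if (result.status !== 0) {
      throw new Error(`process failed for ${url}`);
    }
  }
}
```

For example, `processAll(['https://youtube.com/watch?v=VIDEO_ID'], { language: 'en' })` runs the same command as `npm run cli -- process "URL" -l en`.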
### Linux Scripts
Convenience scripts are available in the `scripts/` directory:
```bash
# Make scripts executable (first time only)
chmod +x scripts/*.sh
# Download video/playlist
./scripts/download.sh "https://youtube.com/watch?v=VIDEO_ID"
# Transcribe a file
./scripts/transcribe.sh ./output/video.mp3 fr
# Download + transcribe
./scripts/process.sh "https://youtube.com/watch?v=VIDEO_ID" en
# Start the API server
./scripts/server.sh
# Get video info
./scripts/info.sh "https://youtube.com/watch?v=VIDEO_ID"
```
### API Server
```bash
# Start the server
npm run server
```
Server runs on `http://localhost:3000` by default.
#### Endpoints
##### GET /health
Health check endpoint.
##### GET /info?url=YOUTUBE_URL
Get info about a video or playlist.
```bash
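# -G sends the data as a query string; --data-urlencode escapes the nested URL
curl -G "http://localhost:3000/info" --data-urlencode "url=https://youtube.com/watch?v=VIDEO_ID"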
```
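
A query parameter that carries another URL is safest when percent-encoded, because an `&` inside the YouTube URL would otherwise be read as a separator for the API's own parameters. The sketch below shows one way to call the endpoint from Node.js 18+. `buildInfoUrl` and `getInfo` are hypothetical names, and the JSON response is an assumption since the response format is not documented here.

```javascript
// Build the /info URL; URLSearchParams percent-encodes the nested YouTube URL.
function buildInfoUrl(baseUrl, videoUrl) {
  const url = new URL('/info', baseUrl);
  url.searchParams.set('url', videoUrl);
  return url.toString();
}

// Fetch video or playlist info (global fetch is available in Node.js 18+).
async function getInfo(baseUrl, videoUrl) {
  const response = await fetch(buildInfoUrl(baseUrl, videoUrl));
  if (!response.ok) {
    throw new Error(`GET /info failed with status ${response.status}`);
  }
  return response.json(); // assumption: the server answers with JSON
}
```

Usage: `const info = await getInfo('http://localhost:3000', 'https://youtube.com/watch?v=VIDEO_ID');`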
##### POST /download
Download video(s) as MP3.
```bash
curl -X POST http://localhost:3000/download \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'
```
##### POST /transcribe
Transcribe an existing audio file.
```bash
curl -X POST http://localhost:3000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"filePath": "./output/video.mp3", "language": "en"}'
```
##### POST /process
Download and transcribe in one call.
```bash
curl -X POST http://localhost:3000/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en", "format": "txt"}'
```
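
The three POST endpoints above take a JSON body with the same headers, so one small helper can cover all of them. This is a sketch for Node.js 18+ and not the project's own client: `buildJsonRequest` and `postJson` are hypothetical names, and parsing the response as JSON is an assumption because the response format is not documented here.

```javascript
// Describe a JSON POST request without sending it.
function buildJsonRequest(baseUrl, path, body) {
  return {
    url: new URL(path, baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the parsed response.
async function postJson(baseUrl, path, body) {
  const { url, options } = buildJsonRequest(baseUrl, path, body);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`POST ${path} failed with status ${response.status}`);
  }
  return response.json(); // assumption: the server answers with JSON
}
```

Usage, mirroring the `curl` call for `/process`: `await postJson('http://localhost:3000', '/process', { url: 'https://youtube.com/watch?v=VIDEO_ID', language: 'en', format: 'txt' });`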
##### GET /files-list
List all downloaded files.
##### GET /files/:filename
Download/stream a specific file.
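
File names derived from video titles often contain spaces or other characters that are not valid in a URL path, so the `:filename` segment should be percent-encoded. A minimal sketch for Node.js 18+, with the hypothetical names `buildFileUrl` and `downloadFile`:

```javascript
const { writeFile } = require('node:fs/promises');

// Build the /files/:filename URL, escaping spaces and other reserved characters.
function buildFileUrl(baseUrl, filename) {
  return new URL(`/files/${encodeURIComponent(filename)}`, baseUrl).toString();
}

// Download one file from the server and save it locally.
// The whole file is buffered in memory, which is fine for typical MP3 sizes.
async function downloadFile(baseUrl, filename, destination) {
  const response = await fetch(buildFileUrl(baseUrl, filename));
  if (!response.ok) {
    throw new Error(`GET /files/${filename} failed with status ${response.status}`);
  }
  await writeFile(destination, Buffer.from(await response.arrayBuffer()));
}
```

Usage: `await downloadFile('http://localhost:3000', 'My Video.mp3', './My Video.mp3');`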
## Configuration
Environment variables (`.env`):
| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | Your OpenAI API key | None (required for transcription) |
| `PORT` | Server port | 3000 |
| `OUTPUT_DIR` | Download directory | ./output |
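
The table can be read as the following lookup with defaults. This is only an illustration of the documented values, not the project's actual configuration code: `loadConfig` is a hypothetical name, and how `.env` is loaded into the environment is not specified here.

```javascript
// Read the documented variables from an environment object, applying the defaults above.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY || null, // no default; needed for transcription
    port,
    outputDir: env.OUTPUT_DIR || './output',
  };
}
```

With an empty environment, `loadConfig({})` yields port `3000`, output directory `./output`, and no API key, so downloads can work while transcription cannot.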
## Transcription Models
| Model | Description | Formats |
|-------|-------------|---------|
| `gpt-4o-transcribe` | Best quality, latest GPT-4o (default) | txt, json |
| `gpt-4o-mini-transcribe` | Faster, cheaper, good quality | txt, json |
| `whisper-1` | Legacy Whisper model | txt, json, srt, vtt |
## Transcription Formats
- `txt` - Plain text (all models)
- `json` - JSON response (all models)
- `srt` - SubRip subtitles (whisper-1 only)
- `vtt` - WebVTT subtitles (whisper-1 only)
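
Because subtitle formats are limited to `whisper-1`, it can be useful to validate a model and format pair before sending a request. The sketch below encodes the two tables above as data. `SUPPORTED_FORMATS`, `isFormatSupported`, and `modelsForFormat` are hypothetical names, not part of the project.

```javascript
// Formats each model supports, copied from the tables above.
const SUPPORTED_FORMATS = {
  'gpt-4o-transcribe': ['txt', 'json'],
  'gpt-4o-mini-transcribe': ['txt', 'json'],
  'whisper-1': ['txt', 'json', 'srt', 'vtt'],
};

// True when the given model can produce the given output format.
function isFormatSupported(model, format) {
  return Object.hasOwn(SUPPORTED_FORMATS, model) &&
    SUPPORTED_FORMATS[model].includes(format);
}

// List every model that can produce the given output format.
function modelsForFormat(format) {
  return Object.keys(SUPPORTED_FORMATS).filter((model) =>
    isFormatSupported(model, format)
  );
}
```

For instance, `isFormatSupported('gpt-4o-transcribe', 'srt')` is `false`, while `modelsForFormat('srt')` returns only `whisper-1`.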
## Language Codes
Common language codes for the `-l` option:
- `en` - English
- `fr` - French
- `es` - Spanish
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `ru` - Russian
Omit the language to let the model auto-detect it.
## Project Structure
```
videotoMP3Transcriptor/
├── src/
│   ├── services/
│   │   ├── youtube.js        # YouTube download service
│   │   └── transcription.js  # OpenAI transcription service
│   ├── cli.js                # CLI entry point
│   └── server.js             # Express API server
├── scripts/                  # Linux convenience scripts
│   ├── download.sh           # Download video/playlist
│   ├── transcribe.sh         # Transcribe audio file
│   ├── process.sh            # Download + transcribe
│   ├── server.sh             # Start API server
│   └── info.sh               # Get video info
├── output/                   # Downloaded files
├── .env                      # Configuration
└── package.json
```
## License
MIT