# Video to MP3 Transcriptor Download YouTube videos/playlists to MP3 and transcribe them using OpenAI Whisper API. ## Features - Download single YouTube videos as MP3 - Download entire playlists as MP3 - Transcribe audio files using OpenAI Whisper API - CLI interface for quick operations - REST API for integration with other systems ## Prerequisites - **Node.js** 18+ - **yt-dlp** installed on your system - **ffmpeg** installed (for audio conversion) - **OpenAI API key** (for transcription) ### Installing yt-dlp ```bash # Windows (winget) winget install yt-dlp # macOS brew install yt-dlp # Linux sudo apt install yt-dlp # or pip install yt-dlp ``` ### Installing ffmpeg ```bash # Windows (winget) winget install ffmpeg # macOS brew install ffmpeg # Linux sudo apt install ffmpeg ``` ## Installation ```bash # Clone and install cd videotoMP3Transcriptor npm install # Configure environment cp .env.example .env # Edit .env and add your OPENAI_API_KEY ``` ## Usage ### CLI ```bash # Download a video as MP3 npm run cli download "https://youtube.com/watch?v=VIDEO_ID" # Download a playlist npm run cli download "https://youtube.com/playlist?list=PLAYLIST_ID" # Download with custom output directory npm run cli download "URL" -o ./my-folder # Get info about a video/playlist npm run cli info "URL" # Transcribe an existing MP3 npm run cli transcribe ./output/video.mp3 # Transcribe with specific language npm run cli transcribe ./output/video.mp3 -l fr # Transcribe with specific model npm run cli transcribe ./output/video.mp3 -m gpt-4o-mini-transcribe # Download AND transcribe npm run cli process "URL" # Download and transcribe with options npm run cli process "URL" -l en -m gpt-4o-transcribe ``` ### Linux Scripts Convenience scripts are available in the `scripts/` directory: ```bash # Make scripts executable (first time only) chmod +x scripts/*.sh # Download video/playlist ./scripts/download.sh "https://youtube.com/watch?v=VIDEO_ID" # Transcribe a file ./scripts/transcribe.sh ./output/video.mp3 fr # Download + transcribe ./scripts/process.sh "https://youtube.com/watch?v=VIDEO_ID" en # Start the API server ./scripts/server.sh # Get video info ./scripts/info.sh "https://youtube.com/watch?v=VIDEO_ID" ``` ### API Server ```bash # Start the server npm run server ``` Server runs on `http://localhost:3000` by default. #### Endpoints ##### GET /health Health check endpoint. ##### GET /info?url=YOUTUBE_URL Get info about a video or playlist. ```bash curl "http://localhost:3000/info?url=https://youtube.com/watch?v=VIDEO_ID" ``` ##### POST /download Download video(s) as MP3. ```bash curl -X POST http://localhost:3000/download \ -H "Content-Type: application/json" \ -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}' ``` ##### POST /transcribe Transcribe an existing audio file. ```bash curl -X POST http://localhost:3000/transcribe \ -H "Content-Type: application/json" \ -d '{"filePath": "./output/video.mp3", "language": "en"}' ``` ##### POST /process Download and transcribe in one call. ```bash curl -X POST http://localhost:3000/process \ -H "Content-Type: application/json" \ -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en", "format": "txt"}' ``` ##### GET /files-list List all downloaded files. ##### GET /files/:filename Download/stream a specific file. ## Configuration Environment variables (`.env`): | Variable | Description | Default | |----------|-------------|---------| | `OPENAI_API_KEY` | Your OpenAI API key | Required for transcription | | `PORT` | Server port | 3000 | | `OUTPUT_DIR` | Download directory | ./output | ## Transcription Models | Model | Description | Formats | |-------|-------------|---------| | `gpt-4o-transcribe` | Best quality, latest GPT-4o (default) | txt, json | | `gpt-4o-mini-transcribe` | Faster, cheaper, good quality | txt, json | | `whisper-1` | Legacy Whisper model | txt, json, srt, vtt | ## Transcription Formats - `txt` - Plain text (all models) - `json` - JSON response (all models) - `srt` - SubRip subtitles (whisper-1 only) - `vtt` - WebVTT subtitles (whisper-1 only) ## Language Codes Common language codes for the `-l` option: - `en` - English - `fr` - French - `es` - Spanish - `de` - German - `it` - Italian - `pt` - Portuguese - `zh` - Chinese - `ja` - Japanese - `ko` - Korean - `ru` - Russian Leave empty for auto-detection. ## Project Structure ``` videotoMP3Transcriptor/ ├── src/ │ ├── services/ │ │ ├── youtube.js # YouTube download service │ │ └── transcription.js # OpenAI transcription service │ ├── cli.js # CLI entry point │ └── server.js # Express API server ├── scripts/ # Linux convenience scripts │ ├── download.sh # Download video/playlist │ ├── transcribe.sh # Transcribe audio file │ ├── process.sh # Download + transcribe │ ├── server.sh # Start API server │ └── info.sh # Get video info ├── output/ # Downloaded files ├── .env # Configuration └── package.json ``` ## License MIT