Video to MP3 Transcriptor
Download YouTube videos or playlists as MP3 and transcribe them using the OpenAI Whisper API.
Features
- Download single YouTube videos as MP3
- Download entire playlists as MP3
- Transcribe audio files using the OpenAI Whisper API
- Command-line interface (CLI) for quick operations
- REST API for integration with other systems
Prerequisites
- Node.js 18+
- yt-dlp installed on your system
- ffmpeg installed (for audio conversion)
- OpenAI API key (for transcription)
Installing yt-dlp
# Windows (winget)
winget install yt-dlp
# macOS
brew install yt-dlp
# Linux
sudo apt install yt-dlp
# or
pip install yt-dlp
Installing ffmpeg
# Windows (winget)
winget install ffmpeg
# macOS
brew install ffmpeg
# Linux
sudo apt install ffmpeg
Installation
# From the cloned repository, install dependencies
cd videotoMP3Transcriptor
npm install
# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
Usage
CLI
# Download a video as MP3
npm run cli download "https://youtube.com/watch?v=VIDEO_ID"
# Download a playlist
npm run cli download "https://youtube.com/playlist?list=PLAYLIST_ID"
# Download with custom output directory (-- passes the flags on to the CLI instead of npm)
npm run cli -- download "URL" -o ./my-folder
# Get info about a video/playlist
npm run cli info "URL"
# Transcribe an existing MP3
npm run cli transcribe ./output/video.mp3
# Transcribe with specific language
npm run cli -- transcribe ./output/video.mp3 -l fr
# Transcribe with specific model
npm run cli -- transcribe ./output/video.mp3 -m gpt-4o-mini-transcribe
# Download AND transcribe
npm run cli process "URL"
# Download and transcribe with options
npm run cli -- process "URL" -l en -m gpt-4o-transcribe
Linux Scripts
Convenience scripts are available in the scripts/ directory:
# Make scripts executable (first time only)
chmod +x scripts/*.sh
# Download video/playlist
./scripts/download.sh "https://youtube.com/watch?v=VIDEO_ID"
# Transcribe a file
./scripts/transcribe.sh ./output/video.mp3 fr
# Download + transcribe
./scripts/process.sh "https://youtube.com/watch?v=VIDEO_ID" en
# Start the API server
./scripts/server.sh
# Get video info
./scripts/info.sh "https://youtube.com/watch?v=VIDEO_ID"
API Server
# Start the server
npm run server
The server runs on http://localhost:3000 by default. Set PORT to use a different port.
Endpoints
GET /health
Health check endpoint.
GET /info?url=YOUTUBE_URL
Get info about a video or playlist.
curl "http://localhost:3000/info?url=https://youtube.com/watch?v=VIDEO_ID"
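If the video URL carries several query parameters of its own (for example a watch URL that also has a list parameter), it needs percent-encoding before it goes into the url parameter. Otherwise everything after the first & is read as a separate parameter of /info. A minimal sketch in JavaScript, assuming Node.js 18+ and the default server address; buildInfoUrl is a hypothetical helper, not part of this project:

```javascript
// Hypothetical helper: builds the GET /info request URL with the
// video URL percent-encoded as the value of the "url" parameter.
function buildInfoUrl(videoUrl, baseUrl = "http://localhost:3000") {
  const endpoint = new URL("/info", baseUrl);
  // searchParams.set() encodes reserved characters such as ?, = and &
  endpoint.searchParams.set("url", videoUrl);
  return endpoint.toString();
}

console.log(buildInfoUrl("https://youtube.com/watch?v=VIDEO_ID&list=PLAYLIST_ID"));
```

The resulting string can be passed to fetch() or pasted into a curl command as-is.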
POST /download
Download video(s) as MP3.
curl -X POST http://localhost:3000/download \
-H "Content-Type: application/json" \
-d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'
POST /transcribe
Transcribe an existing audio file.
curl -X POST http://localhost:3000/transcribe \
-H "Content-Type: application/json" \
-d '{"filePath": "./output/video.mp3", "language": "en"}'
POST /process
Download and transcribe in one call.
curl -X POST http://localhost:3000/process \
-H "Content-Type: application/json" \
-d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en", "format": "txt"}'
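The same call can be made from Node.js 18+, which ships a global fetch. The sketch below mirrors the curl request above; buildProcessRequest and processVideo are hypothetical names, and two things are assumptions rather than documented behavior: that the server replies with JSON, and that leaving out language triggers auto-detection as it does for the CLI.

```javascript
// Hypothetical helper: assembles the fetch() arguments for POST /process.
function buildProcessRequest(videoUrl, options = {}, baseUrl = "http://localhost:3000") {
  const body = { url: videoUrl };
  // Optional fields are only sent when provided (assumption: the server
  // auto-detects the language when "language" is absent).
  if (options.language) body.language = options.language;
  if (options.format) body.format = options.format;
  return {
    endpoint: new URL("/process", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request. Assumes a JSON response body, which this README
// does not confirm.
async function processVideo(videoUrl, options) {
  const { endpoint, init } = buildProcessRequest(videoUrl, options);
  const response = await fetch(endpoint, init);
  if (!response.ok) {
    throw new Error(`POST /process failed with status ${response.status}`);
  }
  return response.json();
}
```

With the server running, processVideo("https://youtube.com/watch?v=VIDEO_ID", { language: "en", format: "txt" }) sends the same payload as the curl example.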
GET /files-list
List all downloaded files.
GET /files/:filename
Download/stream a specific file.
Configuration
Environment variables (.env):
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | Your OpenAI API key | Required for transcription |
| PORT | Server port | 3000 |
| OUTPUT_DIR | Download directory | ./output |
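To illustrate how these variables and their defaults fit together, here is a minimal sketch of a loader. It is not the project's actual code: loadConfig is a hypothetical function, and it assumes the variables are already present in process.env (for example, loaded from .env by whatever mechanism the project uses).

```javascript
// Hypothetical loader: reads the three documented variables and applies
// the documented defaults (PORT 3000, OUTPUT_DIR ./output).
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT || "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    // null signals that no key is configured; the key is only needed
    // for transcription
    openaiApiKey: env.OPENAI_API_KEY || null,
    port,
    outputDir: env.OUTPUT_DIR || "./output",
  };
}

const config = loadConfig();
if (!config.openaiApiKey) {
  console.warn("OPENAI_API_KEY is not set; transcription will not work.");
}
```

Treating a missing key as a warning rather than an error keeps download-only use possible without an OpenAI account.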
Transcription Models
| Model | Description | Formats |
|---|---|---|
| gpt-4o-transcribe | Best quality, latest GPT-4o (default) | txt, json |
| gpt-4o-mini-transcribe | Faster, cheaper, good quality | txt, json |
| whisper-1 | Legacy Whisper model | txt, json, srt, vtt |
Transcription Formats
- txt - Plain text (all models)
- json - JSON response (all models)
- srt - SubRip subtitles (whisper-1 only)
- vtt - WebVTT subtitles (whisper-1 only)
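Because srt and vtt are only available with whisper-1, it can help to check the model and format combination before sending a request. The following is a sketch of such a check built from the lists above; FORMATS_BY_MODEL and assertFormatSupported are hypothetical names, not part of this project.

```javascript
// Model-to-format support, copied from the tables in this README.
const FORMATS_BY_MODEL = new Map([
  ["gpt-4o-transcribe", ["txt", "json"]],
  ["gpt-4o-mini-transcribe", ["txt", "json"]],
  ["whisper-1", ["txt", "json", "srt", "vtt"]],
]);

// gpt-4o-transcribe is listed as the default model.
const DEFAULT_MODEL = "gpt-4o-transcribe";

// Returns the format unchanged if the model supports it, otherwise throws
// with a message listing the valid choices.
function assertFormatSupported(format, model = DEFAULT_MODEL) {
  const formats = FORMATS_BY_MODEL.get(model);
  if (!formats) {
    throw new Error(`Unknown model: ${model}`);
  }
  if (!formats.includes(format)) {
    throw new Error(
      `Format "${format}" is not available with ${model}; choose one of: ${formats.join(", ")}`
    );
  }
  return format;
}
```

For example, assertFormatSupported("srt", "whisper-1") returns "srt", while assertFormatSupported("srt") throws because the default model only offers txt and json.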
Language Codes
Common language codes for the -l option:
- en - English
- fr - French
- es - Spanish
- de - German
- it - Italian
- pt - Portuguese
- zh - Chinese
- ja - Japanese
- ko - Korean
- ru - Russian
Omit the language to let the model auto-detect it.
Project Structure
videotoMP3Transcriptor/
├── src/
│ ├── services/
│ │ ├── youtube.js # YouTube download service
│ │ └── transcription.js # OpenAI transcription service
│ ├── cli.js # CLI entry point
│ └── server.js # Express API server
├── scripts/ # Linux convenience scripts
│ ├── download.sh # Download video/playlist
│ ├── transcribe.sh # Transcribe audio file
│ ├── process.sh # Download + transcribe
│ ├── server.sh # Start API server
│ └── info.sh # Get video info
├── output/ # Downloaded files
├── .env # Configuration
└── package.json
License
MIT