Trouve Alexis 23bb4cd2d9 Add video/audio to MP3 conversion feature
Implement drag-and-drop interface for converting video and audio files to MP3 format using FFmpeg. Users can now upload files (MP4, M4A, AVI, MKV, MOV, WAV, FLAC, OGG) and convert them with customizable bitrate and quality settings.

- Add conversion service with FFmpeg integration
- Add /convert-to-mp3 and /supported-formats API endpoints
- Add new "Video to MP3" tab with drag-and-drop UI
- Support multiple file uploads with batch conversion
- Add bitrate (128k-320k) and VBR quality options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 22:56:58 +08:00
public Add video/audio to MP3 conversion feature 2025-11-29 22:56:58 +08:00
scripts Initial commit: Video to MP3 Transcriptor 2025-11-24 11:40:23 +08:00
src Add video/audio to MP3 conversion feature 2025-11-29 22:56:58 +08:00
.env.example Initial commit: Video to MP3 Transcriptor 2025-11-24 11:40:23 +08:00
.gitignore Initial commit: Video to MP3 Transcriptor 2025-11-24 11:40:23 +08:00
package-lock.json Initial commit: Video to MP3 Transcriptor 2025-11-24 11:40:23 +08:00
package.json Initial commit: Video to MP3 Transcriptor 2025-11-24 11:40:23 +08:00
README.md Initial commit: Video to MP3 Transcriptor 2025-11-24 11:40:23 +08:00

Video to MP3 Transcriptor

Download YouTube videos and playlists as MP3 and transcribe them using the OpenAI Whisper API.

Features

  • Download single YouTube videos as MP3
  • Download entire playlists as MP3
  • Transcribe audio files using the OpenAI Whisper API
  • CLI interface for quick operations
  • REST API for integration with other systems

Prerequisites

  • Node.js 18+
  • yt-dlp installed on your system
  • ffmpeg installed (for audio conversion)
  • OpenAI API key (for transcription)

Installing yt-dlp

# Windows (winget)
winget install yt-dlp

# macOS
brew install yt-dlp

# Linux
sudo apt install yt-dlp
# or
pip install yt-dlp

Installing ffmpeg

# Windows (winget)
winget install ffmpeg

# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg
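
Before running the tool, it can help to confirm that the prerequisites are reachable from Node.js. The following is a minimal sketch, not part of this repository: `checkTools` and `nodeMajorVersion` are hypothetical helper names. It relies only on `yt-dlp --version` and `ffmpeg -version`, which both tools support.

```javascript
const { spawnSync } = require("node:child_process");

// yt-dlp prints its version with --version; ffmpeg uses a single-dash -version.
const TOOLS = [
  { name: "yt-dlp", args: ["--version"] },
  { name: "ffmpeg", args: ["-version"] },
];

// `run` is injectable so the check can be exercised without the tools installed.
function checkTools(tools = TOOLS, run = spawnSync) {
  return tools.map(({ name, args }) => {
    const result = run(name, args, { encoding: "utf8" });
    // spawnSync sets `error` (for example ENOENT) when the binary cannot be started.
    const ok = !result.error && result.status === 0;
    return { name, ok };
  });
}

// Extract the major version from a string such as "18.19.0".
function nodeMajorVersion(version = process.versions.node) {
  return Number(version.split(".")[0]);
}
```

Calling `checkTools()` returns one `{ name, ok }` entry per tool, and `nodeMajorVersion() >= 18` checks the Node.js requirement.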

Installation

# From the cloned repository, install dependencies
cd videotoMP3Transcriptor
npm install

# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Usage

CLI

# Download a video as MP3
npm run cli download "https://youtube.com/watch?v=VIDEO_ID"

# Download a playlist
npm run cli download "https://youtube.com/playlist?list=PLAYLIST_ID"

# Download with custom output directory
npm run cli download "URL" -o ./my-folder

# Get info about a video/playlist
npm run cli info "URL"

# Transcribe an existing MP3
npm run cli transcribe ./output/video.mp3

# Transcribe with specific language
npm run cli transcribe ./output/video.mp3 -l fr

# Transcribe with specific model
npm run cli transcribe ./output/video.mp3 -m gpt-4o-mini-transcribe

# Download AND transcribe
npm run cli process "URL"

# Download and transcribe with options
npm run cli process "URL" -l en -m gpt-4o-transcribe
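
The same download command accepts both video and playlist URLs. How the tool tells them apart is not documented here; the sketch below shows one plausible way, based only on the two URL shapes used in the examples above. `classifyYoutubeUrl` is a hypothetical name, and other URL forms (short links, for instance) are deliberately reported as "unknown".

```javascript
// Hypothetical helper: guess whether a YouTube URL names a video or a playlist.
function classifyYoutubeUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return "invalid"; // not a parseable absolute URL
  }
  // https://youtube.com/playlist?list=PLAYLIST_ID
  if (url.pathname === "/playlist" && url.searchParams.has("list")) {
    return "playlist";
  }
  // https://youtube.com/watch?v=VIDEO_ID
  if (url.pathname === "/watch" && url.searchParams.has("v")) {
    return "video";
  }
  return "unknown";
}
```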

Linux Scripts

Convenience scripts are available in the scripts/ directory:

# Make scripts executable (first time only)
chmod +x scripts/*.sh

# Download video/playlist
./scripts/download.sh "https://youtube.com/watch?v=VIDEO_ID"

# Transcribe a file
./scripts/transcribe.sh ./output/video.mp3 fr

# Download + transcribe
./scripts/process.sh "https://youtube.com/watch?v=VIDEO_ID" en

# Start the API server
./scripts/server.sh

# Get video info
./scripts/info.sh "https://youtube.com/watch?v=VIDEO_ID"

API Server

# Start the server
npm run server

The server runs on http://localhost:3000 by default; set the PORT environment variable to use a different port.

Endpoints

GET /health

Health check endpoint.

GET /info?url=YOUTUBE_URL

Get info about a video or playlist.

curl "http://localhost:3000/info?url=https://youtube.com/watch?v=VIDEO_ID"

POST /download

Download video(s) as MP3.

curl -X POST http://localhost:3000/download \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'

POST /transcribe

Transcribe an existing audio file.

curl -X POST http://localhost:3000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"filePath": "./output/video.mp3", "language": "en"}'

POST /process

Download and transcribe in one call.

curl -X POST http://localhost:3000/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en", "format": "txt"}'

GET /files-list

List all downloaded files.

GET /files/:filename

Download/stream a specific file.
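
The endpoints above can also be called from Node.js 18+, which ships a global `fetch`. The following is a rough client sketch, not code from this repository: `buildInfoUrl`, `buildPostRequest` and `postJson` are hypothetical names. The response format is not documented in this README, so treating responses as JSON is an assumption.

```javascript
const BASE_URL = "http://localhost:3000"; // default address from this README

// Build the GET /info URL; searchParams percent-encodes the nested YouTube URL,
// so characters such as "&" inside it cannot be mistaken for extra parameters.
function buildInfoUrl(videoUrl, baseUrl = BASE_URL) {
  const url = new URL("/info", baseUrl);
  url.searchParams.set("url", videoUrl);
  return url.toString();
}

// Build fetch() arguments for the JSON POST endpoints
// (/download, /transcribe, /process).
function buildPostRequest(path, body, baseUrl = BASE_URL) {
  return [
    new URL(path, baseUrl).toString(),
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  ];
}

// Assumption: the server answers with JSON. The parsed body is returned as-is.
async function postJson(path, body, baseUrl = BASE_URL) {
  const [url, init] = buildPostRequest(path, body, baseUrl);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
  return res.json();
}
```

For example, `postJson("/process", { url: "https://youtube.com/watch?v=VIDEO_ID", language: "en", format: "txt" })` mirrors the curl call for POST /process.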

Configuration

Environment variables (.env):

Variable         Description           Default
OPENAI_API_KEY   Your OpenAI API key   (none; required for transcription)
PORT             Server port           3000
OUTPUT_DIR       Download directory    ./output
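
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a config loader. It is not the project's actual code: `loadConfig` is a hypothetical name, and how the project itself reads `.env` is not stated in this README.

```javascript
// Resolve the documented variables, falling back to the documented defaults.
function loadConfig(env = process.env) {
  // An unset or empty PORT falls back to 3000.
  const port = Number.parseInt(env.PORT || "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    // No default exists for the key; null here means "transcription unavailable".
    openaiApiKey: env.OPENAI_API_KEY || null,
    port,
    outputDir: env.OUTPUT_DIR || "./output",
  };
}
```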

Transcription Models

Model                    Description                             Formats
gpt-4o-transcribe        Best quality, latest GPT-4o (default)   txt, json
gpt-4o-mini-transcribe   Faster, cheaper, good quality           txt, json
whisper-1                Legacy Whisper model                    txt, json, srt, vtt

Transcription Formats

  • txt - Plain text (all models)
  • json - JSON response (all models)
  • srt - SubRip subtitles (whisper-1 only)
  • vtt - WebVTT subtitles (whisper-1 only)
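
Because subtitle formats are limited to whisper-1, it may be worth validating the model/format pair before sending a request. The sketch below encodes the table above as data; `isFormatSupported` and `modelsForFormat` are hypothetical helpers, not functions from this project.

```javascript
// Model-to-format support, copied from the Transcription Models table.
const FORMATS_BY_MODEL = {
  "gpt-4o-transcribe": ["txt", "json"],
  "gpt-4o-mini-transcribe": ["txt", "json"],
  "whisper-1": ["txt", "json", "srt", "vtt"],
};

// True when the given model can produce the given output format.
function isFormatSupported(model, format) {
  return (
    Object.hasOwn(FORMATS_BY_MODEL, model) &&
    FORMATS_BY_MODEL[model].includes(format)
  );
}

// List every model that can produce the given format.
function modelsForFormat(format) {
  return Object.keys(FORMATS_BY_MODEL).filter((model) =>
    isFormatSupported(model, format)
  );
}
```

With this table, `modelsForFormat("srt")` yields only `whisper-1`, so a request for subtitles needs `-m whisper-1`.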

Language Codes

Common language codes for the -l option:

  • en - English
  • fr - French
  • es - Spanish
  • de - German
  • it - Italian
  • pt - Portuguese
  • zh - Chinese
  • ja - Japanese
  • ko - Korean
  • ru - Russian

Omit the -l option to let the language be auto-detected.
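
When calling POST /transcribe from code, the same idea can be sketched as a request body that only carries `language` when one is given. The field names `filePath` and `language` come from the curl example earlier in this README. `buildTranscribeBody` is a hypothetical helper, and it is an assumption that leaving `language` out of the body triggers auto-detection the way omitting -l does on the CLI.

```javascript
// Build a /transcribe request body, leaving out `language` when none is given.
function buildTranscribeBody(filePath, language) {
  const body = { filePath };
  // Normalize input such as " FR " to "fr"; empty or missing means auto-detect.
  const code = (language ?? "").trim().toLowerCase();
  if (code) body.language = code;
  return body;
}
```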

Project Structure

videotoMP3Transcriptor/
├── src/
│   ├── services/
│   │   ├── youtube.js       # YouTube download service
│   │   └── transcription.js # OpenAI transcription service
│   ├── cli.js               # CLI entry point
│   └── server.js            # Express API server
├── scripts/                  # Linux convenience scripts
│   ├── download.sh          # Download video/playlist
│   ├── transcribe.sh        # Transcribe audio file
│   ├── process.sh           # Download + transcribe
│   ├── server.sh            # Start API server
│   └── info.sh              # Get video info
├── output/                   # Downloaded files
├── .env                      # Configuration
└── package.json

License

MIT