Video to MP3 Transcriptor

Download YouTube videos and playlists as MP3 files and transcribe them using the OpenAI Whisper API.

Features

  • Download single YouTube videos as MP3
  • Download entire playlists as MP3
  • Transcribe audio files using the OpenAI Whisper API
  • CLI interface for quick operations
  • REST API for integration with other systems

Prerequisites

  • Node.js 18+
  • yt-dlp installed on your system
  • ffmpeg installed (for audio conversion)
  • OpenAI API key (for transcription)
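
The prerequisites above can be checked from Node.js before first use. The following is a minimal sketch, not part of the project: the helper names `isInstalled` and `checkPrerequisites` are hypothetical, and it only relies on `yt-dlp --version` and `ffmpeg -version` printing a version and exiting successfully.

```javascript
const { spawnSync } = require('node:child_process');

// Returns true when `command` can be started and exits with status 0.
function isInstalled(command, args = ['--version']) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  // result.error is set (for example ENOENT) when the executable is not found.
  return !result.error && result.status === 0;
}

// Version flags differ: yt-dlp uses --version, ffmpeg uses -version.
const TOOLS = [
  ['yt-dlp', ['--version']],
  ['ffmpeg', ['-version']],
];

// Hypothetical helper: reports which prerequisites are satisfied.
function checkPrerequisites() {
  const nodeMajor = Number(process.versions.node.split('.')[0]);
  const report = { node: nodeMajor >= 18 };
  for (const [command, args] of TOOLS) {
    report[command] = isInstalled(command, args);
  }
  return report;
}

console.log(checkPrerequisites());
```

A `false` entry in the printed report points at the tool to install using the commands in the sections below.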

Installing yt-dlp

# Windows (winget)
winget install yt-dlp

# macOS
brew install yt-dlp

# Linux
sudo apt install yt-dlp
# or
pip install yt-dlp

Installing ffmpeg

# Windows (winget)
winget install ffmpeg

# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg

Installation

# Clone the repository first, then install dependencies from its root
cd videotoMP3Transcriptor
npm install

# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Usage

CLI

# Download a video as MP3
npm run cli download "https://youtube.com/watch?v=VIDEO_ID"

# Download a playlist
npm run cli download "https://youtube.com/playlist?list=PLAYLIST_ID"

# Download with custom output directory
npm run cli download "URL" -o ./my-folder

# Get info about a video/playlist
npm run cli info "URL"

# Transcribe an existing MP3
npm run cli transcribe ./output/video.mp3

# Transcribe with specific language
npm run cli transcribe ./output/video.mp3 -l fr

# Transcribe with specific model
npm run cli transcribe ./output/video.mp3 -m gpt-4o-mini-transcribe

# Download AND transcribe
npm run cli process "URL"

# Download and transcribe with options
npm run cli process "URL" -l en -m gpt-4o-transcribe
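
The download commands above rely on yt-dlp and ffmpeg. How the project invokes yt-dlp is not shown in this README, so the following is only a sketch of one plausible invocation: `buildYtDlpArgs` is a hypothetical helper, and the output template is an assumption. The flags themselves (`--extract-audio`, `--audio-format`, `--output`) are standard yt-dlp options.

```javascript
const path = require('node:path');

// Hypothetical helper: builds yt-dlp arguments that extract audio as MP3.
// The same arguments work for a playlist URL, since yt-dlp then processes
// every entry of the playlist.
function buildYtDlpArgs(url, outputDir = './output') {
  return [
    '--extract-audio',        // keep only the audio track
    '--audio-format', 'mp3',  // convert the audio to MP3 (requires ffmpeg)
    // Assumed naming scheme: one file per video, named after its title.
    '--output', path.join(outputDir, '%(title)s.%(ext)s'),
    url,
  ];
}

console.log(buildYtDlpArgs('https://youtube.com/watch?v=VIDEO_ID'));
```

The resulting array can be passed to `child_process.spawn('yt-dlp', args)`, which avoids shell quoting issues with URLs.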

Linux Scripts

Convenience scripts are available in the scripts/ directory:

# Make scripts executable (first time only)
chmod +x scripts/*.sh

# Download video/playlist
./scripts/download.sh "https://youtube.com/watch?v=VIDEO_ID"

# Transcribe a file
./scripts/transcribe.sh ./output/video.mp3 fr

# Download + transcribe
./scripts/process.sh "https://youtube.com/watch?v=VIDEO_ID" en

# Start the API server
./scripts/server.sh

# Get video info
./scripts/info.sh "https://youtube.com/watch?v=VIDEO_ID"

API Server

# Start the server
npm run server

The server runs on http://localhost:3000 by default.

Endpoints

GET /health

Health check endpoint.

GET /info?url=YOUTUBE_URL

Get info about a video or playlist.

curl "http://localhost:3000/info?url=https://youtube.com/watch?v=VIDEO_ID"
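
Because the video URL is itself passed as a query parameter, a URL containing `&` (for example a video inside a playlist) could be split into separate parameters if sent unencoded. A minimal sketch of building the request URL with proper encoding, where `buildInfoUrl` is a hypothetical helper name:

```javascript
// Hypothetical helper: builds the /info request URL with the video URL
// percent-encoded so that it survives as a single `url` parameter.
function buildInfoUrl(videoUrl, baseUrl = 'http://localhost:3000') {
  const endpoint = new URL('/info', baseUrl);
  endpoint.searchParams.set('url', videoUrl);
  return endpoint.toString();
}

console.log(buildInfoUrl('https://youtube.com/watch?v=VIDEO_ID&list=PLAYLIST_ID'));
```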

POST /download

Download video(s) as MP3.

curl -X POST http://localhost:3000/download \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'

POST /transcribe

Transcribe an existing audio file.

curl -X POST http://localhost:3000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"filePath": "./output/video.mp3", "language": "en"}'

POST /process

Download and transcribe in one call.

curl -X POST http://localhost:3000/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en", "format": "txt"}'
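
The three POST endpoints take a JSON body, so they can share one small client. This is a sketch assuming Node.js 18+ (for the built-in `fetch`) and assuming the server answers with JSON, which this README does not specify. `buildJsonPost` and `postJson` are hypothetical names.

```javascript
// Hypothetical helper: builds the arguments for fetch() for a JSON POST
// to /download, /transcribe, or /process.
function buildJsonPost(path, body, baseUrl = 'http://localhost:3000') {
  return [
    new URL(path, baseUrl).toString(),
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  ];
}

// Sends the request. The response shape is an assumption (JSON), so it is
// returned as parsed JSON without further interpretation.
async function postJson(path, body, baseUrl) {
  const response = await fetch(...buildJsonPost(path, body, baseUrl));
  if (!response.ok) {
    throw new Error(`${path} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `postJson('/process', { url: 'https://youtube.com/watch?v=VIDEO_ID', language: 'en', format: 'txt' })` mirrors the curl call above.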

GET /files-list

List all downloaded files.

GET /files/:filename

Download/stream a specific file.
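
Downloaded files are often named after video titles, which may contain spaces or characters such as `#`. A minimal sketch of building a safe URL for this endpoint, where `buildFileUrl` is a hypothetical helper:

```javascript
// Hypothetical helper: percent-encodes the filename so that spaces, '#',
// and '?' are not misread as URL syntax.
function buildFileUrl(filename, baseUrl = 'http://localhost:3000') {
  return new URL(`/files/${encodeURIComponent(filename)}`, baseUrl).toString();
}

console.log(buildFileUrl('My Video #1.mp3'));
// prints "http://localhost:3000/files/My%20Video%20%231.mp3"
```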

Configuration

Environment variables (.env):

Variable        Description          Default
OPENAI_API_KEY  Your OpenAI API key  Required for transcription
PORT            Server port          3000
OUTPUT_DIR      Download directory   ./output
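
The defaults in the table can be expressed as a small configuration loader. This is a sketch, not the project's actual code: `loadConfig` and `requireApiKey` are hypothetical names, and it assumes the variables have already been loaded into `process.env` (for example from `.env`).

```javascript
// Hypothetical helper: reads the documented variables and applies defaults.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    outputDir: env.OUTPUT_DIR || './output',
    // Only transcription needs the key, so a missing key is not an error here.
    openaiApiKey: env.OPENAI_API_KEY || null,
  };
}

// Call this right before transcribing, so download-only use works without a key.
function requireApiKey(config) {
  if (!config.openaiApiKey) {
    throw new Error('OPENAI_API_KEY is required for transcription');
  }
  return config.openaiApiKey;
}
```

Deferring the key check keeps download-only commands usable when no OpenAI key is configured.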

Transcription Models

Model                   Description                            Formats
gpt-4o-transcribe       Best quality, latest GPT-4o (default)  txt, json
gpt-4o-mini-transcribe  Faster, cheaper, good quality          txt, json
whisper-1               Legacy Whisper model                   txt, json, srt, vtt

Transcription Formats

  • txt - Plain text (all models)
  • json - JSON response (all models)
  • srt - SubRip subtitles (whisper-1 only)
  • vtt - WebVTT subtitles (whisper-1 only)
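
Since subtitle formats are only available with whisper-1, it can help to validate the model and format pair before sending a request. A minimal sketch based on the table and list above; `resolveTranscriptionOptions` is a hypothetical helper, and the default format of `txt` is an assumption (the README only states the default model).

```javascript
// Formats supported by each model, as listed in this README.
const SUPPORTED_FORMATS = {
  'gpt-4o-transcribe': ['txt', 'json'],
  'gpt-4o-mini-transcribe': ['txt', 'json'],
  'whisper-1': ['txt', 'json', 'srt', 'vtt'],
};

const DEFAULT_MODEL = 'gpt-4o-transcribe';

// Hypothetical helper: fills in defaults and rejects unsupported combinations.
function resolveTranscriptionOptions({ model = DEFAULT_MODEL, format = 'txt' } = {}) {
  if (!Object.hasOwn(SUPPORTED_FORMATS, model)) {
    throw new Error(`Unknown model: ${model}`);
  }
  if (!SUPPORTED_FORMATS[model].includes(format)) {
    throw new Error(`Format "${format}" is not supported by ${model}`);
  }
  return { model, format };
}
```

With this check, a request for `srt` output with the default model fails early with a clear message instead of reaching the API.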

Language Codes

Common language codes for the -l option:

  • en - English
  • fr - French
  • es - Spanish
  • de - German
  • it - Italian
  • pt - Portuguese
  • zh - Chinese
  • ja - Japanese
  • ko - Korean
  • ru - Russian

Omit the language (the -l option, or the language field in API requests) to use auto-detection.

Project Structure

videotoMP3Transcriptor/
├── src/
│   ├── services/
│   │   ├── youtube.js       # YouTube download service
│   │   └── transcription.js # OpenAI transcription service
│   ├── cli.js               # CLI entry point
│   └── server.js            # Express API server
├── scripts/                  # Linux convenience scripts
│   ├── download.sh          # Download video/playlist
│   ├── transcribe.sh        # Transcribe audio file
│   ├── process.sh           # Download + transcribe
│   ├── server.sh            # Start API server
│   └── info.sh              # Get video info
├── output/                   # Downloaded files
├── .env                      # Configuration
└── package.json

License

MIT