| name | video-url-transcriber |
| description | Transcribe a video/audio URL into timestamped JSON using yt-dlp + ffmpeg + faster-whisper. Use when an agent needs platform-agnostic URL-to-transcript ingestion for downstream analysis. |
Video URL Transcriber Skill
Use this skill when a user asks for transcript extraction from a public media URL (YouTube, X, and any site supported by yt-dlp).
Preconditions
Install system deps:
brew install ffmpeg yt-dlp
Install Python deps from repo root:
pip install -r requirements.txt
Minimal workflow
- Run dependency check:
python3 skill/video-url-transcriber/scripts/transcribe_url.py --doctor
- Run one-shot URL transcription:
python3 skill/video-url-transcriber/scripts/transcribe_url.py "https://www.youtube.com/watch?v=..." --model-size small
- Return JSON output with:
source_url
platform
title
duration_sec
language
transcript
segments[] with timestamps and optional words
API mode (agent workers)
Start API service:
python3 skill/video-url-transcriber/scripts/run_api.py
Transcribe request:
curl -s http://127.0.0.1:8099/transcribe \
-H 'content-type: application/json' \
-d '{
"url": "https://www.youtube.com/watch?v=...",
"language": null,
"model_size": "small",
"word_timestamps": true,
"persist_media": false
}'
Optional auth-gated media
Default policy: do not use browser cookies for public URLs.
If and only if the user explicitly approves personal browser/session usage, pass browser cookies with explicit opt-in:
python3 skill/video-url-transcriber/scripts/transcribe_url.py "<url>" --cookies-from-browser chrome --allow-personal-cookies
Execution rules
- Always normalize audio to mono 16k WAV before ASR.
- Never use browser cookies/keychain data unless user explicitly asks for it.
- Default
model_size is small; raise to medium/large-v3 only when user asks for higher quality.
- Prefer
persist_media=false unless user explicitly wants artifacts retained.
- If transcription fails, return exact failing stage:
fetch, ffmpeg normalize, or asr.