| name | youtube-content |
| description | Extract YouTube transcripts and transform into structured content |
| alwaysInclude | false |
| requiredBins | ["python3"] |
| platforms | ["darwin","linux"] |
| version | 1.0.0 |
| tenantIds | [] |
YouTube Content
Extract transcripts from YouTube videos and transform them into structured content formats: summaries, blog posts, show notes, and study guides.
Transcript Extraction
Using yt-dlp
pip install yt-dlp
yt-dlp --write-auto-sub --sub-lang en --skip-download -o "%(id)s" "VIDEO_URL"
yt-dlp --write-sub --sub-lang en --skip-download -o "%(id)s" "VIDEO_URL"
yt-dlp --list-subs "VIDEO_URL"
yt-dlp --dump-json --skip-download "VIDEO_URL"
Using youtube-transcript-api (Python)
pip install youtube-transcript-api
from youtube_transcript_api import YouTubeTranscriptApi
transcript = YouTubeTranscriptApi.get_transcript("VIDEO_ID")
transcript = YouTubeTranscriptApi.get_transcript("VIDEO_ID", languages=["en"])
text = " ".join([entry["text"] for entry in transcript])
for entry in transcript:
mins = int(entry["start"]) // 60
secs = int(entry["start"]) % 60
print(f"[{mins:02d}:{secs:02d}] {entry['text']}")
Content Transformation
Summary Generation
Given a transcript, produce:
- Title and metadata — video title, channel, duration, publish date.
- Executive summary — 2-3 sentence overview.
- Key points — bulleted list of main ideas with timestamps.
- Notable quotes — verbatim quotes worth highlighting.
- Action items — if the video contains advice or instructions.
Blog Post Format
# [Video Title]
*Based on [Channel Name]'s video ([link])*
## Overview
[2-3 paragraph summary]
## Key Takeaways
### 1. [First major point]
[Expanded discussion with timestamp reference]
### 2. [Second major point]
...
## Conclusion
[Wrap-up and implications]
Show Notes Format
## Show Notes: [Video Title]
**Channel:** [Name]
**Duration:** [HH:MM:SS]
**Published:** [Date]
### Timestamps
- 00:00 — Introduction
- 02:30 — [Topic 1]
- 15:45 — [Topic 2]
...
### Links Mentioned
- [Resource 1](url)
- [Resource 2](url)
### Key Quotes
> "Quote from the video" — [Speaker] at [timestamp]
Batch Processing
import json
video_ids = ["id1", "id2", "id3"]
for vid in video_ids:
try:
transcript = YouTubeTranscriptApi.get_transcript(vid)
text = " ".join([e["text"] for e in transcript])
with open(f"{vid}_transcript.txt", "w") as f:
f.write(text)
print(f"Saved transcript for {vid}")
except Exception as e:
print(f"Failed for {vid}: {e}")
Video Metadata Extraction
yt-dlp --dump-json --skip-download "VIDEO_URL" | python3 -c "
import sys, json
data = json.load(sys.stdin)
print(f'Title: {data[\"title\"]}')
print(f'Channel: {data[\"channel\"]}')
print(f'Duration: {data[\"duration\"]} seconds')
print(f'Views: {data.get(\"view_count\", \"N/A\")}')
print(f'Upload date: {data[\"upload_date\"]}')
print(f'Description: {data[\"description\"][:500]}')
"
Rules
- Prefer manual subtitles over auto-generated when available — they are more accurate.
- Preserve timestamps — always include timestamp references in transformed content.
- Attribute content — always credit the original video and channel.
- Handle missing transcripts — some videos have no subtitles; report this clearly rather than failing silently.
- Respect rate limits — add delays between batch requests to avoid API throttling.