baoyu-youtube-transcript
Download YouTube transcripts, subtitles, and cover images by URL or video ID.
| Field | Value |
|---|---|
| name | baoyu-youtube-transcript |
| description | Download YouTube transcripts, subtitles, and cover images by URL or video ID. |
| version | 1.1.0 |
| created | 2026-03-22 |
| source | https://github.com/JimLiu/baoyu-skills#baoyu-youtube-transcript |
| author | JimLiu |
| modifications | This skill may have been modified from the original author's skill. |
| metadata | {"openclaw":{"homepage":"https://github.com/JimLiu/baoyu-skills#baoyu-youtube-transcript","requires":{"anyBins":["bun","npx"]}}} |
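Single source of truth: the only real copy of this skill lives in the henri_skills repository; ~/.claude/skills/ and similar locations are symlinks. When editing, modify the files in henri_skills directly.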
Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly.
Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.
Scripts live in the scripts/ subdirectory. {baseDir} is the directory containing this SKILL.md. Resolve the ${BUN_X} runtime as follows: if bun is installed, use bun; otherwise, if npx is available, use npx -y bun; otherwise, suggest installing bun. Replace {baseDir} and ${BUN_X} with their actual values before running a command.
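The ${BUN_X} resolution rule above can be sketched in TypeScript. This is a minimal illustration, not code from the skill: the helpers hasBin and resolveBunX are hypothetical, and the PATH scan is simplified (it ignores Windows PATHEXT and does not check the executable bit).

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical helper: true if a file named `name` exists in any PATH directory.
// Simplified: ignores Windows PATHEXT and the executable permission bit.
function hasBin(name: string, pathEnv: string = process.env.PATH ?? ""): boolean {
  return pathEnv
    .split(delimiter)
    .filter((dir) => dir.length > 0)
    .some((dir) => existsSync(join(dir, name)));
}

// Hypothetical resolver applying the documented priority:
// bun, then `npx -y bun`, else null (meaning "suggest installing bun").
function resolveBunX(has: (name: string) => boolean = hasBin): string | null {
  if (has("bun")) return "bun";
  if (has("npx")) return "npx -y bun";
  return null;
}

const runtime = resolveBunX();
console.log(runtime ?? "bun not found; suggest installing bun");
```

Injecting the `has` predicate keeps the priority logic testable without touching the real PATH.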
| Script | Purpose |
|---|---|
| scripts/main.ts | Transcript download CLI |
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>
# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja
# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps
# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters
# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers
# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt
# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans
# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list
# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh
| Option | Description | Default |
|---|---|---|
| `<url-or-id>` | YouTube URL or video ID (multiple allowed) | Required |
| `--languages <codes>` | Language codes, comma-separated, in priority order | en |
| `--format <fmt>` | Output format: text, srt | text |
| `--translate <code>` | Translate to the specified language code | |
| `--list` | List available transcripts instead of fetching | |
| `--timestamps` | Include [HH:MM:SS → HH:MM:SS] timestamps per paragraph | on |
| `--no-timestamps` | Disable timestamps | |
| `--chapters` | Chapter segmentation from the video description | |
| `--speakers` | Raw transcript with metadata for speaker identification | |
| `--exclude-generated` | Skip auto-generated transcripts | |
| `--exclude-manually-created` | Skip manually created transcripts | |
| `--refresh` | Force re-fetch, ignore cached data | |
| `-o, --output <path>` | Save to a specific file path | auto-generated |
| `--output-dir <dir>` | Base output directory | youtube-transcript |
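
How --languages, --exclude-generated, and --exclude-manually-created might combine is sketched below. The TranscriptInfo shape and selectTranscript function are hypothetical, and the tie-break (within one language, prefer a manually created track over an auto-generated one) is an assumption borrowed from the Python youtube-transcript-api; the script's actual rule is not stated here.

```typescript
// Hypothetical shape of one entry in the available-transcripts list.
interface TranscriptInfo {
  languageCode: string; // e.g. "en", "zh-Hans"
  isGenerated: boolean; // true for YouTube auto-generated captions
}

interface SelectOptions {
  excludeGenerated?: boolean; // --exclude-generated
  excludeManuallyCreated?: boolean; // --exclude-manually-created
}

// Walks the requested codes in priority order. Within one language it
// prefers a manually created track (assumed tie-break, see lead-in).
function selectTranscript(
  available: TranscriptInfo[],
  languages: string[],
  opts: SelectOptions = {},
): TranscriptInfo | undefined {
  const allowed = available.filter(
    (t) =>
      !(opts.excludeGenerated && t.isGenerated) &&
      !(opts.excludeManuallyCreated && !t.isGenerated),
  );
  for (const code of languages) {
    const matches = allowed.filter((t) => t.languageCode === code);
    const manual = matches.find((t) => !t.isGenerated);
    if (manual) return manual;
    if (matches.length > 0) return matches[0];
  }
  return undefined; // presumably surfaces as "No transcript found"
}

const tracks: TranscriptInfo[] = [
  { languageCode: "en", isGenerated: true },
  { languageCode: "ja", isGenerated: false },
];
// --languages zh,en,ja: no zh track, so the auto-generated English one is picked.
console.log(selectTranscript(tracks, "zh,en,ja".split(",")));
```

With --exclude-generated, the same request would skip the English track and fall through to Japanese.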
Accepts any of these as video input:
- https://www.youtube.com/watch?v=dQw4w9WgXcQ
- https://youtu.be/dQw4w9WgXcQ
- https://www.youtube.com/embed/dQw4w9WgXcQ
- https://www.youtube.com/shorts/dQw4w9WgXcQ
- dQw4w9WgXcQ

| Format | Extension | Description |
|---|---|---|
| text | .md | Markdown with frontmatter (incl. description), title heading, summary, optional TOC/cover/timestamps/chapters/speakers |
| srt | .srt | SubRip subtitle format for video players |
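
The accepted video input forms listed above all reduce to one 11-character video ID. The sketch below shows one way to normalize them with the standard URL class; extractVideoId is a hypothetical name, and the script's own parser may accept more or fewer forms (for example, it is not known whether m.youtube.com or music.youtube.com are handled).

```typescript
// Video IDs are 11 characters from [A-Za-z0-9_-].
const ID_RE = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical normalizer for the input forms listed above.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (ID_RE.test(trimmed)) return trimmed; // bare ID

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1).split("/")[0] ?? null; // youtu.be/<id>
  } else if (host === "youtube.com") {
    const parts = url.pathname.split("/").filter(Boolean);
    if (parts[0] === "watch") {
      candidate = url.searchParams.get("v"); // watch?v=<id>
    } else if (parts[0] === "embed" || parts[0] === "shorts") {
      candidate = parts[1] ?? null; // embed/<id> or shorts/<id>
    }
  }
  return candidate !== null && ID_RE.test(candidate) ? candidate : null;
}

console.log(extractVideoId("https://youtu.be/dQw4w9WgXcQ"));
```

The `?` in watch URLs is why the workflow asks for single-quoted URLs on the command line: the parsing above only ever sees the URL if the shell passes it through unexpanded.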
youtube-transcript/
├── .index.json # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
    ├── meta.json # Video metadata (title, channel, description, duration, chapters, etc.)
    ├── transcript-raw.json # Raw transcript snippets from YouTube API (cached)
    ├── transcript-sentences.json # Sentence-segmented transcript (split by punctuation, merged across snippets)
    ├── imgs/
    │   └── cover.jpg # Video thumbnail
    ├── transcript.md # Markdown transcript (generated from sentences)
    └── transcript.srt # SRT subtitle (generated from raw snippets, if --format srt)
- {channel-slug}: channel name in kebab-case
- {title-full-slug}: full video title in kebab-case

The --list mode outputs to stdout only (no file is saved).
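The {channel-slug} and {title-full-slug} placeholders and the .index.json mapping can be illustrated as follows. toKebabSlug and registerVideo are hypothetical; the exact slug rules (especially for non-Latin titles) are not documented, so this sketch simply keeps Unicode letters and digits and collapses everything else into single hyphens.

```typescript
// Hypothetical kebab-case slug: keeps Unicode letters/digits (so CJK titles
// survive), lowercases, and turns every other run of characters into "-".
function toKebabSlug(text: string): string {
  return text
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, "-") // non-alphanumeric runs become one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

// Hypothetical: builds {channel-slug}/{title-full-slug} and records it in an
// in-memory map shaped like .index.json (video ID -> directory path).
function registerVideo(
  index: Record<string, string>,
  videoId: string,
  channel: string,
  title: string,
): string {
  const dir = `${toKebabSlug(channel)}/${toKebabSlug(title)}`;
  index[videoId] = dir; // later runs look the video up here to find the cache
  return dir;
}

const index: Record<string, string> = {};
console.log(registerVideo(index, "dQw4w9WgXcQ", "Rick Astley", "Never Gonna Give You Up"));
```

Keying the index by video ID rather than by slug means a lookup does not need the title, which is only known after fetching metadata.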
On first fetch, the script saves:
- meta.json: video metadata, chapters, cover image path, and language info
- transcript-raw.json: raw transcript snippets from the YouTube API ({ text, start, duration }[])
- transcript-sentences.json: sentence-segmented transcript ({ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]), split at sentence-ending punctuation (.?!…。?! etc.), with timestamps allocated proportionally by character length and CJK-aware text merging
- imgs/cover.jpg: video thumbnail

Subsequent runs for the same video use the cached data (no network calls). Use --refresh to force a re-fetch. If a different language is requested, the cache is refreshed automatically.
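The sentence segmentation described for transcript-sentences.json can be sketched as below: each raw snippet is cut into pieces that end at sentence-ending punctuation, each piece gets a share of the snippet's duration proportional to its character length, and pieces are merged across snippet boundaries (without a space when the seam is CJK). This is an illustration, not the script's code; segmentSentences, the CJK ranges, and the exact punctuation set are assumptions. It will also split on abbreviations such as "Mr." and on decimals such as "3.5".

```typescript
interface Snippet { text: string; start: number; duration: number } // seconds
interface Sentence { text: string; start: string; end: string } // "HH:mm:ss"

const TERMINATOR_RE = /[.?!…。？！]/;
// Contiguous pieces: text up to and including a run of terminators.
const PIECE_RE = /[^.?!…。？！]+[.?!…。？！]*|[.?!…。？！]+/g;
// Assumed CJK ranges: kana, CJK ideographs, Hangul, full-width forms.
const CJK_RE = /[\u3040-\u30ff\u3400-\u9fff\uac00-\ud7af\uff00-\uffef]/;

function hms(seconds: number): string {
  const s = Math.floor(seconds);
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${pad(Math.floor(s / 3600))}:${pad(Math.floor((s % 3600) / 60))}:${pad(s % 60)}`;
}

// Joins two fragments, omitting the space when either side of the seam is CJK.
function joinText(a: string, b: string): string {
  if (!a) return b;
  const cjkSeam = CJK_RE.test(a.slice(-1)) || CJK_RE.test(b.charAt(0));
  return cjkSeam ? a + b : `${a} ${b}`;
}

function segmentSentences(snippets: Snippet[]): Sentence[] {
  const out: Sentence[] = [];
  let buf = ""; // sentence text accumulated so far
  let bufStart = 0; // start (s) of the first piece in buf
  let bufEnd = 0; // end (s) of the latest piece in buf

  for (const sn of snippets) {
    const text = sn.text.replace(/\s+/g, " ").trim();
    if (!text) continue;
    const pieces = text.match(PIECE_RE) ?? [text];
    let t = sn.start;
    for (const raw of pieces) {
      // Time share proportional to this piece's character length.
      const span = (sn.duration * raw.length) / text.length;
      const piece = raw.trim();
      if (piece) {
        if (!buf) bufStart = t;
        buf = joinText(buf, piece);
        bufEnd = t + span;
        if (TERMINATOR_RE.test(piece.slice(-1))) {
          out.push({ text: buf, start: hms(bufStart), end: hms(bufEnd) });
          buf = "";
        }
      }
      t += span;
    }
  }
  // Flush a trailing sentence that never hit terminal punctuation.
  if (buf) out.push({ text: buf, start: hms(bufStart), end: hms(bufEnd) });
  return out;
}

console.log(segmentSentences([
  { text: "Hello world. How", start: 0, duration: 4 },
  { text: "are you?", start: 4, duration: 2 },
]));
```

In the usage line, "Hello world." takes 12 of the first snippet's 16 characters and therefore 3 of its 4 seconds, so the second sentence starts at 00:00:03 and spans both snippets.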
SRT output (--format srt) is generated from transcript-raw.json. Text/markdown output uses transcript-sentences.json for natural sentence boundaries.
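Because SRT is generated from the raw { text, start, duration } snippets, each snippet can map to one SubRip cue. The sketch below shows that mapping with hypothetical names (toSrt, srtTime); the script may merge or clean snippets differently.

```typescript
interface RawSnippet { text: string; start: number; duration: number } // seconds

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds).
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalS = Math.floor(totalMs / 1000);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(Math.floor(totalS / 3600))}:${pad(Math.floor((totalS % 3600) / 60))}:${pad(totalS % 60)},${pad(ms, 3)}`;
}

// One cue per raw snippet: sequence number, time range, text; cues separated by a blank line.
function toSrt(snippets: RawSnippet[]): string {
  return snippets
    .map((s, i) => `${i + 1}\n${srtTime(s.start)} --> ${srtTime(s.start + s.duration)}\n${s.text}\n`)
    .join("\n");
}

console.log(toSrt([
  { text: "Hello", start: 0, duration: 1.5 },
  { text: "world", start: 1.5, duration: 2 },
]));
```

Using raw snippets keeps cue timing identical to YouTube's own caption timing, whereas the sentence file's timestamps are interpolated.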
When the user provides a YouTube URL and wants the transcript:

1. Run with --list first if the user hasn't specified a language, to show the available options.
2. Single-quote the URL when running the script: zsh treats ? as a glob wildcard, so an unquoted YouTube URL causes "no matches found". Use 'https://www.youtube.com/watch?v=ID'.
3. Use --chapters --speakers for the richest output (chapters + speaker identification).
4. In --speakers mode, after the script saves the raw file, follow the speaker identification workflow below to post-process it with speaker labels.

When the user only wants a cover image or metadata, running the script with any option also caches meta.json and imgs/cover.jpg.
When re-formatting the same video (e.g., first text then SRT), the cached data is reused — no re-fetch needed.
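The caching rules described above (reuse by default, --refresh forces a fetch, a different language triggers a refresh) can be condensed into one decision function. needsFetch and CachedMeta are hypothetical, the languageCode field name in meta.json is an assumption, and so is the interpretation of "a different language": here a cache counts as stale only when its language is not among the requested codes.

```typescript
// Hypothetical subset of meta.json relevant to the cache decision.
interface CachedMeta { languageCode: string }

type FetchReason = "refresh" | "not-cached" | "language-changed" | null;

// Returns why a network fetch is needed, or null to reuse the cached files.
function needsFetch(
  cached: CachedMeta | undefined, // undefined: no .index.json entry for the video
  requestedLanguages: string[], // from --languages
  refresh: boolean, // --refresh
): FetchReason {
  if (refresh) return "refresh";
  if (!cached) return "not-cached";
  // Assumption: only a cache outside the requested list is "a different language".
  if (!requestedLanguages.includes(cached.languageCode)) return "language-changed";
  return null; // reuse meta.json and transcript-*.json; no network calls
}

console.log(needsFetch({ languageCode: "en" }, ["en"], false));
```

The output format is deliberately not an input: switching from text to SRT for the same video reuses the cache, as described above.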
Chapters mode (--chapters): The script parses chapter timestamps from the video description (e.g., 0:00 Introduction), segments the transcript at chapter boundaries, groups snippets into readable paragraphs, and saves the result as .md with a table of contents. No further processing is needed.
If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
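Chapter parsing and segmentation can be sketched as follows: description lines that start with a timestamp become chapters, and each sentence goes to the last chapter starting at or before it. parseChapters and splitByChapter are hypothetical names, the line pattern is an assumption, and chapters are assumed to appear in ascending order. A production parser probably needs stricter filtering, since descriptions can contain stray timestamps.

```typescript
interface Chapter { title: string; start: number } // start in seconds
interface TimedText { text: string; start: number }
interface ChapterGroup { title: string | null; items: TimedText[] }

// Assumed pattern: "0:00 Intro", "1:30 - Topic", "1:02:30 | Q&A".
const CHAPTER_LINE = /^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s+(?:[-|]\s+)?(.+?)\s*$/;

// "1:02:30" -> 3750
function toSeconds(stamp: string): number {
  return stamp.split(":").reduce((acc, part) => acc * 60 + Number(part), 0);
}

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split(/\r?\n/)) {
    const m = CHAPTER_LINE.exec(line);
    if (m && m[1] && m[2]) chapters.push({ title: m[2], start: toSeconds(m[1]) });
  }
  return chapters;
}

function splitByChapter(items: TimedText[], chapters: Chapter[]): ChapterGroup[] {
  // No chapters: one untitled group, i.e. paragraphs without headings.
  if (chapters.length === 0) return [{ title: null, items }];
  const groups: ChapterGroup[] = chapters.map((c) => ({ title: c.title, items: [] }));
  for (const item of items) {
    let idx = 0; // items before the first chapter fall into it
    chapters.forEach((c, i) => {
      if (c.start <= item.start) idx = i;
    });
    groups[idx]?.items.push(item);
  }
  return groups;
}

const chapters = parseChapters("My video\n0:00 Introduction\n1:30 - Main topic\nThanks!");
console.log(splitByChapter([{ text: "Hi.", start: 2 }, { text: "Now the topic.", start: 95 }], chapters));
```

The untitled-group fallback mirrors the behavior above when the description has no chapter timestamps.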
Speakers mode (--speakers): Speaker identification requires AI processing. The script outputs a raw .md file containing:
After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:
1. Input: the raw .md file saved by the script
2. Prompt: {baseDir}/prompts/speaker-transcript.md
3. Output format: **Speaker Name:** labels, paragraph grouping (2-4 sentences), and [HH:MM:SS → HH:MM:SS] timestamps
4. Result: update the .md file with the processed transcript (keep the YAML frontmatter)

When --speakers is used, --chapters is implied: the processed output always includes chapter segmentation.
| Error | Meaning |
|---|---|
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |