yt2bb
// Use when the user wants to repurpose a YouTube video for Bilibili, add bilingual (English-Chinese) subtitles to a video, or create hardcoded subtitle versions for Chinese platforms.
| name | yt2bb |
| description | Use when the user wants to repurpose a YouTube video for Bilibili, add bilingual (English-Chinese) subtitles to a video, or create hardcoded subtitle versions for Chinese platforms. |
| license | MIT |
| homepage | https://github.com/Agents365-ai/yt2bb |
| compatibility | Requires Python 3, ffmpeg, yt-dlp, whisper (openai-whisper) on PATH. Self-check steps that need vision are gracefully skipped if unavailable. |
| platforms | ["macos","linux","windows"] |
| allowed-tools | Bash(python3:*) Bash(ffmpeg:*) Bash(whisper:*) Bash(yt-dlp:*) Bash(git:*) Read Write Edit |
| metadata | {"openclaw":{"requires":{"bins":["python3","ffmpeg","yt-dlp","whisper"]},"emoji":"🎬","os":["darwin","linux","win32"],"install":[{"id":"brew-ffmpeg","kind":"brew","formula":"ffmpeg","bins":["ffmpeg"],"label":"Install ffmpeg via Homebrew","os":["darwin"]},{"id":"apt-ffmpeg","kind":"apt","package":"ffmpeg","bins":["ffmpeg"],"label":"Install ffmpeg via apt","os":["linux"]},{"id":"brew-ytdlp","kind":"brew","formula":"yt-dlp","bins":["yt-dlp"],"label":"Install yt-dlp via Homebrew","os":["darwin"]},{"id":"pip-ytdlp","kind":"pip","package":"yt-dlp","bins":["yt-dlp"],"label":"Install yt-dlp via pip","os":["linux","win32"]},{"id":"pip-whisper","kind":"pip","package":"openai-whisper","bins":["whisper"],"label":"Install openai-whisper via pip"}]},"clawhub":{"requires":{"bins":["python3","ffmpeg","yt-dlp","whisper"]},"category":"media","install":[{"id":"brew-ffmpeg","kind":"brew","formula":"ffmpeg","bins":["ffmpeg"],"label":"Install ffmpeg via Homebrew","os":["darwin"]},{"id":"apt-ffmpeg","kind":"apt","package":"ffmpeg","bins":["ffmpeg"],"label":"Install ffmpeg via apt","os":["linux"]},{"id":"brew-ytdlp","kind":"brew","formula":"yt-dlp","bins":["yt-dlp"],"label":"Install yt-dlp via Homebrew","os":["darwin"]},{"id":"pip-ytdlp","kind":"pip","package":"yt-dlp","bins":["yt-dlp"],"label":"Install yt-dlp via pip","os":["linux","win32"]},{"id":"pip-whisper","kind":"pip","package":"openai-whisper","bins":["whisper"],"label":"Install openai-whisper via pip"}]},"hermes":{"tags":["youtube","bilibili","subtitles","bilingual","video","localization","whisper","yt-dlp"],"category":"media","requires_tools":["python3","ffmpeg","yt-dlp","whisper"],"related_skills":["ffmpeg","video-podcast-maker"]},"codex":{"requires":{"bins":["python3","ffmpeg","yt-dlp","whisper"]},"allowed-tools":["bash","read","write","edit"]},"claude-code":{"allowed-tools":"Bash(python3:*) Bash(ffmpeg:*) Bash(whisper:*) Bash(yt-dlp:*) Bash(git:*) Read Write 
Edit"},"pi":{"requires":{"bins":["python3","ffmpeg","yt-dlp","whisper"]},"allowed-tools":["bash","read","write","edit"]},"skillsmp":{"topics":["claude-code","claude-code-skill","claude-skills","agent-skills","skillsmp","openclaw","openclaw-skills","skill-md","pi-coding-agent","youtube","bilibili","subtitles","video"]},"author":"Agents365-ai","version":"2.5.0"} |
Six-step pipeline: download → transcribe → translate → merge → burn subtitles → generate publish info. Produces a video with hardcoded bilingual (EN/ZH) subtitles and a publish_info.md with Bilibili upload metadata.
| Step | Tool | Command | Output |
|---|---|---|---|
| 0. Update | git | Auto-check for skill updates | — |
| 1. Download | yt-dlp | yt-dlp --cookies-from-browser chrome -f ... -o ... | {slug}.mp4 |
| 2. Transcribe | whisper* | srt_utils.py check-whisper then transcribe | {slug}_{lang}.srt |
| 2.5 Validate | srt_utils.py | srt_utils.py validate / fix | {slug}_{lang}.srt (fixed) |
| 3. Translate | AI | SRT-aware batch translation | {slug}_zh.srt |
| 4. Merge | srt_utils.py | srt_utils.py merge ... | {slug}_bilingual.srt |
| 4.5 Style | srt_utils.py | srt_utils.py to_ass --preset netflix\|clean\|glow | {slug}_bilingual.ass |
| 5. Burn | ffmpeg | ffmpeg -c:v libx264 -vf ass=... | {slug}_bilingual.mp4 |
| 6. Publish | AI | Analyze content, generate metadata | publish_info.md |
Throttle to one check per 24 hours per installation; never mutate the skill directory without explicit user consent. SKILL_DIR resolved here is reused by later pipeline steps for script paths.
If <this-skill-dir>/.last_update exists and is less than 24 hours old, skip this step entirely.
Otherwise, fetch the latest tag from upstream:
git -C <this-skill-dir> ls-remote --tags origin 'v*' 2>/dev/null \
| awk '{print $2}' | sed 's|refs/tags/||' \
| sort -V | tail -1
Compare with this skill's metadata.version from the frontmatter. If the upstream tag is strictly newer (semver), tell the user one line and ask:
"A newer version of this skill is available: vX.Y.Z → vA.B.C. Want me to git pull?"
If they say yes, run git -C <this-skill-dir> pull --ff-only. Refresh .last_update either way so the prompt doesn't repeat for 24 hours.
If upstream is the same or older, refresh .last_update silently and continue.
On any failure (offline, not a git checkout — e.g. ClawHub-installed copy, read-only path, no permission), swallow the error silently and continue with the user's task. Do not mention the failure.
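The throttle and version comparison described above could be sketched in Python as follows. This is a minimal illustration, not the skill's actual logic: the helper names (parse_version, check_is_due, is_strictly_newer) are hypothetical, and it assumes plain numeric tags such as v2.5.0 with no pre-release suffixes.

```python
import time
from pathlib import Path
from typing import Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v2.5.0' or '2.5.0' into (2, 5, 0) for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def check_is_due(stamp: Path, now: Optional[float] = None) -> bool:
    """True when the stamp file is missing or at least 24 hours old."""
    now = time.time() if now is None else now
    try:
        return now - stamp.stat().st_mtime >= DAY_SECONDS
    except OSError:
        return True  # no stamp yet (or unreadable): treat the check as due


def is_strictly_newer(upstream: str, local: str) -> bool:
    """True only when upstream > local; equal or older returns False."""
    return parse_version(upstream) > parse_version(local)
```

Comparing integer tuples rather than strings matters once a component reaches two digits: v2.10.0 is newer than v2.5.0 numerically, but sorts before it as plain text.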
Resolve SKILL_DIR for use by later pipeline steps:
# Find skill directory (works across Claude Code, OpenClaw, Hermes, Pi)
SKILL_DIR="$(find ~/.claude/skills ~/.openclaw/skills ~/.hermes/skills ~/.pi/agent/skills ~/.agents/skills ~/myagents/myskills -maxdepth 2 -name 'yt2bb' -type d 2>/dev/null | head -1)"
Single video:
slug="video-name" # or: slug=$(python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title")
mkdir -p "${slug}"
yt-dlp --cookies-from-browser chrome \
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
-o "${slug}/${slug}.mp4" "https://www.youtube.com/watch?v=VIDEO_ID"
Playlist / series:
yt-dlp --cookies-from-browser chrome \
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
-o "%(playlist_index)03d-%(title)s/%(playlist_index)03d-%(title)s.mp4" \
"https://www.youtube.com/playlist?list=PLAYLIST_ID"
After downloading, rename each folder to a clean slug and run Steps 2–6 for each video sequentially.
- -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]": ensure mp4 output, avoid webm
- %(playlist_index)03d: zero-padded index to preserve playlist order
- If --cookies-from-browser fails, export cookies first; see Troubleshooting

First run the environment check to detect your platform and get a tailored whisper command:
python3 "$SKILL_DIR/srt_utils.py" check-whisper
This auto-detects OS, GPU (CUDA/Metal/CPU), memory, and installed backends, then recommends the best backend + model for your hardware. If memory detection is unavailable, it falls back conservatively instead of assuming a low-memory machine. Use the command it prints.
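As a rough picture of what such a recommendation can look like, the sketch below maps hardware facts to a backend and model using the backend and model tables in this section. It is not the logic inside check-whisper: the function name recommend, the memory thresholds taken from the approximate model sizes, and the choice of medium when memory is unknown are all assumptions made for illustration.

```python
import platform
from typing import Optional, Set, Tuple

# Approximate memory needs per model, largest first (figures from this section).
MODEL_MEMORY_GB = [("large-v3", 10), ("medium", 5), ("tiny", 1)]


def recommend(system: str, machine: str, installed: Set[str],
              memory_gb: Optional[float] = None) -> Tuple[str, str]:
    """Pick (backend, model) from platform facts. Hypothetical helper."""
    if system == "Darwin" and machine == "arm64" and "mlx-whisper" in installed:
        backend = "mlx-whisper"            # Apple Silicon
    elif "whisper-ctranslate2" in installed:
        backend = "whisper-ctranslate2"    # CUDA or CPU
    else:
        backend = "openai-whisper"         # universal fallback

    if memory_gb is None:
        model = "medium"  # assumption: unknown memory falls back to the default
    else:
        model = next(
            (name for name, need in MODEL_MEMORY_GB if memory_gb >= need),
            "tiny",
        )
    return backend, model


# Example call using the standard library's platform detection.
print(recommend(platform.system(), platform.machine(), {"openai-whisper"}))
```

GPU detection is left out on purpose; in practice the command printed by check-whisper is the one to use.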
Manual fallback (openai-whisper, works everywhere):
src_lang="en" # Change to ja/ko/es/etc. based on source video
whisper_model="medium" # check-whisper recommends the best model for your hardware
whisper "${slug}/${slug}.mp4" \
--model "$whisper_model" \
--language "$src_lang" \
--word_timestamps True \
--condition_on_previous_text False \
--output_format srt \
--max_line_width 40 --max_line_count 1 \
--output_dir "${slug}"
mv "${slug}/${slug}.srt" "${slug}/${slug}_${src_lang}.srt"
Supported backends:
| Backend | Best for | Install |
|---|---|---|
| mlx-whisper | macOS Apple Silicon (fastest) | pip install mlx-whisper |
| whisper-ctranslate2 | Windows/Linux CUDA, or CPU (~4x faster) | pip install whisper-ctranslate2 |
| openai-whisper | Universal fallback | pip install openai-whisper |
Model selection (auto-recommended by check-whisper):
- tiny: fast draft, low accuracy, CPU-friendly (~1 GB)
- medium: default, good balance (~5 GB)
- large-v3: best accuracy, recommended for JA/KO/ZH source (~10 GB)

Notes:
- --language: explicitly set to avoid misdetection; supports en, ja, ko, es, etc.
- --word_timestamps True: more precise subtitle timing
- --condition_on_previous_text False: prevent hallucination loops

python3 "$SKILL_DIR/srt_utils.py" validate "${slug}/${slug}_${src_lang}.srt"
# If issues found:
python3 "$SKILL_DIR/srt_utils.py" fix "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_${src_lang}.srt"
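To make the kind of check concrete, here is a small sketch of SRT parsing and overlap detection. It is not the code in srt_utils.py, whose internals are not shown here; the names Cue, parse_srt, and find_timing_issues are hypothetical, and the two checks (overlapping cues, non-positive duration) are assumed examples of timing issues.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    text: str


# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a dot is tolerated as separator).
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> List[Cue]:
    """Split on blank lines; each block is index, timing line, then text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.search(lines[1])
        if not match:
            continue
        g = match.groups()
        cues.append(
            Cue(int(lines[0].strip()), to_ms(*g[:4]), to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues


def find_timing_issues(cues: List[Cue]) -> List[str]:
    """Report cues that start before the previous one ends, or never end."""
    issues = []
    for prev, cur in zip(cues, cues[1:]):
        if cur.start_ms < prev.end_ms:
            issues.append(f"cue {cur.index} overlaps cue {prev.index}")
    for cue in cues:
        if cue.end_ms <= cue.start_ms:
            issues.append(f"cue {cue.index} has non-positive duration")
    return issues
```

Working in integer milliseconds keeps comparisons exact and avoids float rounding when cues sit only a frame or two apart.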
Read {slug}_{src_lang}.srt and translate to Chinese. Critical rules:
These rules are modeled on the Netflix Simplified Chinese Timed Text Style Guide; follow them to produce broadcast-grade subtitles.
- Keep the timing lines (--> lines) exactly as-is
- […] 。, !, ?; keep mid-sentence ,, 、, ; only when they add clarity
- […] (,。!?、;:); use 「」 for inner quotes, not "" or ''
- […] (GPT-4, 30fps, 2026); only punctuation is full-width
- […] (的, 了, 吗, 呢, 吧, 啊); never split an English phrasal unit across a line break; keep modifiers with their heads
- Save the translation to {slug}/{slug}_zh.srt

python3 "$SKILL_DIR/srt_utils.py" merge \
"${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_zh.srt" "${slug}/${slug}_bilingual.srt"
Run lint on the merged bilingual SRT to catch Netflix Timed Text Style Guide violations that validate doesn't cover — reading speed (CPS), per-line length, inter-cue gaps, and line count.
python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt"
Defaults (all overridable via flags):
| Rule | Threshold | Flag |
|---|---|---|
| Reading speed (English) | ≤ 17 CPS | --max-cps-en |
| Reading speed (Simplified Chinese) | ≤ 9 CPS | --max-cps-zh |
| Min cue duration | 833 ms (5/6 s) | --min-duration-ms |
| Max cue duration | 7000 ms | --max-duration-ms |
| Min inter-cue gap | 83 ms (2 frames @ 24 fps) | --min-gap-ms |
| Max chars/line (English) | 42 | --max-chars-en |
| Max chars/line (Chinese, full-width) | 16 | --max-chars-zh |
| Max lines per cue | 2 | — |
Severity model:
When CPS errors fire, the fix is almost always upstream — go back to Step 3 and rewrite the offending Chinese entry to fit the time window. Do not solve CPS by extending the cue past the source's spoken duration.
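The reading-speed rule is plain arithmetic: characters divided by cue duration in seconds. The sketch below works through it with the default limits from the table. The helper names are hypothetical, and counting every character except line breaks is an assumption; the real lint command may count differently.

```python
import math

MAX_CPS = {"en": 17, "zh": 9}  # defaults from the table above

# The millisecond defaults follow from the fractions given in the table.
MIN_DURATION_MS = 5 * 1000 // 6   # 5/6 s  -> 833
MIN_GAP_MS = 2 * 1000 // 24       # 2 frames at 24 fps -> 83


def visible_chars(text: str) -> int:
    """Count characters, ignoring line breaks (counting rule is an assumption)."""
    return len(text.replace("\n", ""))


def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("cue must have positive duration")
    return visible_chars(text) / duration_s


def min_duration_ms(text: str, lang: str) -> int:
    """Shortest duration (ms) that keeps the text within the CPS limit."""
    return math.ceil(visible_chars(text) * 1000 / MAX_CPS[lang])


zh_line = "字幕" * 9  # 18 full-width characters
print(chars_per_second(zh_line, 0, 1500))  # 12.0, above the 9 CPS limit
print(min_duration_ms(zh_line, "zh"))      # 2000
```

In this example an 18-character Chinese line needs a full 2 seconds. If the speaker only takes 1.5 seconds, the translation has to shrink to 13 characters or fewer, which is why the fix belongs in Step 3.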
Agent-friendly output:
python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt" --format json
Returns {ok, error_count, warning_count, issues: [{index, code, severity, message}, ...]} for programmatic filtering.
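A consumer of that JSON might group errors by cue so each offending entry can be rewritten in one pass. The sketch below assumes the shape described above; the sample payload, the issue codes (cps_zh, line_too_long), and the severity strings "error" and "warning" are made up for illustration and may not match the real output.

```python
import json
from typing import Dict, List


def errors_by_cue(report_json: str) -> Dict[int, List[str]]:
    """Map cue index -> list of error codes, skipping warnings."""
    report = json.loads(report_json)
    grouped: Dict[int, List[str]] = {}
    for issue in report["issues"]:
        if issue["severity"] == "error":
            grouped.setdefault(issue["index"], []).append(issue["code"])
    return grouped


# Hypothetical payload in the documented shape.
sample = json.dumps({
    "ok": False,
    "error_count": 2,
    "warning_count": 1,
    "issues": [
        {"index": 4, "code": "cps_zh", "severity": "error", "message": "..."},
        {"index": 4, "code": "line_too_long", "severity": "error", "message": "..."},
        {"index": 9, "code": "short_gap", "severity": "warning", "message": "..."},
    ],
})

print(errors_by_cue(sample))
```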
Convert the bilingual SRT to an ASS file. ASS enables per-line color, font size, and glow effects that are impossible with SRT force_style. Layout rule: subtitles always stay at the bottom. Default stack: ZH on the upper line of the bottom stack, EN on the lower line. The presets are tuned to keep the block readable while reducing overlap risk with lower-screen content.
IMPORTANT: Ask before proceeding. Present the preset table below to the user and ask which style they prefer. Do NOT silently pick a default. If the user has no preference, use clean.
Available presets:
| Preset | Look | Best for |
|---|---|---|
| netflix | Pure white text, thin black outline, soft drop shadow, no box; modeled on the Netflix Timed Text Style Guide | Professional, broadcast-grade look. Best default for documentaries, interviews, long-form content, and anything that should feel "streaming-platform native". Use with --font "Source Han Sans SC" on Linux / "PingFang SC" on macOS for closest Netflix Sans feel |
| clean | Yellow text on gray box: golden ZH + light yellow EN, semi-transparent light gray background | Readability safety net for busy or mixed-brightness footage where netflix's outline-only text could get visually lost. The gray box guarantees a readable contrast pad |
| glow | Yellow ZH + white EN with colored glow: bright yellow ZH + white EN, blurred outer glow, no background box | Entertainment, vlogs, energetic edits. Most eye-catching, but weakest on bright or busy backgrounds |
Example prompt to user:
字幕有三套样式可选:
- netflix:纯白字体 + 细黑描边 + 柔和阴影(默认推荐,Netflix 专业观感,适合纪录片/访谈/长内容)
- clean:黄色字体 + 灰色半透明底框(亮背景或花背景的兜底选项,底框保证对比度)
- glow:黄色/白色字体 + 彩色外发光(更抢眼,适合娱乐/Vlog)
- 自定义:提供 .ass 样式文件,完全控制字体、颜色、大小(可用 Aegisub 可视化编辑)

选哪个?默认推荐 netflix;如果画面特别花哨或底部信息多,可改用 clean。
# Netflix-grade default (white + outline + soft shadow), ZH on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--preset netflix
# Gray-box fallback for busy backgrounds, EN on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--preset clean --top en
# Vibrant glow (B站 entertainment style)
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--preset glow
Custom style file — for full control, provide an external .ass file with your own [V4+ Styles] section. It must contain styles named EN and ZH, or to_ass will fail early with a validation error. You can design styles visually with Aegisub and export.
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--style-file my_styles.ass
Optionally add ; en_tag= and ; zh_tag={\blur5} comment lines in the .ass file to inject ASS override tags per language.
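Before handing a custom style file to to_ass, it can help to confirm that both required style names are present. The sketch below scans the [V4+ Styles] section for Style: lines. It is an independent illustration rather than the validation to_ass performs, and the helper names style_names and missing_styles are hypothetical.

```python
from typing import List, Sequence, Set


def style_names(ass_text: str) -> Set[str]:
    """Collect style names declared inside the [V4+ Styles] section."""
    names = set()
    in_styles = False
    for raw in ass_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new section header starts; track whether it is the styles one.
            in_styles = line.lower() == "[v4+ styles]"
        elif in_styles and line.startswith("Style:"):
            # The first comma-separated field of a Style line is its name.
            names.add(line[len("Style:"):].split(",", 1)[0].strip())
    return names


def missing_styles(ass_text: str, required: Sequence[str] = ("EN", "ZH")) -> List[str]:
    present = style_names(ass_text)
    return [name for name in required if name not in present]


# Truncated sample: real Style lines carry many more fields.
sample = """[Script Info]
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour
Style: EN,Arial,48,&H00FFFFFF

[Events]
"""

print(missing_styles(sample))  # ['ZH']
```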
Font by platform (pass with --font, ignored when using --style-file):
| Platform | Flag |
|---|---|
| macOS | --font "PingFang SC" (default) |
| Linux | --font "Noto Sans CJK SC" |
| Windows | --font "Microsoft YaHei" |
Other options:
- --top zh|en: which language on the upper line of the bottom stack (default: zh)
- --res WxH: video resolution (default: 1920x1080)

Readability notes for all presets:
- […] --res so 720p and 1080p keep similar visual balance
- clean is the safest choice when you must keep subtitles at the bottom in every shot

Use the ass= filter (not subtitles=); all styling comes from the ASS file.
ffmpeg -i "${slug}/${slug}.mp4" \
-vf "ass='${slug}/${slug}_bilingual.ass'" \
-c:v libx264 -crf 23 -preset medium \
-c:a copy "${slug}/${slug}_bilingual.mp4"
- -c:v libx264 -crf 23: good quality with reasonable file size
- -preset medium: balance between speed and compression (use fast for quicker encode)
- No force_style needed; styles are embedded in the ASS file

Based on the video content (from {slug}_{src_lang}.srt and {slug}_zh.srt), generate {slug}/publish_info.md.
All output in this file must be in Chinese (targeting Bilibili audience).
# Publish Info
## Source
{YouTube URL}
## Titles (5 variants)
1. {Suspense/question style — spark curiosity}
2. {Data/achievement driven — emphasize results}
3. {Controversial/opinion style — spark discussion}
4. {Tutorial/practical style — emphasize utility}
5. {Emotional/relatable style — connect with audience}
## Tags
{~10 comma-separated keywords covering topic, technology, domain}
## Description
{3-5 sentences summarizing core content and highlights}
## Chapter Timestamps
00:00 {chapter name}
...
Generation rules:
- […] {slug}_bilingual.srt at topic transition points

{slug}/
├── {slug}.mp4 # Source video
├── {slug}_{src_lang}.srt # Source language subtitles
├── {slug}_zh.srt # Chinese subtitles
├── {slug}_bilingual.srt # Merged bilingual
├── {slug}_bilingual.mp4 # Final output
└── publish_info.md # Bilibili upload metadata
python3 "$SKILL_DIR/srt_utils.py" merge en.srt zh.srt output.srt # Merge bilingual
python3 "$SKILL_DIR/srt_utils.py" merge --dry-run en.srt zh.srt output.srt # Pre-check without writing
python3 "$SKILL_DIR/srt_utils.py" validate input.srt # Check timing issues
python3 "$SKILL_DIR/srt_utils.py" fix input.srt output.srt # Fix timing/overlaps (multi-pass)
python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title" # Generate slug
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass # Convert to styled ASS (default: clean, ZH on top)
python3 "$SKILL_DIR/srt_utils.py" to_ass --dry-run input.srt output.ass # Pre-check without writing
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass --preset glow --top en
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass --style-file custom.ass # User-defined styles
python3 "$SKILL_DIR/srt_utils.py" check-whisper # Detect platform, recommend whisper backend + model
- […] --pad-missing to pad

--cookies-from-browser chrome requires Chrome to be closed (or uses a snapshot of the profile). If it fails:
# Export cookies once, then reuse the file
yt-dlp --cookies-from-browser chrome --cookies cookies.txt --skip-download "URL"
yt-dlp --cookies cookies.txt -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" -o "${slug}/${slug}.mp4" "URL"
For 429 / rate-limit errors, add --sleep-interval 3 --max-sleep-interval 8.
Symptoms: repeated phrases, garbled characters, or near-empty SRT despite clear audio.
whisper "${slug}/${slug}.mp4" \
--model medium \
--language "$src_lang" \
--condition_on_previous_text False \
--no_speech_threshold 0.6 \
--logprob_threshold -1.0 \
--compression_ratio_threshold 2.0 \
--output_format srt \
--output_dir "${slug}"
If language is still misdetected, the audio likely has long silence or non-speech segments. Voice-activity filtering can suppress them: whisper-ctranslate2 accepts --vad_filter True, while the openai-whisper CLI has no such flag.
Pass the correct font via --font in the to_ass step (Step 4.5). The ASS file embeds the font name, so ffmpeg needs it installed at burn time.
| Platform | Font | Install |
|---|---|---|
| macOS | PingFang SC | pre-installed |
| Linux | Noto Sans CJK SC | sudo apt install fonts-noto-cjk |
| Linux (alt) | WenQuanYi Micro Hei | sudo apt install fonts-wqy-microhei |
| Windows | Microsoft YaHei | pre-installed |
Regenerate the ASS file with the correct --font flag, then re-run the burn step.
- The download step runs yt-dlp --cookies-from-browser chrome to access age-gated or private videos. This reads Chrome cookies locally; no cookies are transmitted beyond YouTube's own servers. To avoid this, export cookies to a file first (see Troubleshooting above).
- The update check runs git ls-remote to look for newer skill versions. It does not auto-pull or execute remote code.
- srt_utils.py makes no network requests. All processing (SRT parsing, merging, ASS generation, hardware detection) is fully local.