| name | autocliper-ai |
| description | Turn any long YouTube interview, talk, or podcast into high-retention Chinese short clips for Shorts/Reels/Douyin/TikTok.
Triggers: YouTube URL + short clips / 切片 / 短视频 / 中文字幕 / hard subtitles.
Pipeline: source fetch -> subtitle parsing -> candidate analysis -> user review -> clip export -> subtitle burn.
|
AutoCliper.AI — Agent Skill
Convert one long YouTube video into 5–15 short clips with Chinese packaging and hard-burned subtitles.
This document is the single source of truth for any AI agent executing the AutoCliper pipeline. Follow every numbered step in order. Do not skip steps. Do not guess file names — inspect the filesystem after each write operation.
0. Pre-flight Checks
Before doing anything else, verify the environment. Report the exact blocker and stop if any check fails.
| Check | Command | Pass Condition |
|---|---|---|
| yt-dlp installed | which yt-dlp | returns a path |
| ffmpeg installed | which ffmpeg | returns a path; if missing, try python3 -c "import imageio_ffmpeg; print(imageio_ffmpeg.get_ffmpeg_exe())" |
| Python 3.9+ | python3 --version | >= 3.9 |
| Skill repo available | check that corekit/ directory exists relative to the skill root | corekit/__init__.py is present |
Set the environment for all subsequent commands:
export PYTHONPATH="<path-to-AutoCliper.AI-repo>"
All python3 -m corekit.* commands below assume this PYTHONPATH is set.
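The checks in the table above can be sketched as one Python helper. This is an illustrative sketch, not part of corekit: the function name `preflight` and the wording of the blocker messages are assumptions.

```python
import shutil
import sys
from pathlib import Path


def preflight(skill_root: str) -> list:
    """Return a list of blocker messages; an empty list means every check passed.

    Hypothetical helper: the name and message texts are not defined by corekit.
    """
    blockers = []
    if shutil.which("yt-dlp") is None:
        blockers.append("yt-dlp is not on PATH")
    if shutil.which("ffmpeg") is None:
        # Fallback named in the table: imageio-ffmpeg can supply an ffmpeg binary.
        try:
            import imageio_ffmpeg
            imageio_ffmpeg.get_ffmpeg_exe()
        except Exception:
            blockers.append("ffmpeg is not installed")
    if sys.version_info < (3, 9):
        blockers.append("Python 3.9+ is required")
    if not (Path(skill_root) / "corekit" / "__init__.py").is_file():
        blockers.append("corekit/__init__.py not found under the skill root")
    return blockers
```

If the returned list is non-empty, report its first entry as the exact blocker and stop.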
1. Create Workspace
Create the output directory structure for this video. Derive <slug> from the video title or ID — lowercase, hyphens, no spaces, no special characters.
mkdir -p "studio/<slug>/intake"
mkdir -p "studio/<slug>/intel"
mkdir -p "studio/<slug>/exports"
Expected result:
studio/<slug>/
├── intake/ ← raw video + subtitles land here
├── intel/ ← analysis artifacts land here
└── exports/ ← per-clip folders land here
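One possible way to derive the slug and create the three folders from Python is sketched below. The helper names `make_slug` and `create_workspace` and the `"video"` fallback are assumptions for illustration; titles that contain no ASCII letters or digits (for example, a fully Chinese title) fall back to that placeholder, in which case the video ID is the better input.

```python
import re
import unicodedata
from pathlib import Path


def make_slug(title_or_id: str, fallback: str = "video") -> str:
    """Lowercase, hyphen-separated, ASCII-only slug (hypothetical helper)."""
    # Decompose accented letters so that "é" keeps its base letter "e".
    normalized = unicodedata.normalize("NFKD", title_or_id)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    return slug or fallback


def create_workspace(slug: str, root: str = "studio") -> Path:
    """Create intake/, intel/ and exports/ under <root>/<slug>/."""
    base = Path(root) / slug
    for name in ("intake", "intel", "exports"):
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```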
2. Fetch Source Assets
Download the video and subtitles from YouTube:
python3 -m corekit.fetch_source "<YouTube_URL>" "studio/<slug>/intake"
What happens inside: the downloader tries English subtitles first (en, en-US, en-orig), then falls back to Chinese (zh-Hans, zh-CN, zh). It uses --cookies-from-browser chrome for authenticated access.
After the command finishes, list the intake folder and identify:
- The .mp4 video file (may contain the video title and ID in the filename)
- The .srt subtitle file (may have a language tag like .en.srt or .zh-Hans.srt)
- Any sidecar files (.ytdl, .jpg, etc.); note them, but they are not needed
If the download fails:
- Cookie error → tell the user to refresh their Chrome YouTube login, then retry
- No subtitles found → report "no subtitles available for this video" and stop
- Video download error → retry once with a different format; if still failing, report the error
Record the exact filenames for the next steps. Do not guess or hardcode names.
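Inspecting the intake folder instead of guessing names might look like the sketch below. The function `find_assets` is hypothetical, and the assumption that the language tag is the second-to-last dot-separated part of the subtitle filename comes from the `.en.srt` / `.zh-Hans.srt` examples above.

```python
from pathlib import Path

# Preference order taken from the downloader description: English first, then Chinese.
LANG_PREFERENCE = ["en", "en-US", "en-orig", "zh-Hans", "zh-CN", "zh"]


def _lang_rank(srt_path: Path) -> int:
    """Rank a subtitle file by its language tag; unknown tags sort last."""
    parts = srt_path.name.split(".")
    tag = parts[-2] if len(parts) >= 3 else ""
    return LANG_PREFERENCE.index(tag) if tag in LANG_PREFERENCE else len(LANG_PREFERENCE)


def find_assets(intake_dir: str):
    """Return (video_path, subtitle_path) found by listing the folder."""
    intake = Path(intake_dir)
    videos = sorted(intake.glob("*.mp4"))
    subtitles = sorted(intake.glob("*.srt"), key=_lang_rank)
    if not videos:
        raise FileNotFoundError(f"no .mp4 file in {intake}")
    if not subtitles:
        raise FileNotFoundError("no subtitles available for this video")
    return videos[0], subtitles[0]
```

The returned paths are the exact filenames to pass to steps 3 and 6.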
3. Parse Subtitles into JSON
Convert the SRT subtitle file into structured JSON for easier analysis:
python3 -m corekit.subtitle_to_json \
"studio/<slug>/intake/<exact-srt-filename>" \
"studio/<slug>/intel/transcript.json"
Verify the output file exists and contains an array of cue objects. Each cue has:
{
  "index": 1,
  "start": "00:00:01,234",
  "end": "00:00:03,456",
  "start_seconds": 1.234,
  "end_seconds": 3.456,
  "text": "the spoken content"
}
If the transcript is empty or has fewer than 10 cues, report "transcript too short or corrupted" and stop.
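A minimal validation sketch for transcript.json, assuming the cue fields listed above. The function name `load_transcript` is hypothetical; the 10-cue threshold and the error text come from this step.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"index", "start", "end", "start_seconds", "end_seconds", "text"}
MIN_CUES = 10  # threshold stated in this step


def load_transcript(path: str) -> list:
    """Load transcript.json and fail loudly if it is too short or malformed."""
    cues = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(cues, list) or len(cues) < MIN_CUES:
        raise ValueError("transcript too short or corrupted")
    for cue in cues:
        # Every cue must be an object carrying all six documented fields.
        if not isinstance(cue, dict) or REQUIRED_KEYS - set(cue):
            raise ValueError("transcript too short or corrupted")
    return cues
```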
4. Analyze Transcript and Propose Candidates
This is the most important intellectual step. Read the following files before starting analysis:
- playbooks/clip-contract.md: defines the exact JSON schema for selected_clips.json
- playbooks/content-analysis-playbook.md: defines the multi-pass analysis method, scoring rubric, and review formatting
4a. Multi-Pass Analysis
Do NOT try to pick clips in a single pass. Follow this sequence:
Pass 1 — Skim and flag: Read through transcript.json looking for stretches that contain memorable, information-dense, opinionated, counterintuitive, or emotionally sharp content. Flag generously — it is better to flag too many than too few.
Pass 2 — Boundary refinement: For each flagged stretch, re-read the surrounding cues (5–10 cues before and after). Choose a clean start and end:
- The start must land on a line that hooks the viewer within the first 3 seconds of the clip
- The end must land after the speaker has finished the thought — never mid-sentence
- If a promising moment needs a few seconds of context, extend the start earlier
- If a thought trails off, extend the end until it resolves
Pass 3 — Completeness check: For each candidate, verify:
Pass 4 — Score and rank: Score each candidate on four dimensions (1–5):
- hook: how compelling is the opening in the first 3 seconds?
- clarity: is it one clean idea with low ambiguity?
- standalone: can it stand alone without prior context?
- payoff: is the ending useful, memorable, or share-worthy?
Rank by weighted total: hook × 0.35 + clarity × 0.25 + standalone × 0.2 + payoff × 0.2
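The weighted total can be computed as in the sketch below. The candidate shape used here (a dict with `id` and `scores` keys) is an assumption for illustration only; the authoritative schema is the one in playbooks/clip-contract.md.

```python
# Weights from the ranking formula; they sum to 1.0, so totals stay on the 1-5 scale.
WEIGHTS = {"hook": 0.35, "clarity": 0.25, "standalone": 0.2, "payoff": 0.2}


def weighted_total(scores: dict) -> float:
    """Combine the four 1-5 dimension scores into one ranking value."""
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def rank(candidates: list) -> list:
    """Sort candidates from highest to lowest weighted total."""
    return sorted(candidates, key=lambda c: weighted_total(c["scores"]), reverse=True)
```

For example, scores of hook 5, clarity 4, standalone 3, payoff 4 give 1.75 + 1.00 + 0.60 + 0.80 = 4.15.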
4b. Candidate Count Guidance
| Source Duration | Target Candidates |
|---|---|
| < 20 min | 5 – 8 |
| 20 – 60 min | 8 – 12 |
| > 60 min | 10 – 15 |
The failure mode to guard against is too few candidates, not too many. When in doubt, include more; the user will prune.
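The duration table maps to a small lookup, sketched here with a hypothetical function name. Treating exactly 20 and exactly 60 minutes as part of the middle band is an assumption, since the table leaves those boundaries open.

```python
def target_candidates(duration_minutes: float) -> tuple:
    """Return the (minimum, maximum) candidate count for a source duration."""
    if duration_minutes < 20:
        return (5, 8)
    if duration_minutes <= 60:
        return (8, 12)
    return (10, 15)
```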
4c. Handling Auto-Generated (Noisy) Subtitles
YouTube auto-captions often contain:
- Repeated fragments (same phrase appearing in consecutive cues)
- Missing punctuation and capitalization
- Misheard words (especially names, technical terms)
When the source subtitles are auto-generated:
- Infer the intended meaning conservatively — do not invent claims
- Ignore duplicate fragments when judging clip boundaries
- Use surrounding context to determine where sentences actually end
- Note in the candidate summary if the source quality is low
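Ignoring repeated fragments can be approximated with the sketch below, which drops a cue only when its text matches the previously kept cue after whitespace and case normalization. It is a deliberately narrow illustration (the function name is hypothetical) and does not handle rolling captions where a cue repeats only part of the previous one.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def drop_consecutive_duplicates(cues: list) -> list:
    """Keep the first occurrence of each run of identical consecutive cues."""
    kept = []
    for cue in cues:
        if kept and _normalize(cue["text"]) == _normalize(kept[-1]["text"]):
            continue  # same fragment as the previous kept cue
        kept.append(cue)
    return kept
```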
4d. Write Outputs
Write two files:
-
studio/<slug>/intel/selected_clips.json — the machine-readable clip decisions. Must follow the schema in playbooks/clip-contract.md exactly.
-
studio/<slug>/intel/candidate-board.md — the human-readable review table.
5. Present Candidates for User Review
Show the candidate list in a review-friendly format. For each candidate, display:
| Field | Format | Example |
|---|---|---|
| ID | clip-XX | clip-01 |
| Time range | HH:MM:SS → HH:MM:SS | 00:12:03 → 00:13:22 |
| Duration | human-readable | 1m 19s |
| Title | one provocative Chinese title, ≤12 characters | AI终将取代一切? |
| Summary | exactly two sentences in Chinese | 句1: 在讲什么。句2: 为什么值得看。 |
Present as a numbered table so the user can reply with IDs.
If the user says "proceed" or "auto-pick": select the top-ranked candidates yourself. For a 1-hour video, export at least 8.
If the user picks specific IDs: export only those.
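The time-range and duration formats in the table above can be produced from the start and end seconds, as in this sketch. The helper names are hypothetical, and the row layout simply follows the field order of the table.

```python
def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating any fractional part."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def human_duration(seconds: float) -> str:
    """Format a duration like '1m 19s' (or '45s' when under a minute)."""
    total = int(round(seconds))
    minutes, secs = divmod(total, 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


def board_row(clip_id: str, start: float, end: float, title: str, summary: str) -> str:
    """Build one Markdown table row for the review table."""
    return (f"| {clip_id} | {hms(start)} → {hms(end)} | "
            f"{human_duration(end - start)} | {title} | {summary} |")
```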
6. Export Each Chosen Clip
For each clip the user chose (or you auto-picked), execute steps 6a through 6f in order. Complete all sub-steps for one clip before moving to the next.
6a. Cut the Video Segment
python3 -m corekit.cut_video \
"studio/<slug>/intake/<exact-mp4-filename>" \
<start_seconds> <end_seconds> \
"studio/<slug>/exports/<clip-folder>/clip.mp4"
The <clip-folder> naming convention is XX-<title-slug>, e.g., 01-ai-will-replace.
Verify the output file exists and has a non-zero size.
6b. Window the Source Subtitle
python3 -m corekit.window_subtitles \
"studio/<slug>/intake/<exact-srt-filename>" \
<start_seconds> <end_seconds> \
"studio/<slug>/exports/<clip-folder>/clip.src.srt"
This extracts only the cues that overlap the clip window and shifts all timestamps so the clip starts at 00:00:00,000.
Verify the output SRT has at least one cue.
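To make the windowing behavior concrete, here is a sketch of the overlap-and-shift logic on cue dicts shaped like those in transcript.json. It is not the code of corekit.window_subtitles: clamping partially overlapping cues to the clip boundaries is an assumption, and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def window_cues(cues: list, start: float, end: float) -> list:
    """Keep cues overlapping [start, end) and shift them so the clip starts at zero."""
    out = []
    for cue in cues:
        if cue["end_seconds"] <= start or cue["start_seconds"] >= end:
            continue  # no overlap with the clip window
        # Assumption: clamp cues that straddle a boundary to the window edges.
        new_start = max(cue["start_seconds"], start) - start
        new_end = min(cue["end_seconds"], end) - start
        out.append({
            "index": len(out) + 1,
            "start": srt_timestamp(new_start),
            "end": srt_timestamp(new_end),
            "start_seconds": new_start,
            "end_seconds": new_end,
            "text": cue["text"],
        })
    return out
```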
6c. Translate to Chinese Subtitle
Read clip.src.srt and translate it into simplified Chinese. Write the result to clip.zh.srt in the same folder.
Translation rules (follow strictly):
- Natural spoken Chinese — not written/formal Chinese. The viewer is watching a short video, not reading a document.
- Preserve timestamps exactly — unless you need to merge two cues for readability. Never arbitrarily shift timing.
- Keep named entities accurate — names of people, companies, products, numbers, and quoted phrases must be faithfully retained.
- Concise lines — prefer short Chinese phrases that fit short-form video pacing. Long scrolling subtitles kill retention.
- Rebalance across cues when needed — if an English sentence is split across 3 cues, you may redistribute the Chinese text across those same 3 cues. Preserve cue order.
- Do NOT collapse cues — a cue-dense English SRT with 40 cues should produce roughly 35–45 Chinese cues, not 10. The default failure mode is slightly too many cues, not far too few.
- Drop only true duplicates — auto-captions often repeat the same fragment in consecutive cues. You may drop the duplicate, but keep the first occurrence.
- If source is already Chinese — do not translate. Only clean obvious auto-caption noise (repeated fragments, broken characters).
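A quick guard against collapsed cues is to compare cue counts between clip.src.srt and clip.zh.srt, as sketched below. The 12.5% tolerance is an assumption derived from the 40-cue example above (35 to 45 cues accepted), and both function names are hypothetical.

```python
import re

# Matches SRT timing lines such as "00:00:01,234 --> 00:00:03,456".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE
)


def count_cues(srt_text: str) -> int:
    """Count cues by counting timing lines."""
    return len(TIMING_LINE.findall(srt_text))


def cue_count_ok(src_count: int, zh_count: int, tolerance: float = 0.125) -> bool:
    """True when the translated cue count stays close to the source cue count."""
    if src_count == 0:
        return zh_count == 0
    return abs(zh_count - src_count) <= tolerance * src_count
```

A failed check is a prompt to re-read the translation, not proof of an error, because dropping true duplicates legitimately lowers the count.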
6d. Generate Title and Description
For each clip, create:
Title (for the first-second overlay):
- Sharp, clickable, opinionated
- Target ≤ 12 Chinese characters so it fits cleanly on screen at font size 48
- May be provocative or contrarian but must be faithful to the speaker's meaning
- Should create curiosity, tension, or an urge to keep watching
Description (for platform distribution):
- ≤ 140 Chinese characters
- Must mention: who is speaking, what show/interview this is from, what topic is discussed, what the key claim or takeaway is
Write both to studio/<slug>/exports/<clip-folder>/metadata.txt:
标题:AI终将取代一切?
描述:Sam Altman 在 Lex Fridman 播客中谈到 AGI 的时间表——他认为大多数人严重低估了 AI 的发展速度,未来三年将改变一切。
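The length limits can be checked before writing metadata.txt, as in this sketch. Counting every character with `len` (including Latin letters and punctuation) is a conservative assumption, since the limits above are stated in Chinese characters; the function name is hypothetical.

```python
TITLE_LIMIT = 12         # keeps the overlay readable at font size 48
DESCRIPTION_LIMIT = 140


def build_metadata(title: str, description: str) -> str:
    """Return the two-line metadata.txt body, or raise if a limit is exceeded."""
    if len(title) > TITLE_LIMIT:
        raise ValueError(f"title has {len(title)} characters; limit is {TITLE_LIMIT}")
    if len(description) > DESCRIPTION_LIMIT:
        raise ValueError(
            f"description has {len(description)} characters; limit is {DESCRIPTION_LIMIT}"
        )
    return f"标题:{title}\n描述:{description}\n"
```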
6e. Burn Chinese Subtitles + Title into Video
python3 -m corekit.render_hardsubs \
"studio/<slug>/exports/<clip-folder>/clip.mp4" \
"studio/<slug>/exports/<clip-folder>/clip.zh.srt" \
"studio/<slug>/exports/<clip-folder>/clip.hardsub.mp4" \
--title "<the-title-from-6d>"
CRITICAL: Pass the Chinese clip.zh.srt file, NOT clip.src.srt. This is the most common agent mistake — passing the source-language subtitle instead of the translation.
The burn step will:
- Auto-detect libass; if unavailable, fall back to drawtext rendering (no action needed from you)
- Auto-select the best H.264 encoder for the current machine
- Render the title centered at font size 48 with 3px black outline during the first second
- Render subtitle cues at the bottom of the frame for the rest of the clip
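Verify the output clip.hardsub.mp4 exists and has a non-zero size. It is often larger than clip.mp4, but the burn step re-encodes the video, so the result can also be smaller; do not treat a smaller file as a failure on its own.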
6f. Append to Packaging Copy
After each clip is exported, append its title and description to the combined packaging file:
studio/<slug>/intel/packaging-copy.md
Format:
## clip-01: AI终将取代一切?
**标题**: AI终将取代一切?
**描述**: Sam Altman 在 Lex Fridman 播客中谈到 AGI 的时间表...
**文件**: studio/<slug>/exports/01-ai-will-replace/clip.hardsub.mp4
**时长**: 1m 19s
---
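Appending an entry in this format could be done as sketched below. The helper names are hypothetical; the field labels and their order are copied from the format above.

```python
from pathlib import Path


def packaging_entry(clip_id: str, title: str, description: str,
                    file_path: str, duration: str) -> str:
    """Render one packaging-copy.md entry, ending with the --- separator."""
    return (
        f"## {clip_id}: {title}\n"
        f"**标题**: {title}\n"
        f"**描述**: {description}\n"
        f"**文件**: {file_path}\n"
        f"**时长**: {duration}\n"
        "---\n"
    )


def append_packaging(path: str, entry: str) -> None:
    """Append an entry; the file is created on first use."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Opening in append mode keeps earlier clips intact when the export loop moves on to the next clip.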
7. Final Output
After all clips are exported, return to the user:
- Source asset folder: studio/<slug>/intake/
- Candidate list: the full table from step 5 (timestamps, duration, title, summary)
- Packaging file: studio/<slug>/intel/packaging-copy.md
- Exported clips: list each clip.hardsub.mp4 path
Workspace Layout Reference
studio/<video-slug>/
├── intake/ # raw assets from YouTube
│ ├── <title> [<id>].mp4 # source video (filename from yt-dlp)
│ └── <title> [<id>].<lang>.srt # source subtitle
├── intel/ # analysis artifacts
│ ├── transcript.json # structured subtitle cues (from step 3)
│ ├── selected_clips.json # clip decisions (from step 4)
│ ├── candidate-board.md # review table (from step 4)
│ └── packaging-copy.md # all titles + descriptions (from step 6f)
└── exports/ # one folder per exported clip
└── 01-<title-slug>/
├── clip.mp4 # raw cut (no subtitles)
├── clip.src.srt # windowed source-language subtitle
├── clip.zh.srt # translated Chinese subtitle
├── clip.hardsub.mp4 # final deliverable (burned subtitles + title)
└── metadata.txt # title + description for this clip
Module Reference
| Module | Invocation | Arguments |
|---|---|---|
| corekit.fetch_source | python3 -m corekit.fetch_source <url> <output_dir> | YouTube URL, output directory |
| corekit.subtitle_to_json | python3 -m corekit.subtitle_to_json <input.srt> <output.json> | SRT file path, JSON output path |
| corekit.cut_video | python3 -m corekit.cut_video <input.mp4> <start_sec> <end_sec> <output.mp4> | source video, start seconds (float), end seconds (float), output path |
| corekit.window_subtitles | python3 -m corekit.window_subtitles <input.srt> <start_sec> <end_sec> <output.srt> | source SRT, start seconds (float), end seconds (float), output path |
| corekit.render_hardsubs | python3 -m corekit.render_hardsubs <input.mp4> <input.srt> <output.mp4> --title "..." | clip video, Chinese SRT, output path, optional title text |
Optional flags for render_hardsubs:
- --fontfile <path>: override the auto-detected Chinese font
- --subtitle-fontsize <int>: subtitle font size for drawtext fallback (default 28)
Execution Safety Notes
- Paths with spaces or non-ASCII: Always pass file paths as separate arguments to shell commands. Never build a command string with f"ffmpeg -i {path}"; the downloader produces filenames like My Interview [abc123].mp4. Use list-based subprocess.run(cmd).
- Cookie failures: The downloader uses --cookies-from-browser chrome. If it fails with a 403 or login-required error, tell the user to open YouTube in Chrome and verify they are logged in, then retry. Do not modify the download command.
- Verify after every write: After every command that produces a file, check that the file exists and has non-zero size before proceeding. If a step produces no output, report it immediately.
- Vertical shorts: If the user asks for vertical (9:16) shorts, complete the entire horizontal pipeline first, then crop/reframe as a post-processing step. Do not attempt vertical crop during the main pipeline.
- Encoder detection is automatic: Do not hardcode libx264 or aac in any manual ffmpeg commands. The corekit modules handle encoder selection. If you need to run ffmpeg directly for any reason, use corekit.ffmpeg_locator.h264_encoder() and corekit.ffmpeg_locator.aac_encoder() to get the right encoder names.
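The first and third notes combine naturally: build the command as a list, run it, then verify the output file. The sketch below uses the documented corekit.cut_video argument order; the helper names `cut_video_cmd` and `run_and_verify` are hypothetical.

```python
import subprocess
from pathlib import Path


def cut_video_cmd(source: str, start: float, end: float, output: str) -> list:
    """Build the cut command as an argument list; no shell quoting is involved."""
    return ["python3", "-m", "corekit.cut_video",
            str(source), str(start), str(end), str(output)]


def run_and_verify(cmd: list, output: str) -> Path:
    """Run a command, then confirm the expected file exists and is non-empty."""
    # A list argument bypasses the shell, so spaces and brackets in paths are safe.
    subprocess.run(cmd, check=True)
    out = Path(output)
    if not out.is_file() or out.stat().st_size == 0:
        raise RuntimeError(f"step produced no output: {out}")
    return out
```

Because each path is its own list element, a filename such as My Interview [abc123].mp4 reaches the module unchanged.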
Error Reporting
If the pipeline cannot complete, report the exact blocker using one of these categories:
| Blocker | What to tell the user |
|---|---|
| COOKIES_EXPIRED | "YouTube download failed due to expired cookies. Please log in to YouTube in Chrome and retry." |
| NO_SUBTITLES | "No English or Chinese subtitles found for this video. The pipeline requires subtitles." |
| FFMPEG_MISSING | "ffmpeg is not installed. Install via brew install ffmpeg or pip install imageio-ffmpeg." |
| DOWNLOAD_FAILED | "Video download failed. [include the error message from yt-dlp]" |
| TRANSCRIPT_EMPTY | "The subtitle file is empty or contains too few cues to analyze." |
| LOW_QUALITY_TRANSCRIPT | "The auto-generated subtitles are too noisy to produce reliable clips. Consider finding a manually transcribed version." |
Do not attempt workarounds for blockers. Report and stop.