| name | video-edit |
| description | Edit any video into a captioned showcase — transcribe (any language, defaults to large-v3), present a transcript_review.txt for the user to fix mishears BEFORE rendering, then build a HyperFrames composition with liquid-glass caption pills, liquid blob background, liquid morph wipes, optional behind-subject text via background removal, and render the final video. Use whenever the user provides a video file and asks to edit it, caption it, add subtitles, fix existing captions, make a reel/promo/captioned tutorial, or "do the same" pattern as a prior captioned video. Supports English, Hebrew, and any Whisper-supported language. **Renders both 16:9 (YouTube / horizontal) and 9:16 (TikTok / Instagram Reels / YouTube Shorts) from the SAME 16:9 source** — vertical mode uses a centered footage strip with a blurred backdrop + liquid blobs and a vertical-tuned caption pill, no need to re-shoot. THE PIPELINE PAUSES FOR USER APPROVAL on the transcript before final render — this is the support mechanism for getting captions perfect (especially Hebrew). Pairs with hyperframes, hyperframes-cli, hyperframes-registry, and yuv-design-system skills. |
Video Edit — Captioned Showcase Pipeline
End-to-end captioned video editor on top of HyperFrames. The user gives you a video; you orchestrate transcribe → review → render and ALWAYS pause for transcript approval before the long render.
Where this skill sits in the YUV.AI pyramid
video-edit is in the middle tier of the YUV.AI skills pyramid alongside yuv-design-system, yuv-decks, yuv-viral-video, parallax-landing-page, and video-to-landing-page. The top-tier orchestrator yuv-pilot routes here whenever the user wants a captioned showcase, tutorial, or talking-head edit with subtitles.
This is the more general video sibling to yuv-viral-video. The split:
- yuv-viral-video: opinionated YUV.AI viral-short pipeline (MrBeast pacing, signature editorial style)
- video-edit: general captioned editor with transcript-review-before-render (Hebrew + English + any Whisper language)
For YUV.AI-branded captioned video, pair this skill with yuv-design-system (Neon mode for type/palette decisions). For generic / third-party captioned video, this skill works standalone.
When to invoke
- A path to a video file (mp4/mov/mkv) + a request to "edit", "caption", "add subtitles", "make a reel/promo", "do the same"
- "Fix the captions / Hebrew misspells" — re-enter at the review step on an existing project
- Any captioned tutorial / talking-head / promo build
Save location
Default: ~/Documents/yuv-projects/videos/<slug>/ — always save captioned video projects here so renders are findable. The <slug> is short, derived from the topic or source filename.
mkdir -p ~/Documents/yuv-projects/videos
cd ~/Documents/yuv-projects/videos
Final render lands at ~/Documents/yuv-projects/videos/<slug>/renders/<name>_FINAL.mp4. Tell the user where the video lives at the end of the render.
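As a rough illustration of the path convention above, the slug and the final render path could be derived as follows. The helper names `slugify` and `final_render_path` and the 40-character cap are assumptions made for this sketch; the skill itself only says the slug is short and derived from the topic or source filename.

```python
import re
from pathlib import Path

# Default project root stated in this section.
PROJECTS_ROOT = Path.home() / "Documents" / "yuv-projects" / "videos"


def slugify(source_name: str, max_len: int = 40) -> str:
    """Hypothetical helper: lowercase, hyphen-separated slug from a filename."""
    stem = Path(source_name).stem.lower()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    # Fall back to a generic slug when nothing ASCII survives (e.g. a Hebrew filename).
    return slug[:max_len].rstrip("-") or "video"


def final_render_path(slug: str, name: str) -> Path:
    """Build <root>/<slug>/renders/<name>_FINAL.mp4."""
    return PROJECTS_ROOT / slug / "renders" / f"{name}_FINAL.mp4"
```

Reporting `final_render_path(slug, name)` to the user at the end of the render covers the "tell the user where the video lives" requirement.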
Workflow (12 steps)
- Probe the source: ffprobe for dimensions, fps, duration, audio.
- Scaffold: cd ~/Documents/yuv-projects/videos && npx hyperframes init <slug> --video <path> --non-interactive. Rename the copied video to source.mp4.
- Extract audio: ffmpeg -i source.mp4 -vn -ac 1 -ar 16000 audio.wav.
- Transcribe: copy references/transcribe.py into the project. Default model large-v3 (best Hebrew). CUDA usually fails on Windows (missing cuDNN); the script falls back to CPU int8. Force language="he" for Hebrew, language="en" for English; otherwise auto-detect.
- Apply known corrections: copy references/corrections-hebrew.md content into a corrections.json at the project root (keys = wrong token, values = correct token).
- 🛑 STOP: start the review server and let the user approve in a webapp.
First apply known corrections: copy references/make_review.py into the project and run
python make_review.py. It applies corrections.json to transcript.json.
Then spawn the review server as a background task (it blocks until the user clicks
"Approve & Render" in the browser):
python "$HOME/.claude/skills/video-edit/references/serve_review.py" .
The server prints a line like REVIEW_URL=http://localhost:PORT/. Grab that URL from
the background-task output (or read stdout) and send the user:
👉 Review your transcript here: http://localhost:PORT/
When you click Approve & Render, I'll continue automatically.
The agent does not need a "continue" message — when the user clicks the button, the
server writes transcript_review.txt to the project dir AND exits with code 0. The
agent's background-task notification fires, and the pipeline resumes from step 8.
Fallback if no browser / no server: open the editor as a static file
(start "" "$HOME/.claude/skills/video-edit/transcript-editor/index.html"),
ask the user to pick the project folder, edit, save transcript_review.txt back into
the project, and reply "continue". The editor supports both modes.
- (Optional) Background removal: described in its own step below; it can run in parallel with the user's review.
- After approval, run python references/apply_review.py. It re-tokenises edited lines and redistributes word timings back into transcript.json so caption sync still works.
- (Optional) Background removal: if any talking-head segment needs behind-subject text, extract the segment as outro.mp4 (or intro.mp4) and run npx hyperframes remove-background <clip>.mp4 -o <name>_subject.webm --quality best. CPU only on most setups (~3–8 min for a ~15s 1440p clip).
- Re-encode source with dense keyframes: multi-worker render seeks freeze on sparse keyframes. Always run:
ffmpeg -y -i source.mp4 -c:v libx264 -preset medium -crf 18 -r 30 -g 30 -keyint_min 30 -sc_threshold 0 -pix_fmt yuv420p -movflags +faststart -c:a copy footage.mp4
- Re-load the (edited) transcript and generate the body sub-composition via references/gen_body.py. The generator emits the full compositions/components/caption-body.html with editorial + matrix alternating in liquid-glass pills, anchored lower-left-of-centre (clears bottom-right webcam PiPs).
- Wire the host index.html from references/host-template.html. Layer order (z-index, NOT track-index):
  - z0: footage (.cam-bg)
  - z1: liquid blob background (compositions/liquid-blobs.html, mix-blend-mode: screen, full duration)
  - z2: parallax behind-subject caption (intro and/or outro, when bg-removal used)
  - z3: subject cut-out .cam-out / .cam-sub (with matching data-media-start)
  - z6: body captions
  - z46: progress bar + flash + liquid morph wipe
- Lint: npx hyperframes lint. Must be 0 errors. Common fixes: GSAP/CSS transform conflict on the wipe element (use xPercent/yPercent or remove the CSS transform); overlapping tweens on the same property (add overwrite: "auto").
- Render: npx hyperframes render --quality standard --fps 30 --output renders/<name>_FINAL.mp4. Standard is the right delivery target; high roughly doubles render time. Verify with 6–8 spot-check frames from across the timeline before reporting done.
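The post-approval step in the workflow above re-tokenises edited lines and redistributes word timings. The sketch below shows one plausible way to do that, not necessarily what references/apply_review.py does. It assumes each transcript word is a dict with "word", "start", and "end" keys, and the function name `redistribute_timings` is hypothetical. When the word count is unchanged, the original timings are kept. Otherwise the line's time span is shared out in proportion to token length.

```python
def redistribute_timings(original_words, edited_text):
    """Map edited tokens onto the time span of the original words (sketch)."""
    tokens = edited_text.split()
    if not original_words or not tokens:
        return []

    # Same word count: the user only fixed spellings, so keep per-word timings.
    if len(tokens) == len(original_words):
        return [
            {"word": tok, "start": w["start"], "end": w["end"]}
            for tok, w in zip(tokens, original_words)
        ]

    # Different word count: spread the whole line's span across the new
    # tokens, weighting each token by its character length.
    span_start = original_words[0]["start"]
    span_end = original_words[-1]["end"]
    total_chars = sum(len(tok) for tok in tokens)

    result = []
    cursor = span_start
    for i, tok in enumerate(tokens):
        if i == len(tokens) - 1:
            end = span_end  # pin the last token to the original end time
        else:
            end = cursor + (span_end - span_start) * len(tok) / total_chars
        result.append({"word": tok, "start": round(cursor, 3), "end": round(end, 3)})
        cursor = end
    return result
```

Character-length weighting is a crude stand-in for spoken duration, but it keeps captions inside the original line's time window, which is what matters for sync.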
Vertical (9:16) output for TikTok / Reels / Shorts
When the user asks for vertical / portrait / TikTok / Reels / 9:16 output (from a 16:9 source):
- Clone the project to a sibling folder: cp -r project/ project-vertical/.
- Replace its index.html with references/host-template-vertical.html (1080×1920 canvas, blurred-bg backdrop with liquid blobs, the 16:9 footage as a centered horizontal strip, captions below).
- Replace its gen_body.py with references/gen_body_vertical.py (centered pill, larger fonts, narrower max-width), then re-run it to emit compositions/components/caption-body.html.
- Drop the behind-subject cut-out + parallax sub-compositions (the cutout is aligned for 16:9; not worth re-aligning for v1). The vertical comp uses the blurred-source backdrop + blobs for atmosphere instead.
- Update data-duration to the actual video duration. Update the brand-chip text in index.html (YUV.AI by default).
- Lint + render: same commands. Output is 1080×1920. Drop straight onto TikTok / IG Reels / YT Shorts.
To deliver both 16:9 and 9:16 in one go, run the render command once in each project; the two renders can run in parallel. The transcript_review.txt approval applies to both: same captions, two compositions.
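To make the vertical layout concrete, the arithmetic for a centered footage strip on a 1080×1920 canvas can be sketched like this. The function name `strip_geometry` and the choice to scale the footage to the full canvas width are assumptions; the actual template may add margins or position the strip differently.

```python
def strip_geometry(src_w: int, src_h: int, canvas_w: int = 1080, canvas_h: int = 1920) -> dict:
    """Scale footage to the canvas width and centre it vertically (sketch)."""
    # Preserve the source aspect ratio; floor to whole pixels.
    strip_h = canvas_w * src_h // src_w
    # Equal space above and below the strip (floored).
    top = (canvas_h - strip_h) // 2
    return {"left": 0, "top": top, "width": canvas_w, "height": strip_h}
```

For a 1920×1080 source this gives a strip 1080 px wide and 607 px tall, starting 656 px from the top. The remaining canvas height is where the blurred backdrop, blobs, and captions sit.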
Critical rules
- Never render the final without explicit transcript approval. The review step is the whole point.
- For Hebrew: large-v3 + language="he" + direction: rtl + Rubik (700 + 900 for editorial dual-weight emphasis).
- Caption pills always need an opaque dark backing; bare light text vanishes on white app UI.
- Centre caption pills horizontally but shift the centre x-coord left (e.g. left: 720px) when the footage has a bottom-right webcam PiP.
- The behind-subject cut-out clip MUST carry data-media-start matching its data-start (or matching the offset from the source if the clip was extracted), or the cut-out plays from frame 0 and desyncs.
- The remove-background webm keeps the original RGB and writes only the alpha mask, so ffprobe reports yuv420p, which looks like "no alpha". Confirm via TAG:ALPHA_MODE=1 or composite over a solid colour.
- Outro/end cards with burned-in text — do NOT caption over them; they collide.
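The alpha check in the rules above can be automated with a small text check. This sketch assumes the stream info was already captured as text, for example from `ffprobe -v error -show_streams <file>`, and that the tag appears on its own line as `TAG:ALPHA_MODE=1`. The function name `has_alpha_tag` is hypothetical, and the match is case-insensitive because tag casing can differ between builds.

```python
import re


def has_alpha_tag(ffprobe_output: str) -> bool:
    """Return True if the ffprobe text contains an alpha_mode=1 stream tag."""
    pattern = r"^TAG:alpha_mode=1\s*$"
    return re.search(pattern, ffprobe_output, flags=re.IGNORECASE | re.MULTILINE) is not None
```

A `False` result does not prove the alpha is missing; compositing the clip over a solid colour remains the fallback check.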
File references
| File | Purpose |
|---|---|
| transcript-editor/index.html | Interactive browser editor: video preview, RTL editing, dictionary apply, optional WebLLM AI suggestions, saves transcript_review.txt |
| references/setup.md | Prerequisites + install commands for Node / Python / FFmpeg / faster-whisper |
| references/transcribe.py | faster-whisper transcribe with CPU fallback + word timestamps |
| references/serve_review.py | Local review server: auto-loads editor, blocks until user clicks Approve & Render, then writes transcript_review.txt and exits (signals the agent) |
| references/make_review.py | Apply corrections + emit transcript_review.txt (file-mode fallback) |
| references/apply_review.py | Parse edited review file, redistribute word timings, update transcript.json |
| references/gen_body.py | Caption-body generator (editorial + matrix in liquid-glass pills) |
| references/host-template.html | 16:9 host composition with liquid effects + transition wipe |
| references/host-template-vertical.html | 9:16 host (1080×1920), TikTok / Reels / Shorts layout: blurred bg, centered 16:9 footage strip, captions below, brand chip top-right |
| references/gen_body_vertical.py | Caption-body generator tuned for vertical (centered pill, larger fonts, narrower max-width) |
| references/liquid-blobs.html | Full-duration drifting blob layer |
| references/caption-parallax-outro.html | Behind-subject caption template (English; clone for other languages) |
| references/corrections-hebrew.md | Known Hebrew Whisper mishears |
| references/transcript-review-workflow.md | The pause/approve step in detail |
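For reference, corrections.json is described in the workflow as a flat mapping from wrong token to correct token. A minimal sketch of applying such a mapping to transcript words follows. It assumes each word is a dict with a "word" key that may carry leading whitespace, as Whisper-style word output often does. The function name `apply_corrections` is hypothetical and is not taken from references/make_review.py.

```python
import json


def apply_corrections(words, corrections):
    """Replace exact-match tokens using a wrong -> correct mapping (sketch)."""
    fixed = []
    for w in words:
        raw = w["word"]
        core = raw.strip()
        if core in corrections:
            # Keep any leading whitespace so joined caption text stays spaced.
            lead = raw[: len(raw) - len(raw.lstrip())]
            raw = lead + corrections[core]
        # Copy the entry so timing fields are preserved and inputs stay untouched.
        fixed.append({**w, "word": raw})
    return fixed


# Example shape of corrections.json (the contents here are illustrative only).
EXAMPLE_CORRECTIONS = json.loads('{"helo": "hello"}')
```

Exact-token matching is deliberately conservative: it will not touch a word that merely contains a listed mishear, which leaves ambiguous cases for the human review step.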