with one click
cut-shortform
// Shape a 30-90 second editorial cut from a longer interview/talk using a word-timestamped transcript.
// Shape a 30-90 second editorial cut from a longer interview/talk using a word-timestamped transcript.
Burn TikTok-style word-grouped captions over a video using whisper word timestamps and the ASS subtitle format.
ffmpeg recipes for rendering the final cut from raw.mp4 + cuts.json, plus encoder settings and validation.
Produce a word-level timestamped transcript from a video using OpenAI whisper-1.
| name | cut-shortform |
| description | Shape a 30-90 second editorial cut from a longer interview/talk using a word-timestamped transcript. |
| when-to-use | When the user asks for a clip, hook, shortform, reel, or wants a long video tightened. |
The single editorial principle: one coherent argument, source order preserved. Stitched-together montages of disconnected good lines feel random; the goal is a clip that earns the next watch by escalating one idea.
transcript.json — required. Read segments first, then go to words
for cut boundaries.Skim the segments. Identify 3-7 candidate moments where the speaker says something concrete. Mark the segment id and rough time.
Pick the spine. From the candidates, choose ONE through-line: cold-open hook → catalyst → escalation → payoff. Discard candidates that don't serve the spine — even if they're individually strong.
Lock cut points to word boundaries from the words array. The
start of a span = the start of the first kept word; the end = the
end of the last kept word.
Write cuts.json before rendering. Schema:
{
"title": "Brief title for the clip",
"target_duration_seconds": "~30 (28-32 window)",
"keep_spans": [
{ "start": 35.04, "end": 39.26, "reason": "Cold open on …" }
],
"cut_policy": [
"Boundaries on whisper word timestamps, no reordering.",
"Removed dead air, hedges, mid-sentence restarts."
]
}
Render via the render-final skill.
If the user asks for the original video to continue after the hook
("then add the main video back," "play the rest after," "append the
source"), record it in cuts.json as a tail block:
{
"title": "...",
"keep_spans": [ ... ],
"tail": {
"from": 98.14,
"to": null,
"music": false
},
"cut_policy": [ ... ]
}
from is a source-time in seconds. Most useful: the end of the
LAST keep-span, so the body picks up exactly where the hook left
off (no jump).to of null means run to the end of the source.music: false means the music ducks out before the body starts.Render order: keep-spans first, then the tail, concatenated. See
render-final/SKILL.md for the filter pattern.
If tail is absent, render only the hook (default behavior).
| Target | Bias |
|---|---|
| ≤30s | Aggressive. One argument. No subordinate clauses unless they pay off the punch. Cut filler ruthlessly. |
| 60s | Setup → tension → payoff. Some breathing room. Tolerate one clause that earns the next line. |
| 90s+ | Looser pacing, narrative arc, more context preserved. Still no reordering. |