analyze-video
// Full footage analysis pipeline — audio transcripts, contact sheets, and Sonnet-written summaries. Produces every artifact the cut skill reads. Orchestrated from the main thread.
| name | analyze-video |
| description | Full footage analysis pipeline — audio transcripts, contact sheets, and Sonnet-written summaries. Produces every artifact the cut skill reads. Orchestrated from the main thread. |
This is the main thread's playbook for the Analyze Video workflow step. Run it after library setup, before any cut work. It covers the three artifacts produced per clip: audio transcript, contact_sheet, and markdown summary. The roughcut agent reads dialogue on demand by running script_extractor.rb over the transcript JSON — no separate script artifact.
SKILL.md is the parent's dispatch brief. The sub-agent working prompt lives in agent_prompt.md — inline its contents when launching a Task agent. Don't pass SKILL.md.
- transcript, etc.).
- library.yaml exists, schema is current; run migrations from AGENTS.md if not).
- libraries/settings.yaml directly for whisper_model. For library fields, read the snapshot via ruby lib/buttercut/library.rb <name> summary and pull the values you need from the JSON; don't parse library.yaml inline.

Inform the user: "Library setup complete. Found [N] videos ([total size]). Starting footage analysis..."
Launch transcribe-audio Task agents. Pass these values inline in each agent's prompt:
- video_path, transcript_output_dir, language_code, whisper_model
- transcript_refinement (boolean). If true, also pass the current user_context and footage_summary strings (empty strings are fine; refinement still catches nonsense-token and self-witness fixes).

As each agent completes, update library.yaml with transcript (filename only, not full path):
ruby lib/buttercut/library.rb <name> complete transcript <filename> [<filename>...]
Refinement note: When transcript_refinement: true, each transcribe-audio agent reviews and corrects its transcript in place before returning, using the user_context and footage_summary the parent passed in. Empty context strings are fine. The parent still only writes transcript: <filename>.json to library.yaml after the agent completes.
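The inline values for one transcribe-audio agent can be assembled roughly as below. This is a minimal sketch: `transcribe_agent_inputs` is a hypothetical helper, not part of buttercut, and the path, language code, and model name in the usage lines are placeholders. It only encodes the rule stated above, that user_context and footage_summary are passed when transcript_refinement is true, and that empty strings are acceptable.

```ruby
# Hypothetical helper (not part of buttercut): builds the values the parent
# passes inline to one transcribe-audio agent.
def transcribe_agent_inputs(video_path:, transcript_output_dir:, language_code:,
                            whisper_model:, transcript_refinement:,
                            user_context: "", footage_summary: "")
  inputs = {
    video_path: video_path,
    transcript_output_dir: transcript_output_dir,
    language_code: language_code,
    whisper_model: whisper_model,
    transcript_refinement: transcript_refinement
  }

  # Context strings are only passed when refinement is on; empty strings are fine.
  if transcript_refinement
    inputs[:user_context] = user_context.to_s
    inputs[:footage_summary] = footage_summary.to_s
  end

  inputs
end

# Placeholder values for illustration only.
example_inputs = transcribe_agent_inputs(
  video_path: "/footage/P1055016.MP4",
  transcript_output_dir: "/libraries/my-library/transcripts",
  language_code: "en",
  whisper_model: "base",
  transcript_refinement: true
)
```

The resulting hash would then be rendered into the agent's prompt text alongside the contents of agent_prompt.md.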
Run from the project root:
ruby lib/buttercut/contact_sheet_job.rb <library-name> <clip> [<clip> ...]
Takes an explicit list of clip filenames (including extension, e.g. P1055016.MP4). Runs single-threaded — launch multiple invocations in parallel from the main thread when machine headroom allows (a 2-3 split across cores is usually safe on an M-series Mac). Always rebuilds every sheet for the clips it's given; for clips longer than 10 minutes that includes per-segment sheets covering successive 10-minute slices. Updates library.yaml's contact_sheet field for every clip it processes. No LLM — pure ffmpeg.
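Because the job is single-threaded, one way to use spare cores is to split the clip list into two or three groups and start one invocation per group. The sketch below assumes only the command shape shown above; `split_for_workers`, `contact_sheet_commands`, and `run_in_parallel` are hypothetical helper names, and the library name and clip filenames are placeholders.

```ruby
# Split clip filenames into at most `workers` groups of roughly equal size.
def split_for_workers(clips, workers)
  return [] if clips.empty?

  group_size = (clips.length.to_f / workers).ceil
  clips.each_slice(group_size).to_a
end

# Build one argv per group, following the command shape from the text above.
def contact_sheet_commands(library_name, clips, workers: 3)
  split_for_workers(clips, workers).map do |group|
    ["ruby", "lib/buttercut/contact_sheet_job.rb", library_name, *group]
  end
end

# Start every command at once, then wait for all of them.
# Returns one boolean per command (true if it exited successfully).
def run_in_parallel(commands)
  pids = commands.map { |argv| Process.spawn(*argv) }
  pids.map { |pid| Process.wait2(pid).last.success? }
end

# Placeholder library name and clip names for illustration only.
commands = contact_sheet_commands(
  "my-library",
  %w[a.MP4 b.MP4 c.MP4 d.MP4 e.MP4],
  workers: 3
)
```

Calling `run_in_parallel(commands)` from the project root would launch the three invocations together. Since each invocation updates library.yaml, it is worth confirming that concurrent writes are safe in your setup before relying on this.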
Dispatch analyze-video sub-agents on the Sonnet model. Sonnet reads the contact sheet with noticeably more visual specificity than Haiku (catches clothing, architecture, camera framing) — worth it since the summaries feed every later cut decision.
Batch 10 clips per sub-agent, up to 10 sub-agents in parallel, with rolling dispatch. Each sub-agent processes its 10 clips sequentially; batching amortizes the ~5–10s per-agent dispatch overhead. For a 93-clip library that's ~10 sub-agents total instead of 93. Start the next sub-agent as soon as one returns — don't wait for the whole wave of 10 to finish, or you give up ~30% of wall-clock to whichever agent in the wave is slowest.
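The batching arithmetic and the rolling schedule can be sketched as follows. In the real workflow the main thread launches Task agents, not Ruby threads; the threads here only stand in for agents to show the scheduling shape. `batch_clips` and `rolling_dispatch` are hypothetical names, and the block passed to `rolling_dispatch` is a placeholder for "dispatch one sub-agent and wait for its reply".

```ruby
BATCH_SIZE = 10    # clips per sub-agent
MAX_PARALLEL = 10  # sub-agents in flight at once

# Group clips into batches of BATCH_SIZE; the last batch may be smaller.
def batch_clips(clips, batch_size = BATCH_SIZE)
  clips.each_slice(batch_size).to_a
end

# Rolling dispatch: each worker takes the next pending batch as soon as it
# finishes its current one, so a slow batch never holds up a whole wave.
def rolling_dispatch(batches, max_parallel: MAX_PARALLEL, &work)
  queue = Queue.new
  batches.each_with_index { |batch, index| queue << [index, batch] }
  results = Array.new(batches.length)

  workers = Array.new([max_parallel, batches.length].min) do
    Thread.new do
      loop do
        item = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          nil
        end
        break if item.nil?

        index, batch = item
        results[index] = work.call(batch)
      end
    end
  end

  workers.each(&:join)
  results
end

# Placeholder clip names for a 93-clip library.
clips = (1..93).map { |i| format("clip_%03d.MP4", i) }
batches = batch_clips(clips)
summaries_written = rolling_dispatch(batches) { |batch| batch.length }
```

With 93 clips this yields ten batches, nine of 10 clips and one of 3, which matches the "~10 sub-agents" figure above.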
For each sub-agent, pass a list of 10 clip records inline. Each clip record needs:
- video_filename: basename of the video (used in the summary header and reply line)
- duration: duration string from library.yaml (e.g. 00:01:19); the agent renders it in the summary header
- contact_sheet_path: absolute path to the _full.jpg (from step 2)
- transcript_path: absolute path to the audio transcript JSON (from step 1); the sub-agent extracts dialogue on demand via script_extractor.rb
- summary_output_path: absolute path where the agent should write the summary markdown

As each sub-agent returns its batch, update library.yaml with summary for every clip in that batch:
ruby lib/buttercut/library.rb <name> complete summary <filename> [<filename>...]
The contact_sheet field was already populated in step 2, so the sub-agent return only contributes summaries.
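A clip record for the step 3 sub-agents can be modeled as a small struct and checked before dispatch. This is an illustrative sketch, not the project's own data type: `ClipRecord` and `validate_clip_record` are hypothetical, and the directory layout and file names in the example are placeholders (only the five field names and the `_full.jpg` suffix come from the text above).

```ruby
require "pathname"

# Hypothetical container for the five fields each clip record needs.
ClipRecord = Struct.new(
  :video_filename, :duration, :contact_sheet_path,
  :transcript_path, :summary_output_path,
  keyword_init: true
)

PATH_FIELDS = %i[contact_sheet_path transcript_path summary_output_path].freeze

# Returns a list of problems; an empty list means the record is ready to pass inline.
def validate_clip_record(record)
  errors = []

  record.each_pair do |field, value|
    errors << "#{field} is missing" if value.nil? || value.to_s.empty?
  end

  PATH_FIELDS.each do |field|
    value = record[field].to_s
    next if value.empty? # already reported as missing

    errors << "#{field} must be an absolute path" unless Pathname.new(value).absolute?
  end

  errors
end

# Placeholder paths for illustration only.
record = ClipRecord.new(
  video_filename: "P1055016.MP4",
  duration: "00:01:19",
  contact_sheet_path: "/libraries/my-library/contact_sheets/P1055016_full.jpg",
  transcript_path: "/libraries/my-library/transcripts/P1055016.json",
  summary_output_path: "/libraries/my-library/summaries/summary_P1055016.md"
)
problems = validate_clip_record(record)
```

Validating in the parent is cheap compared with redispatching a batch of 10 clips because one path was relative.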
If a sub-agent returns summaries inline instead of writing them to disk (sometimes Sonnet hallucinates "the Write tool is blocked" and dumps the markdown into its reply), don't retry blindly — just extract each summary from the agent's response and Write it to the matching summary_output_path from the parent thread. Then run the complete summary command as usual. Faster than redispatching, and the content is already there.
(Per-segment contact sheets generated for long clips live alongside the _full sheet on disk and are discoverable by convention — they aren't listed in library.yaml.)
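For planning purposes, the number of per-segment sheets a long clip will get can be estimated from its duration string. The sketch below assumes slices start at multiples of 10 minutes, that the final slice is simply shorter, and that a clip of exactly 10 minutes gets no segment sheets; the actual boundaries are decided by contact_sheet_job.rb, so treat this as an approximation. `duration_to_seconds` and `segment_ranges` are hypothetical helpers.

```ruby
SEGMENT_SECONDS = 600 # 10 minutes

# Convert an "HH:MM:SS" duration string (as stored in library.yaml) to seconds.
def duration_to_seconds(hhmmss)
  hours, minutes, seconds = hhmmss.split(":").map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end

# Returns [start, end] second offsets for each assumed 10-minute slice,
# or an empty list for clips that are not longer than 10 minutes.
def segment_ranges(duration)
  total = duration_to_seconds(duration)
  return [] if total <= SEGMENT_SECONDS

  (0...total).step(SEGMENT_SECONDS).map do |start|
    [start, [start + SEGMENT_SECONDS, total].min]
  end
end

short_clip = segment_ranges("00:01:19")
long_clip = segment_ranges("00:25:00")
```

Under these assumptions a 25-minute clip would have three segment sheets in addition to its _full sheet, and a 79-second clip would have only the _full sheet.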
Once every summary is written, talk through what the footage actually shows — confirm character names, locations, the narrative through-line, any stray or off-thesis clips, and the user's creative intent for this library. Use plain conversation; only reach for AskUserQuestion when offering a discrete choice. As you learn things, update:
- footage_summary and user_context via ruby lib/buttercut/library.rb <name> update_metadata footage_summary "..." (and the same with user_context)
- summary_*.md files when a summary mislabels someone or misses a key detail (e.g., "a man in a tan jacket" → the user's name)

This is the one place to do this thorough pass. Every later roughcut planning run inherits the resulting context rather than re-interrogating the library.
After all analysis completes, automatically create a backup using the backup-library skill.
Used in steps 1 and 3.
Parent agent responsibilities:
- library.yaml and settings.yaml once to gather all values needed by sub-agents.
- Library API (see AGENTS.md).

Child agent responsibilities:
- script_extractor.rb, and write the summary markdown in one Write call (analyze-video).

Each skill's agent_prompt.md documents its own IO contract, including whether the sub-agent reads or writes library.yaml. (Spoiler: it never writes library.yaml. Only the parent writes, via the Library API.)
Warn: "I can create a rough cut now, but I'll do a better job after analyzing all the footage. Continue anyway?" If the user confirms, proceed. Otherwise, wait for analysis to complete.
Skill for processing footage (video clips, sounds, photos, etc). Use this when creating a new library, adding new footage (videos) to an existing library, or resuming processing on an existing library.
Build a cut from a library — scene, selects, roughcut, or custom task. Starts by asking what kind of cut the user wants, then works with them to determine what they want to create. Always exports a file for Final Cut, Premiere, or Resolve at the end. Use when the user asks for a "roughcut", "sequence", "scene", "selects", or any other cut-shaped output.
Backs up user libraries and all their contents (external video excluded). This skill can also be useful when you need to restore a library.
Builds a contact sheet from a video clip — evenly spaced frames laid out in a single grid image, each with its hh:mm:ss timestamp burned in. Use when the user asks for a "contact sheet", "grid", "film strip", or wants a one-image overview of part of a clip.
Exports all dialogue from every clip in a library into a single text file. One clip per block — filename, then its spoken words. Use when the user asks for a "full transcript", "full script", or wants all the dialogue from a library in one place.
Reset a library's visual analysis (contact sheets, summaries, legacy visual_transcripts) and re-run the current analyze-video pipeline on it. Keeps audio transcripts, cuts, plans, and library metadata. Use when a library was processed under the older pipeline and the user wants to bring it onto the contact-sheet-based one.