analyze-video
// Full footage analysis pipeline — audio transcripts, contact sheets, and Sonnet-written summaries. Produces every artifact the cut skill reads. Orchestrated from the main thread.
| name | analyze-video |
| description | Full footage analysis pipeline — audio transcripts, contact sheets, and Sonnet-written summaries. Produces every artifact the cut skill reads. Orchestrated from the main thread. |
This is the main thread's playbook for the Analyze Video workflow step. Run it after library setup, before any cut work. It covers the three artifacts produced per clip: audio transcript, contact_sheet, and markdown summary. The roughcut agent reads dialogue on demand by running script_extractor.rb over the transcript JSON — no separate script artifact.
SKILL.md is the parent's dispatch brief. The sub-agent working prompt lives in agent_prompt.md — inline its contents when launching a Task agent. Don't pass SKILL.md.
transcript, etc.).
- library.yaml exists, schema is current — run migrations from AGENTS.md if not).
- libraries/settings.yaml directly for whisper_model.
- For library fields, read the snapshot via ruby lib/buttercut/library.rb <name> summary and pull the values you need from the JSON; don't parse library.yaml inline.

Inform the user: "Library setup complete. Found [N] videos ([total size]). Starting footage analysis..."
Launch transcribe-audio Task agents. Pass these values inline in each agent's prompt:
- video_path, transcript_output_dir, language_code, whisper_model
- transcript_refinement (boolean). If true, also pass the current user_context and footage_summary strings (empty strings are fine; refinement still catches nonsense-token and self-witness fixes).

As each agent completes, update library.yaml with transcript (filename only, not full path):
ruby lib/buttercut/library.rb <name> complete transcript <filename> [<filename>...]
Refinement note: When transcript_refinement: true, each transcribe-audio agent reviews and corrects its transcript in place before returning, using the user_context and footage_summary the parent passed in. Empty context strings are fine. The parent still only writes transcript: <filename>.json to library.yaml after the agent completes.
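A minimal Ruby sketch of the per-agent values and the completion call, assuming the main thread collects the values into a hash before rendering them into a transcribe-audio prompt. The helper names `transcribe_agent_values` and `complete_transcript_command` are hypothetical, and so is the hash shape. The sketch also assumes every filename passed to `complete transcript` is a bare basename. The source does not say whether that basename belongs to the clip or to the transcript JSON, so the helper only strips directories.

```ruby
# Hypothetical helper: gathers the values listed above into one hash
# that the main thread could render into a transcribe-audio Task prompt.
def transcribe_agent_values(video_path:, transcript_output_dir:, language_code:,
                            whisper_model:, transcript_refinement:,
                            user_context: "", footage_summary: "")
  values = {
    video_path: video_path,
    transcript_output_dir: transcript_output_dir,
    language_code: language_code,
    whisper_model: whisper_model,
    transcript_refinement: transcript_refinement
  }
  # Context strings only travel with refinement; nil becomes "" since empty strings are fine.
  if transcript_refinement
    values[:user_context] = user_context.to_s
    values[:footage_summary] = footage_summary.to_s
  end
  values
end

# Hypothetical helper: argv for the completion call. library.yaml stores
# filenames only, so any directory part is stripped first.
def complete_transcript_command(library_name, filenames)
  ["ruby", "lib/buttercut/library.rb", library_name, "complete", "transcript",
   *filenames.map { |f| File.basename(f) }]
end
```

Building an argv array instead of a single shell string lets the parent run it with `system(*argv)` without quoting filenames that contain spaces.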
Run from the project root:
ruby lib/buttercut/contact_sheet_job.rb <library-name> <clip> [<clip> ...]
Takes an explicit list of clip filenames, including the extension (e.g. P1055016.MP4). Each invocation runs single-threaded, so launch several in parallel from the main thread when the machine has headroom (splitting the clips across two or three concurrent invocations is usually safe on an M-series Mac). It always rebuilds every sheet for the clips it's given; for clips longer than 10 minutes, that includes per-segment sheets covering successive 10-minute slices. It updates library.yaml's contact_sheet field for every clip it processes. No LLM is involved; it's pure ffmpeg.
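A minimal Ruby sketch of two ideas from the paragraph above. The first is how a clip longer than 10 minutes divides into successive 10-minute slices. The second is how the main thread could split a clip list across a few parallel `contact_sheet_job.rb` invocations. All three function names are hypothetical. The sketch treats the slices as half-open second ranges, and it assumes the `hh:mm:ss` duration format shown elsewhere in this playbook. Neither detail reflects how `contact_sheet_job.rb` works internally.

```ruby
SEGMENT_SECONDS = 10 * 60 # 10-minute slices, per the description above

# Parse an hh:mm:ss duration string such as "00:01:19".
def duration_seconds(hms)
  h, m, s = hms.split(":").map(&:to_i)
  h * 3600 + m * 60 + s
end

# Successive [start, end) ranges in seconds; empty for clips of 10 minutes or less,
# since only clips *longer* than 10 minutes get per-segment sheets.
def segment_ranges(total_seconds, slice = SEGMENT_SECONDS)
  return [] if total_seconds <= slice
  (0...total_seconds).step(slice).map { |start| [start, [start + slice, total_seconds].min] }
end

# Deal clips round-robin into `jobs` groups, one contact_sheet_job.rb argv per non-empty group.
def contact_sheet_commands(library_name, clips, jobs: 2)
  groups = Array.new(jobs) { [] }
  clips.each_with_index { |clip, i| groups[i % jobs] << clip }
  groups.reject(&:empty?).map do |group|
    ["ruby", "lib/buttercut/contact_sheet_job.rb", library_name, *group]
  end
end
```

To run the groups concurrently, spawn each argv and wait for all of them: `contact_sheet_commands(lib, clips, jobs: 3).each { |argv| Process.spawn(*argv) }` followed by `Process.waitall`. Round-robin ignores clip length. Sorting the clips longest-first before dealing them out usually balances the groups better.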
Dispatch analyze-video sub-agents on the Sonnet model. Sonnet reads the contact sheet with noticeably more visual specificity than Haiku (catches clothing, architecture, camera framing) — worth it since the summaries feed every later cut decision.
Batch 10 clips per sub-agent, up to 10 sub-agents in parallel, with rolling dispatch. Each sub-agent processes its 10 clips sequentially; batching amortizes the ~5–10s per-agent dispatch overhead. For a 93-clip library that's 10 sub-agents total (nine batches of 10 plus one of 3) instead of 93. Start the next sub-agent as soon as one returns. Don't wait for the whole wave of 10 to finish, or you give up ~30% of wall-clock time to whichever agent in the wave is slowest.
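The batching and rolling-dispatch policy can be modeled in a few lines of Ruby. This is a sketch of the scheduling idea only. In practice the main thread launches Sonnet Task agents rather than Ruby threads, so the `dispatch` block and the `rolling_dispatch` name are hypothetical stand-ins. Each worker thread acts as one parallel slot. It takes the next pending batch the moment its current batch returns, so a slot never waits on a slow sibling.

```ruby
# Hypothetical model of rolling dispatch: batches of `batch_size` clips,
# at most `max_parallel` batches in flight, and a slot refilled as soon as it frees up.
def rolling_dispatch(clips, batch_size: 10, max_parallel: 10, &dispatch)
  pending = Queue.new
  clips.each_slice(batch_size) { |batch| pending << batch }
  pending.close # once drained, pop returns nil and each worker exits

  results = Queue.new
  worker_count = [max_parallel, pending.size].min
  workers = Array.new(worker_count) do
    Thread.new do
      # One worker == one slot: it pulls the next batch immediately after finishing one.
      while (batch = pending.pop)
        results << [batch, dispatch.call(batch)]
      end
    end
  end
  workers.each(&:join)

  # [batch, result] pairs in completion order, not submission order.
  Array.new(results.size) { results.pop }
end
```

For example, `rolling_dispatch(clips) { |batch| launch_summary_agent(batch) }` would drive the whole step, where `launch_summary_agent` is a placeholder for whatever actually starts the sub-agent. Because results arrive in completion order, each batch's `complete summary` update can run as soon as that batch returns.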
For each sub-agent, pass its batch of up to 10 clip records inline. Each clip record needs:
- video_filename: basename of the video (used in the summary header and reply line)
- duration: duration string from library.yaml (e.g. 00:01:19); the agent renders it in the summary header
- contact_sheet_path: absolute path to the _full.jpg (from step 2)
- transcript_path: absolute path to the audio transcript JSON (from step 1); the sub-agent extracts dialogue on demand via script_extractor.rb
- summary_output_path: absolute path where the agent should write the summary markdown

As each sub-agent returns its batch, update library.yaml with summary for every clip in that batch:
ruby lib/buttercut/library.rb <name> complete summary <filename> [<filename>...]
The contact_sheet field was already populated in step 2, so the sub-agent return only contributes summaries.
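A hedged Ruby sketch of building one clip record and the matching completion call. It checks only the constraints this playbook states: the three path fields are absolute, and the contact sheet is the `_full.jpg`. Where those paths live on disk is not assumed. The helper names `clip_record` and `complete_summary_command` are hypothetical. The sketch also assumes that `complete summary` takes video filenames, which the source implies but does not spell out.

```ruby
require "pathname"

PATH_FIELDS = %i[contact_sheet_path transcript_path summary_output_path].freeze

# Hypothetical builder for the five fields each clip record needs.
def clip_record(video_filename:, duration:, contact_sheet_path:, transcript_path:, summary_output_path:)
  record = {
    video_filename: File.basename(video_filename), # basename only
    duration: duration,                            # e.g. "00:01:19", copied from library.yaml
    contact_sheet_path: contact_sheet_path,
    transcript_path: transcript_path,
    summary_output_path: summary_output_path
  }
  PATH_FIELDS.each do |key|
    unless Pathname.new(record[key]).absolute?
      raise ArgumentError, "#{key} must be absolute: #{record[key]}"
    end
  end
  unless contact_sheet_path.end_with?("_full.jpg")
    raise ArgumentError, "contact_sheet_path should point at the _full.jpg sheet"
  end
  record
end

# Hypothetical argv for marking a returned batch complete (assumes video filenames).
def complete_summary_command(library_name, records)
  ["ruby", "lib/buttercut/library.rb", library_name, "complete", "summary",
   *records.map { |r| r[:video_filename] }]
end
```

Validating paths before dispatch catches a relative path in the parent thread, where it is cheap to fix, instead of inside a sub-agent partway through its batch.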
If a sub-agent returns summaries inline instead of writing them to disk (sometimes Sonnet hallucinates "the Write tool is blocked" and dumps the markdown into its reply), don't retry blindly — just extract each summary from the agent's response and Write it to the matching summary_output_path from the parent thread. Then run the complete summary command as usual. Faster than redispatching, and the content is already there.
(Per-segment contact sheets generated for long clips live alongside the _full sheet on disk and are discoverable by convention — they aren't listed in library.yaml.)
Once every summary is written, talk through what the footage actually shows — confirm character names, locations, the narrative through-line, any stray or off-thesis clips, and the user's creative intent for this library. Use plain conversation; only reach for AskUserQuestion when offering a discrete choice. As you learn things, update:
- footage_summary and user_context, via ruby lib/buttercut/library.rb <name> update_metadata footage_summary "..." (and the same with user_context)
- summary_*.md files, when a summary mislabels someone or misses a key detail (e.g., "a man in a tan jacket" → the user's name)

This is the one place to do this thorough pass. Every later roughcut planning run inherits the resulting context rather than re-interrogating the library.
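The metadata updates carry free-form text in the user's own words, and that text often contains apostrophes, double quotes, or `$`. A minimal Ruby sketch, assuming the parent runs `library.rb` from Ruby rather than through a shell string: pass each argument as a separate element, and `Open3.capture3` then skips the shell entirely, so nothing needs escaping. Both helper names are hypothetical.

```ruby
require "open3"

# Hypothetical helper: argv for one metadata update. Each element reaches
# library.rb as-is, so quotes and $ in the user's wording need no shell escaping.
def update_metadata_command(library_name, field, value)
  ["ruby", "lib/buttercut/library.rb", library_name, "update_metadata", field, value]
end

# Run an argv without a shell; returns [stdout, stderr, Process::Status].
def run_argv(argv)
  Open3.capture3(*argv)
end
```

A call such as `run_argv(update_metadata_command("trip", "user_context", text))` then works for any `text`, including one that reads `Maya's "road trip"`.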
After all analysis completes, automatically create a backup using the backup-library skill.
Used in steps 1 and 3.
Parent agent responsibilities:
- library.yaml and settings.yaml once to gather all values needed by sub-agents.
- Library API — see AGENTS.md).

Child agent responsibilities:
- script_extractor.rb, and write the summary markdown in one Write call (analyze-video).

Each skill's agent_prompt.md documents its own IO contract, including whether the sub-agent reads or writes library.yaml. (Spoiler: it never writes library.yaml. Only the parent writes, via the Library API.)
Warn: "I can create a rough cut now, but I'll do a better job after analyzing all the footage. Continue anyway?" If the user confirms, proceed. Otherwise, wait for analysis to complete.