video-composer

Updated June 12, 2026, 18:10

Orchestrate transcript-to-video composition for this Remotion/Hyperframes vlog system. Use when analyzing an input recording or transcript, splitting it into shots, defining the global visual spec, deciding which assets can be generated in parallel, and producing composer artifacts for downstream shot, motion, presenter, generated-video, caption, Remotion, or QC skills.

Source

blueif16

blueif16/vlog_test

name	video-composer
description	Orchestrate transcript-to-video composition for this Remotion/Hyperframes vlog system. Use when analyzing an input recording or transcript, splitting it into shots, defining the global visual spec, deciding which assets can be generated in parallel, and producing composer artifacts for downstream shot, motion, presenter, generated-video, caption, Remotion, or QC skills.

Video Composer

Overview

Own the whole video before any specialist writes a clip. Read the transcript and source constraints, decide the narrative structure, define the shared production spec, and emit compact artifacts that downstream nodes can execute without inventing a different style.

For any new recorded-video job, first read VIDEO_GENERATION_WORKFLOW.md at the repo root. Treat it as the canonical entry contract for content preparation, inbox handling, timing, composition, build, QC, and feedback updates.

Inputs

Transcript with timestamps, preferably word-level.
New user recordings may be dropped into shared/inbox/ with arbitrary filenames. Pick the newest plausible recording unless there are multiple new candidates; copy the selected file into shared/recordings/ under the lesson id before composing.
Source video metadata: duration, resolution, fps, audio notes, and presenter framing.
Timing source: word-level ASR, phrase-level cue map, explicit manual cue timestamps, or audio-pause alignment when ASR is unavailable. For this workstation, prefer the Omniscience Sherpa-ONNX ASR model at /Users/tk/Desktop/Omniscience/agent/models/voice/asr via npm run asr:recording before using pause-only timing. Do not treat rough script timestamps as final timing for a real recording.
Available component catalog, if present.
User goal, audience, subject, and target style.
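
The inbox rule above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function names, the extension list, the `since` cutoff that defines a "new" file, and the choice to name the copy after the lesson id are all assumptions.

```python
from pathlib import Path
import shutil

# Assumed set of plausible recording extensions; adjust to the project.
RECORDING_EXTS = {".mp4", ".mov", ".mkv", ".m4a", ".wav"}


def pick_recording(inbox: Path, since: float) -> Path:
    """Return the single new recording in `inbox`, or raise if the choice is ambiguous.

    `since` is a hypothetical cutoff in epoch seconds: files modified after it count as new.
    """
    candidates = [
        p for p in inbox.iterdir()
        if p.is_file()
        and p.suffix.lower() in RECORDING_EXTS
        and p.stat().st_mtime > since
    ]
    if not candidates:
        raise FileNotFoundError(f"no new recording in {inbox}")
    if len(candidates) > 1:
        names = sorted(p.name for p in candidates)
        raise ValueError(f"multiple new candidates, resolve manually: {names}")
    return candidates[0]


def stage_recording(source: Path, recordings: Path, lesson_id: str) -> Path:
    """Copy the chosen file into `recordings` named after the lesson id (assumed layout)."""
    recordings.mkdir(parents=True, exist_ok=True)
    target = recordings / f"{lesson_id}{source.suffix.lower()}"
    shutil.copy2(source, target)
    return target
```

Raising on multiple candidates keeps the ambiguous case out of the automated path, which matches the "unless there are multiple new candidates" wording without guessing which file was intended.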

Core Rule

Think in detail; output only what the pipeline needs. Do not push every editorial thought into JSON fields. Use internal judgment to choose shots, assets, and transitions, then emit a small set of production artifacts.

Workflow

Identify the thesis: write one sentence for what the whole video teaches or proves.
Verify timing before shot splitting. If a real recording is present and only rough script timestamps exist, run LESSON_ID=<id> npm run asr:recording when the Omniscience ASR runtime exists; otherwise use ffmpeg silence/pause boundaries to create an audio-derived approximation and mark the confidence.
Split the transcript by idea responsibility, not sentence boundaries.
Assign each shot a role: setup, handoff, proof, demonstration, generated insert, detail zoom, recap, memory hook, or transition bridge.
Select the active design system before defining style. If .agents/design-systems/registry.yaml exists, read it, load the active design-system skill, and combine it with the base skills. Timing review stays with the global ASR timing skill, not the design system. Disabled design systems stay unused unless the user explicitly switches.
Define the global production spec before assigning assets. Include design-system id, type scale, color, component catalog, motion defaults, video-frame styling, presenter-safe zones, reserved content rectangles, and continuity rules.
Run a layout-fit pass before any atom is assigned. Inventory content count, text length, safe zones, presenter dock, and minimum readable sizes; choose the simplest layout that fits. Do not force side-by-side layouts when a top-down, single-focus, or sequential build would avoid collisions.
Decide the human/content relationship for each shot. Use the presenter when trust, energy, or direct explanation matters. Let the content canvas dominate when the viewer needs to inspect or understand visual material.
Select creator effects only after the layout and presenter/content relationship are clear. Record purpose, owner, cue source, safe zones, audio/caption budget, fallback, and QC targets for every creator effect; reject effects whose only reason is style or energy.
Create a caption and widget budget. Burned-in captions are rare emphasis, not transcript; widgets must orient, signal, compare, transform, inspect, or recap without duplicating captions.
Create an asset dependency graph. Mark independent nodes parallel; mark transitions or generated clips that depend on neighboring shots as linear. Prefer no custom transition atom unless the transition itself teaches the relationship between shots.
Emit shot briefs for downstream specialists. Each brief must include the global spec, neighboring shot summaries, layout decision, creator effect contract when applicable, input/output expectations, and dependency constraints.
Emit a final assembly intent after all shot assets are specified. Keep final render JSON small and component-oriented.
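
For the pause-boundary fallback in the timing step, one option is ffmpeg's `silencedetect` audio filter, which logs `silence_start` and `silence_end` lines to stderr. The sketch below parses such a log into speech segments. The thresholds in the example command, the function name, the output field names, and the `provisional` confidence label are assumptions, not the project's schema.

```python
import re

# Example command (noise floor and minimum pause length are assumptions to tune):
#   ffmpeg -i recording.mp4 -af silencedetect=noise=-30dB:d=0.4 -f null - 2> silence.log

_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def speech_segments(log_text: str, duration: float) -> list[dict]:
    """Turn silencedetect log lines into speech segments between pauses."""
    silences = []
    start = None
    for line in log_text.splitlines():
        m = _START.search(line)
        if m:
            # ffmpeg can report a slightly negative start at the file head.
            start = max(0.0, float(m.group(1)))
            continue
        m = _END.search(line)
        if m and start is not None:
            silences.append((start, float(m.group(1))))
            start = None
    if start is not None:
        # The last silence runs to the end of the file.
        silences.append((start, duration))

    segments = []
    cursor = 0.0
    for s, e in silences:
        if s > cursor:
            segments.append({"start": cursor, "end": s,
                             "source": "audio-pause", "confidence": "provisional"})
        cursor = max(cursor, e)
    if cursor < duration:
        segments.append({"start": cursor, "end": duration,
                         "source": "audio-pause", "confidence": "provisional"})
    return segments
```

Segments produced this way only approximate phrase boundaries, so marking them provisional keeps them from being mistaken for ASR-derived timing.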

Transition grammars

The four frame-driven transition grammars are catalog capabilities, listed in the registry under transitionGrammars (shared/capabilities/video-production-registry.json). Discover them there, then select one per seam through a top-level transitions[] array on the plan (promo plans today, lesson plans later). Each entry has the shape {at, grammar, frames}, where at is the index of the outgoing shot; a seam with no entry is a hard cut. The grammars compose through Remotion's <TransitionSeries>, which owns the overlap timing, and they are content-agnostic: any grammar can sit between any two scenes.
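
A small sketch of how a `transitions[]` selection reads against a shot list. Only the `{at, grammar, frames}` shape comes from the text above; the per-shot frame list, the function names, and the `hard-cut` label are hypothetical. The length calculation relies on Remotion's documented behavior that a transition in `<TransitionSeries>` overlaps its two neighbors, so each transition shortens the series by its own length.

```python
def seam_grammars(shot_count: int, transitions: list[dict]) -> list[str]:
    """List the grammar at each seam; a seam with no entry is a hard cut."""
    by_outgoing = {t["at"]: t["grammar"] for t in transitions}
    # "hard-cut" is an illustrative label, not a registry grammar.
    return [by_outgoing.get(i, "hard-cut") for i in range(shot_count - 1)]


def total_frames(shot_frames: list[int], transitions: list[dict]) -> int:
    """Sum the shot lengths, minus the overlap each transition introduces."""
    return sum(shot_frames) - sum(t["frames"] for t in transitions)


# Hypothetical plan fragment: three shots, one selected seam.
shot_frames = [90, 120, 60]
transitions = [{"at": 0, "grammar": "blur-reveal", "frames": 18}]

print(seam_grammars(len(shot_frames), transitions))
print(total_frames(shot_frames, transitions))
```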

Select by what the seam must teach. The text below is quoted from the registry; keep this table in lockstep with transitionGrammars:

grammar	useWhen	avoidWhen
`blur-reveal`	The universal scene-in: the incoming scene comes into focus (blur high to 0) while the outgoing blurs and fades out.	The seam wants energy or motion continuity — a push-through or match-cut reads stronger than a soft dissolve.
`push-through`	A 2D camera-push-through: zoom in until the outgoing layer blurs and scales out, then the incoming scene emerges through it (pairs with a 3D pushIn move).	The beat is a quiet hold-to-hold — the aggressive zoom over-dramatizes a calm transition.
`match-cut`	Two clips meet at a fixed peak-velocity frame: the outgoing accelerates to a peak at the seam, hard cut, the incoming continues from that velocity and decelerates.	The two scenes share no continuous motion to match — the hard cut just reads as a jump.
`luma-wipe`	An AE luma-matte look: a frame-driven soft-edged white-on-black mask sweeps across the outgoing layer to reveal the incoming underneath.	A hard cut or a focus-pull reads better — a feathered wipe can feel ornamental on a fast beat.

Typical length: about 14–20 frames (≈0.5–0.7 s at 30 fps). match-cut tends to run shorter, since the hard cut is the trick; blur-reveal can run slightly longer. The validator warns if a transition's frames exceed the whole length of a neighboring shot.
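
The frame arithmetic and the neighbor-length warning can be illustrated with a stand-in check. This is not the project's validator: the function names, the per-shot frame list, and the warning strings are assumptions, and only the grammar names and the `{at, grammar, frames}` shape come from this section.

```python
GRAMMARS = {"blur-reveal", "push-through", "match-cut", "luma-wipe"}


def frames_to_seconds(frames: int, fps: int = 30) -> float:
    """Convert a frame count to seconds at the given frame rate."""
    return frames / fps


def check_transitions(shot_frames: list[int], transitions: list[dict]) -> list[str]:
    """Return warnings for unknown grammars, bad seams, and oversized transitions."""
    warnings = []
    for t in transitions:
        at, grammar, frames = t["at"], t["grammar"], t["frames"]
        if grammar not in GRAMMARS:
            warnings.append(f"seam {at}: unknown grammar {grammar!r}")
        if not 0 <= at < len(shot_frames) - 1:
            warnings.append(f"seam {at}: no incoming shot after index {at}")
            continue
        # Compare against both neighbors: the outgoing shot and the incoming one.
        for idx in (at, at + 1):
            if frames > shot_frames[idx]:
                warnings.append(
                    f"seam {at}: {frames} frames exceed shot {idx} "
                    f"({shot_frames[idx]} frames)"
                )
    return warnings


# 14 and 20 frames at 30 fps are roughly 0.47 s and 0.67 s.
print(round(frames_to_seconds(14), 2), round(frames_to_seconds(20), 2))
```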

A transition must TEACH the relationship between shots (energy, motion continuity, or a reveal); it is never decoration. If no grammar's useWhen fits the beat, or its avoidWhen applies, leave the seam as a hard cut.

Required Outputs

composition-brief.md: thesis, audience, structure, pacing, and visual motif.
global-style-spec.json: shared visual constraints all nodes must ingest.
section-map.json: shot list with role, transcript range, visual responsibility, and neighboring context.
asset-jobs.json: asset generation jobs and dependency graph.
transition-map.json: transitions between or inside shots, including which transitions depend on actual rendered assets.
final-render-plan.json: compact Remotion-facing plan after assets are ready.

Also record a concise caption/widget policy in the style spec or brief: frequency budget, reason test, and duplication rule.

Also record a concise layout policy in the style spec or brief: chosen template, why it fits the content, safe zones, minimum readable dimensions, and which decorative details are allowed only after fit is proven.

Also record a concise creator-effect policy in the style spec or brief: allowed effect purposes, owner defaults, cue timing policy, audio/caption budgets, fallback rule, and QC targets. Use shared/research/2026-creator-effects/composer/effect-selection-rubric.md as the current rubric.

For recorded input, also emit timing-cues.json or include an equivalent cue block: timing source, cue phrase, timestamp, confidence, and whether it is ASR-derived, manual, or provisional.

If the recording is Chinese or the ASR cue confidence is low, use the global ASR timing review skill to emit a human-review transcript ledger before final timing is accepted. Programmatic cue matching is not sufficient by itself.
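
A cue block and the review gate might look like the sketch below. The field names, the 0-to-1 confidence scale, the 0.8 threshold, and the use of a `zh` language code to detect Chinese are all assumptions; see references/artifacts.md for the actual artifact shapes.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Cue:
    phrase: str        # cue phrase as spoken
    timestamp: float   # seconds from the start of the recording
    confidence: float  # assumed 0..1 scale
    origin: str        # "asr", "manual", or "provisional"


def needs_human_review(cues: list[Cue], language: str, min_confidence: float = 0.8) -> bool:
    """Flag the recording for a human-review ledger before timing is accepted."""
    is_chinese = language.lower().startswith("zh")
    low_confidence = any(c.confidence < min_confidence for c in cues)
    return is_chinese or low_confidence


def cue_block(timing_source: str, cues: list[Cue]) -> str:
    """Serialize cues into a JSON block (hypothetical shape)."""
    payload = {"timingSource": timing_source, "cues": [asdict(c) for c in cues]}
    return json.dumps(payload, ensure_ascii=False, indent=2)


cues = [Cue("open the timeline", 12.4, 0.93, "asr"),
        Cue("now watch this", 31.0, 0.55, "provisional")]
print(cue_block("word-level ASR", cues))
print(needs_human_review(cues, "en"))
```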

For presenter-aware gesture videos, every render plan must include a debugging boolean on the overlay segment. debugging: true shows the MediaPipe face, hand, fingertip, and active-gesture tracker layer. debugging: false still runs the tracker and uses it to drive the effect, but hides diagnostic boxes and shows the polished staged animation.
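
The debugging requirement lends itself to a simple pre-render check. The plan layout used here (a `segments` list whose entries carry a `type` field) is hypothetical; only the `debugging` boolean on overlay segments comes from the rule above.

```python
def missing_debugging_flags(plan: dict) -> list[int]:
    """Return indexes of overlay segments that lack a boolean `debugging` field."""
    bad = []
    for i, segment in enumerate(plan.get("segments", [])):
        if segment.get("type") != "overlay":
            continue
        # Strings such as "false" do not count; the field must be a real boolean.
        if not isinstance(segment.get("debugging"), bool):
            bad.append(i)
    return bad


plan = {
    "segments": [
        {"type": "overlay", "debugging": True},
        {"type": "overlay"},
        {"type": "overlay", "debugging": "false"},
        {"type": "shot"},
    ]
}
print(missing_debugging_flags(plan))
```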

For gesture-triggered UI, cue timing only opens the reaction window. The detected hand box, fingertip, or palm box chooses the event, direction, and reveal progress. The final visual must then stabilize into an intentional composition: a pulled canvas sheet, a hand-swept rail or pushed-in image, a freeze gate, or a detail zoom. Do not accept cards that jitter by following raw bounding boxes every frame.
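
One way to keep a gesture-driven element from jittering is to smooth the tracked box and make the reveal progress one-directional, so it settles into its final state. The sketch below uses an exponential moving average and a monotonic latch; both are substituted techniques chosen for illustration, not the project's method, and the function names and `alpha` value are assumptions.

```python
def smooth_boxes(boxes: list[tuple], alpha: float = 0.25) -> list[tuple]:
    """Exponential moving average over (x, y, w, h) boxes to damp per-frame jitter."""
    smoothed = []
    prev = None
    for box in boxes:
        if prev is None:
            prev = tuple(float(v) for v in box)
        else:
            prev = tuple(alpha * b + (1 - alpha) * p for b, p in zip(box, prev))
        smoothed.append(prev)
    return smoothed


def reveal_progress(xs: list[float], start_x: float, end_x: float) -> list[float]:
    """Map hand x positions to a 0..1 progress that never moves backward."""
    progress = 0.0
    out = []
    for x in xs:
        raw = (x - start_x) / (end_x - start_x)
        clamped = min(1.0, max(0.0, raw))
        # Latch: once the reveal advances, a retreating hand does not undo it.
        progress = max(progress, clamped)
        out.append(progress)
    return out


print(reveal_progress([100, 200, 150, 400, 350], start_x=100, end_x=300))
```

Once progress reaches 1.0 the element can hand off to its staged final composition instead of continuing to follow the tracker.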

See references/artifacts.md for artifact shapes and references/global-style-spec.md for the current Apple-Keynote-style baseline.

Delegation Rules

Use shot-brief-writer to convert composer decisions into exact sub-node specs.
Use presenter-framing when deciding where the human should appear and how large.
Use apple-keynote-motion for motion continuity and build style.
Use hyperframes-atom-author for HTML/GSAP atoms.
Use generated-video-director for inserted generated-video clips.
Use caption-and-emphasis for text overlays and emphasis.
Use remotion-composer only after assets and shot briefs are ready.
Use render-qc after any render or visual preview.

Splitting Guidance

A shot may contain internal motion and multiple layers. Do not split every sentence into a shot. Split when the visual responsibility changes:

presenter leads
presenter hands off to content
content becomes inspectable
generated video carries the idea
a transition itself explains the relationship
recap returns to the presenter or a clean memory hook
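
Once each transcript segment carries a visual-responsibility label, merging consecutive segments with the same label gives the shot boundaries. Assigning the labels is the editorial judgment; the code below only does the grouping. The segment fields and label strings are hypothetical.

```python
from itertools import groupby


def split_shots(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments that share a visual responsibility into one shot."""
    shots = []
    for responsibility, group in groupby(segments, key=lambda s: s["responsibility"]):
        members = list(group)
        shots.append({
            "responsibility": responsibility,
            "start": members[0]["start"],
            "end": members[-1]["end"],
            "text": " ".join(s["text"] for s in members),
        })
    return shots


segments = [
    {"start": 0.0, "end": 4.0, "text": "Here is the problem.", "responsibility": "presenter-leads"},
    {"start": 4.0, "end": 7.5, "text": "It shows up everywhere.", "responsibility": "presenter-leads"},
    {"start": 7.5, "end": 15.0, "text": "Look at this chart.", "responsibility": "content-inspectable"},
    {"start": 15.0, "end": 18.0, "text": "So remember this.", "responsibility": "recap"},
]
for shot in split_shots(segments):
    print(shot["responsibility"], shot["start"], shot["end"])
```

Two sentences under the same responsibility stay in one shot, which is the point of splitting by responsibility rather than by sentence.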

Transitions may be between shots, inside a shot, or their own shot. Choose based on meaning, not convenience.

Dependency Guidance

Start all independent asset jobs together. Keep a job linear only when it needs previous output, such as:

a match transition that depends on the outgoing and incoming frame geometry
a generated clip that must inherit colors, camera, or object position from a prior shot
a final assembly that needs actual rendered atom durations
QC that must inspect completed media

Prefer a DAG over a flat list when planning work.
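
A dependency graph of this kind can be turned into parallel waves with Python's standard-library `graphlib`. The job ids and the dict-of-prerequisites input are hypothetical; they are not the asset-jobs.json schema.

```python
from graphlib import TopologicalSorter


def parallel_waves(jobs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave can start at the same time.

    `jobs` maps each job id to the set of job ids it depends on.
    """
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


jobs = {
    "atom-a": set(),
    "atom-b": set(),
    "match-transition": {"atom-a", "atom-b"},  # needs both frame geometries
    "assembly": {"atom-a", "atom-b", "match-transition"},
    "qc": {"assembly"},  # inspects completed media
}
for wave in parallel_waves(jobs):
    print(wave)
```

The two atoms land in the first wave and run together; the match transition, assembly, and QC stay linear because each needs prior output.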