voice-video-producer

Updated June 5, 2026 at 07:36

Turn the approved manga visual package into speech, subtitles, and a final motion-comic video.

Source

AutoByteus

AutoByteus/autobyteus-agents

Voice And Video Producer

Use this skill to convert the approved manga package into a narrated video.

Expected Inputs

series-bible.md
character-bible.md
character-registry.md
series-state.md
chapter-registry.md
chapters/<chapter-id>/chapter-plan.md
chapters/<chapter-id>/storyboard.md
chapters/<chapter-id>/continuity-ledger.md
chapters/<chapter-id>/visual-style-guide.md
chapters/<chapter-id>/prompt-pack.md
chapters/<chapter-id>/image-generation-log.md
chapters/<chapter-id>/visual-production-package.md
generated character sheets
generated video-frame assets for video delivery, or page/panel assets only when an explicit non-default contract produced them

storyboard.md is the operative source of truth for clip structure and on-screen story beats. chapter-plan.md is only supporting scope context once the storyboard exists.
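
A preflight check along these lines can confirm that the listed input files exist before any speech generation starts. This is a minimal sketch: the function name `missing_inputs` and the assumption that every file sits under one series root directory are illustrative and not defined by the skill. Generated character sheets and frame assets are not checked because their paths are not specified here.

```python
from pathlib import Path

# File names come from the Expected Inputs list. Placing them all under a
# single series root directory is an assumption made for this sketch.
SERIES_FILES = [
    "series-bible.md",
    "character-bible.md",
    "character-registry.md",
    "series-state.md",
    "chapter-registry.md",
]
CHAPTER_FILES = [
    "chapter-plan.md",
    "storyboard.md",
    "continuity-ledger.md",
    "visual-style-guide.md",
    "prompt-pack.md",
    "image-generation-log.md",
    "visual-production-package.md",
]


def missing_inputs(series_root: Path, chapter_id: str) -> list[str]:
    """Return the relative paths of expected input files that do not exist."""
    expected = list(SERIES_FILES)
    expected += [f"chapters/{chapter_id}/{name}" for name in CHAPTER_FILES]
    return [rel for rel in expected if not (series_root / rel).is_file()]
```

An empty return value means the file-level inputs are present; anything else is a list of paths to request from the upstream roles.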

Produced Artifacts

chapters/<chapter-id>/voice-package.md
chapters/<chapter-id>/audio-generation-log.md
chapters/<chapter-id>/audio/clip01.*, chapters/<chapter-id>/audio/clip02.*, ...
chapters/<chapter-id>/subtitles.srt
chapters/<chapter-id>/video-package.md
chapters/<chapter-id>/chapter-carry-forward.md
final video such as chapters/<chapter-id>/final/manga-video.mp4

Use:

Required Shared Reads

Start by reading production-principles.md.
Use it as the shared reference for motion-comic realism, subtitle fidelity, and voice continuity.

Workflow

Step 1 - Create `chapters/<chapter-id>/voice-package.md`

Decide whether the final piece is:

narrator-led
dialogue-led
hybrid

Before locking the chapter voice map, inspect the live supported models and voices when the runtime exposes them.
If the runtime provides list_audio_models, use it first and choose from the currently available voices instead of assuming an older palette still exists.
Start from the current voice mappings recorded in character-registry.md for recurring characters.
Use the stable voice persona or delivery-style anchors from character-bible.md to keep casting and performance behavior coherent even if the runtime voice id changes.
If a preferred recurring-character voice is unavailable in the current runtime, choose the nearest supported replacement and record that change explicitly for carry-forward sync.
For this team's generate_speech workflow, multi-speaker means at most two mapped speakers per call.
For manga-style motion comics, the working default is one visible beat, one speaker, and one short clip.
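
The voice-selection rule can be sketched as a small lookup with an explicit change record. Everything named here is hypothetical: `resolve_voice`, the `VoiceChoice` record, and the hand-curated `fallbacks` list standing in for "nearest supported replacement". The return shape of list_audio_models is not documented in this skill, so the sketch simply takes a collection of voice ids.

```python
from dataclasses import dataclass


@dataclass
class VoiceChoice:
    character: str
    voice: str
    changed: bool  # True when the registry voice had to be replaced
    note: str


def resolve_voice(character, registry_voice, available, fallbacks):
    """Keep the registry voice if the runtime still offers it, else fall back.

    `available` is whatever voice ids the runtime reported (assumed to be a
    plain collection of strings). `fallbacks` is a hypothetical list of
    acceptable replacements, ordered from nearest to farthest.
    """
    if registry_voice in available:
        return VoiceChoice(character, registry_voice, False, "registry voice available")
    for candidate in fallbacks:
        if candidate in available:
            note = f"{registry_voice} unavailable; replaced with {candidate}"
            return VoiceChoice(character, candidate, True, note)
    raise LookupError(f"no supported voice found for {character}")
```

Any result with `changed` set to True is the kind of substitution that should be recorded for carry-forward sync.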

Then map the storyboard into audio clips. Use the storyboard audio beats as the primary segmentation source instead of inventing clip boundaries ad hoc at this stage.

For each clip, record:

clip id
source storyboard audio beat ids
source scene and render-unit ids
clip generation mode
distinct speakers in clip
speaker or narrator
target voice
voice-style anchor ref when relevant
script
emotion
performance directions or stage cues
subtitle text
estimated seconds
static-hold risk
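
One way to keep these fields consistent across clips is a typed record. This is only a sketch: the field names mirror the list above, but the class name `ClipRecord`, the types, and the default values (for example the "low" static-hold risk label) are assumptions, not a schema the skill defines.

```python
from dataclasses import dataclass


@dataclass
class ClipRecord:
    clip_id: str
    audio_beat_ids: list[str]       # source storyboard audio beat ids
    scene_id: str
    render_unit_ids: list[str]
    generation_mode: str            # e.g. "single-speaker" or "multi-speaker"
    speakers: list[str]             # speaker names, or "narrator"
    target_voices: dict[str, str]   # speaker -> voice id
    script: str                     # spoken text, may contain bracketed cues
    subtitle_text: str              # spoken words only, no stage cues
    emotion: str = ""
    performance_directions: str = ""
    voice_style_anchor_ref: str = ""
    estimated_seconds: float = 0.0
    static_hold_risk: str = "low"   # hypothetical scale: low / medium / high

    def distinct_speakers(self) -> int:
        """Number of distinct speakers mapped in this clip."""
        return len(set(self.speakers))
```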

Do not invent story action that the viewer cannot see or infer from the canon package. Do not build the voice package from chapter-plan.md if the storyboard already says something more precise. Do not invent narrator lines, spoken-text splits, or subtitle phrasing that the storyboard did not already define unless the storyboard is being formally revised upstream.

Step 2 - Log every speech-tool call in `chapters/<chapter-id>/audio-generation-log.md`

For every material speech-generation call, record:

clip id
source storyboard audio beat ids
source scene and render-unit ids
generation mode
distinct speakers in call
exact spoken text
performance directions or stage cues
speech tool used
model identifier returned by the tool, if available
voice
language
speech settings used, when applicable
prompt-level performance directions when applicable
output path
approval status
sequential-call position and whether sleep 60 was completed before the next speech-tool call
notes for reuse

Step 3 - Generate the speech assets

Generate one speech clip per audio beat with the approved speech tool for the run, such as generate_speech or speak.

Speech generation must be serial-only. Treat the clip list as a queue, not a batch:

call exactly one generate_speech or selected speak request
wait for the tool result before doing anything that depends on that clip
inspect and log the result in audio-generation-log.md
immediately run sleep 60 before making any further speech-tool call
do not launch multiple speech-tool calls concurrently, in the background, through a background process, or as a parallel batch
apply the same 60-second cooldown after retries, rejected candidates, timed-out calls, failed calls, and approved clips
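
The queue discipline above can be sketched as a plain loop. The `generate` callable is a stand-in for whichever speech tool the run uses; its signature, the dict-shaped clips, and the log entry fields are assumptions made for illustration. The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable

COOLDOWN_SECONDS = 60  # the skill requires sleep 60 between speech-tool calls


def run_speech_queue(
    clips: list[dict],
    generate: Callable[[dict], dict],
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Process clips strictly one at a time, cooling down after every call."""
    log = []
    for position, clip in enumerate(clips, start=1):
        try:
            result = generate(clip)  # exactly one call in flight
            status = "generated"
        except Exception as exc:  # failed and timed-out calls are logged too
            result, status = {"error": str(exc)}, "failed"
        entry = {
            "clip_id": clip["clip_id"],
            "position": position,
            "status": status,
            "result": result,
            "cooldown_completed": False,
        }
        log.append(entry)
        sleep(COOLDOWN_SECONDS)  # same cooldown after success and failure
        entry["cooldown_completed"] = True
    return log
```

The sketch always sleeps, including after the last clip, so a retry queued afterwards still starts from a completed cooldown.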

For each clip:

keep a stable voice assignment per narrator or character
default to one speaker per clip
do not use multi-speaker generation just because multiple characters exist in the same scene
use multi-speaker generation only when one visible beat genuinely requires a short two-person exchange inside the same audio unit
keep multi-speaker clips to two mapped speakers maximum
if more than two distinct speakers would be needed, split the exchange into separate clips or staged one-speaker / two-speaker segments and log the assembly explicitly
if one unchanged render unit would end up carrying roughly more than 8 to 10 seconds of uninterrupted speech or more than one clear speaker turn, split the clip or route a storyboard revision instead of shipping a sluggish static hold
preserve ordering in filenames
record actual durations in voice-package.md
prefer short, precise clips over one giant track
if one tool path fails, a segmented fallback is acceptable, but the final clip assembly must still be logged explicitly
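
The speaker and pacing limits in this list lend themselves to a quick pre-generation check. A minimal sketch, assuming a hypothetical helper `clip_issues` and treating 10 seconds as the hard end of the "roughly 8 to 10 seconds" guidance:

```python
MAX_SPEAKERS_PER_CALL = 2
MAX_STATIC_HOLD_SECONDS = 10.0  # upper end of the "roughly 8 to 10 seconds" range


def clip_issues(speakers, seconds, render_unit_count=1, speaker_turns=1):
    """Return human-readable reasons to split a clip before generating it."""
    issues = []
    if len(set(speakers)) > MAX_SPEAKERS_PER_CALL:
        issues.append("more than two mapped speakers: split into separate clips")
    if render_unit_count == 1:
        # Pacing limits apply when a single unchanged image carries the audio.
        if seconds > MAX_STATIC_HOLD_SECONDS:
            issues.append("static hold too long: split or route a storyboard revision")
        if speaker_turns > 1:
            issues.append("multiple speaker turns on one render unit: split the clip")
    return issues
```

An empty list means the clip fits the defaults; any entry is a prompt to split or to send the beat back for storyboard revision.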

Speech prompt rules:

Use short audible stage directions when they materially improve delivery.
Bracketed cues such as [pause], [laughs softly], [under breath], or [more controlled now] are valid when the speech tool supports expressive prompting.
Only include directions the listener can actually hear. Do not use visual-only notes such as [camera pushes in].
Keep the subtitle text separate from the stage directions unless the character literally says those words aloud.
Do not overload every line with cues. One to three cues per clip is usually enough.
When the tool supports voice or style config, combine inline cues with config:
- keep the spoken script readable
- use inline cues for local performance changes
- use config or style instructions for the broader vocal target

Examples:

Single-speaker spoken prompt:
[calm, dangerous] 我最不需要的是只会点头的人。 [short pause] 你继续待在原来的位置。 [lower voice] 我要你看清楚，谁被允许进入同一个房间。

Matching subtitle text:
我最不需要的是只会点头的人。你继续待在原来的位置。我要你看清楚，谁被允许进入同一个房间。

Multi-speaker spoken prompt only when one visible beat truly needs it:
Kurapika: [quiet, controlled] 下一位。
Borksen: [after a short pause] ……我是来参加讲习的。

Matching subtitle text:
下一位。……我是来参加讲习的。

Preferred split for most manga motion-comic beats:
- Clip A
Kurapika: [quiet, controlled] 下一位。

- Clip B
Borksen: [after a short pause] ……我是来参加讲习的。

If three voices would otherwise collide in one beat:
- split narrator into its own clip, or
- split the exchange into sequential two-speaker / one-speaker clips
instead of sending one three-speaker mapping request.
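
The relationship between a spoken prompt and its matching subtitle text in the examples above can be sketched as a cue-stripping helper. This is an illustration, not part of the skill: `subtitle_from_prompt` is a hypothetical name, speaker labels are assumed to be Latin-script names followed by a colon at the start of a line, and spaces are removed only next to CJK characters so Chinese lines join cleanly.

```python
import re

CUE = re.compile(r"\[[^\[\]]*\]")  # bracketed stage cue such as [pause]
# Assumed label shape: "Name: " at the start of a line.
LABEL = re.compile(r"^\s*[A-Za-z][A-Za-z0-9 _.'-]{0,40}:\s*")
# CJK punctuation, CJK ideographs, full-width forms, and the ellipsis.
CJK = "\u3000-\u303f\u4e00-\u9fff\uff00-\uffef\u2026"
CJK_SPACE = re.compile(rf"(?<=[{CJK}])\s+|\s+(?=[{CJK}])")


def subtitle_from_prompt(prompt: str) -> str:
    """Drop speaker labels and stage cues, keeping only the spoken words."""
    parts = []
    for line in prompt.splitlines():
        line = LABEL.sub("", line)
        line = CUE.sub(" ", line)
        line = " ".join(line.split())  # collapse leftover whitespace
        if line:
            parts.append(line)
    return CJK_SPACE.sub("", " ".join(parts))
```

Applied to the single-speaker and two-speaker prompts above, this returns the matching subtitle text shown beneath each of them. A line whose spoken words literally begin with a Latin word and a colon would be misread as a label, so the result still needs a human check.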

Step 4 - Build subtitles and timing

Create subtitles.srt from the final clip order and actual durations.
Keep subtitle wording faithful to the spoken line unless the user explicitly wants subtitle condensation.
Add small readability buffers, but keep subtitle timing tightly aligned to the audio.
Read visual-production-package.md before final subtitle burn and respect its declared asset lettering state and subtitle overlay guidance.
Do not burn redundant subtitles over assets that are already fully lettered unless the package or the user explicitly calls for that.
If the storyboard provides a preferred subtitle-window plan or line-break hints, treat that as the default authority instead of silently re-segmenting from scratch.
Choose subtitle layout for the locked series export aspect ratio instead of reusing one default style blindly.
For portrait or vertical video:
- start with a smaller subtitle font than landscape
- keep the subtitle block visually anchored near the bottom
- prefer a two-line maximum for Chinese in normal cases
For landscape or horizontal video:
- a slightly larger font and a slightly higher bottom margin are acceptable
If a subtitle wraps to three lines in a portrait frame, do at least one of:
- split it into two subtitle windows
- reduce font size modestly
- rewrite line breaks more intelligently
If you must deviate from the storyboard's preferred subtitle windows for readability, record that deviation explicitly in video-package.md.
Record the chosen subtitle layout settings for the final export, including font size and bottom margin.
Build a render timing map before export that assigns each video-frame render unit, or each page/panel render unit only when an explicit non-default contract produced them:
- start time
- end time
- source asset path
- linked storyboard audio beat ids
- linked subtitle ids or subtitle window
If one audio beat spans multiple render units, make that split explicit in the timing map instead of leaving it implicit in the editor state.
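
Building subtitles.srt from the final clip order and actual durations can be sketched as follows. The helper names are hypothetical, and the `gap` parameter assumes the assembled audio track inserts the same amount of silence between clips; if it does not, the subtitles would drift from the audio.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(clips, gap=0.15):
    """Build SRT text from ordered (subtitle_text, duration_seconds) pairs.

    `gap` is an assumed silence, in seconds, placed between clips in the
    final audio track.
    """
    blocks = []
    cursor = 0.0
    for index, (text, duration) in enumerate(clips, start=1):
        start, end = cursor, cursor + duration
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        cursor = end + gap
    return "\n".join(blocks)
```

The same start and end values can feed the render timing map, so both artifacts derive from one set of measured durations.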

Step 5 - Assemble the final video

Use ffmpeg for:

image sequencing
simple pans or zooms when useful
light transitions when useful
audio muxing
subtitle burn-in or sidecar packaging
final export normalization

Default output should feel like a clean motion comic, not fake full animation.
When sync precision matters, prefer deterministic per-render-unit assembly over a raw slideshow concat path.
Normalize still images before concatenation so mixed asset dimensions do not silently distort timing or ordering.
If the visual package declares video-frame, confirm the assets are already full-bleed single-frame images at the locked series aspect ratio before assembly.
Do not hide white paper margins, page gutters, collage layouts, or source art that mismatches the locked series aspect ratio with casual padding.
Route those assets back to manga_illustrator unless the user explicitly approved a separate export variant with a full crop/regeneration plan.
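
Per-render-unit assembly can be sketched by building one ffmpeg command per still image and clip, then joining the segments with the concat demuxer. The helpers below only construct argument lists; they do not run ffmpeg. The function names, the 1080x1920 size, the 30 fps rate, and the file paths are placeholder assumptions, and the plain `scale` step presumes the asset already passed the aspect-ratio check, since padding is ruled out above.

```python
def segment_command(image, audio, duration, out, width=1080, height=1920, fps=30):
    """ffmpeg arguments for one still image held for `duration` seconds."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-framerate", str(fps), "-i", image,  # loop the still
        "-i", audio,
        "-t", f"{duration:.3f}",                 # exact length from the timing map
        "-vf", f"scale={width}:{height},setsar=1",  # normalize frame size
        "-af", "apad",                           # pad audio so both streams match -t
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-ar", "48000",
        out,
    ]


def concat_list(segment_paths):
    """Contents of a concat-demuxer list file (paths assumed free of quotes)."""
    return "".join(f"file '{path}'\n" for path in segment_paths)


def concat_command(list_file, out):
    """ffmpeg arguments that join identically encoded segments without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", out]
```

Each list can be passed to `subprocess.run` and also written verbatim into video-package.md as the exact command used.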

Step 5A - Run post-export QA on the final file

Do not stop at a successful export command.

After the final MP4 is written:

sample validation frames from the exported file itself
for video-frame chapters with 30 render units or fewer, sample at least one midpoint timestamp per render unit
for panel-first chapters with 30 render units or fewer, sample at least one midpoint timestamp per render unit
for larger chapters, sample the first, middle, and last render units of each scene plus every scene boundary
compare the sampled frames against the intended render timing map
spot-check subtitle progression around audio-beat boundaries so the visible frame/panel and subtitle sequence stay aligned
spot-check subtitle block size and vertical placement on the final file for the locked series aspect ratio

If the sampled frames do not match the intended assets, or the subtitle progression is visibly out of sync with the frame/panel progression, or the subtitle block is unreasonably large or too high for the frame, the export is invalid and must be rebuilt.
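
The sampling rules above can be turned into a list of timestamps to extract from the exported file. This sketch assumes the render timing map is available as ordered dicts with `scene`, `start`, and `end` keys (a hypothetical shape), and it interprets "every scene boundary" as the start time of the first render unit of each new scene.

```python
def qa_sample_times(units):
    """Return sorted timestamps (seconds) to sample from the exported video."""
    def midpoint(unit):
        return (unit["start"] + unit["end"]) / 2

    if len(units) <= 30:
        # Small chapters: one midpoint per render unit.
        return [midpoint(unit) for unit in units]

    times = set()
    scenes = {}
    for unit in units:
        scenes.setdefault(unit["scene"], []).append(unit)
    for scene_units in scenes.values():
        # First, middle, and last render unit of each scene.
        picks = (scene_units[0], scene_units[len(scene_units) // 2], scene_units[-1])
        times.update(midpoint(unit) for unit in picks)
    for previous, current in zip(units, units[1:]):
        if previous["scene"] != current["scene"]:
            times.add(current["start"])  # scene boundary
    return sorted(times)
```

Each timestamp can then be compared against the asset the timing map says should be on screen at that moment.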

Step 6 - Write `chapters/<chapter-id>/video-package.md`

Record:

final video path
export resolution and frame rate
locked series aspect ratio
audio manifest
subtitle path
storyboard-to-asset mapping ref
image-asset order
source-asset frame conformance result when the render-unit contract is video-frame
timing summary
render timing map
subtitle layout settings
any storyboard subtitle-window deviations made for readability
any static-hold pacing interventions or storyboard deviations made to avoid a sluggish single-image hold
exact ffmpeg commands used
post-export QA method and results
explicit delivery gate status
residual risks or known limitations

Step 7 - Write `chapters/<chapter-id>/chapter-carry-forward.md`

Capture the durable handoff for the next chapter:

new canon confirmed in this chapter
new character debuts and status changes
reusable image assets and prompt blocks
reusable voice choices, voice-mapping updates, and pronunciation notes
open hooks for the next chapter
final output paths

Then route the full package back to manga_showrunner for series-level canon sync.

Blocking Rules

If the storyboard does not support clear audio mapping, route the issue back to storyboard_director.
If the generated images are too inconsistent to cut into a coherent video, route that issue back to manga_illustrator.
If video delivery uses video-frame and any approved asset looks like a manga page, panel grid, white paper sheet, collage, reference sheet, or wrong-aspect frame without an explicit crop plan, route it back to manga_illustrator.
If the visual package does not make the lettering state or subtitle-overlay expectation explicit, route that issue back to manga_illustrator before final burn decisions.
Do not send a generate_speech multi-speaker request with more than two mapped speakers. Split it first.
Do not ship one long clip that leaves the viewer on one unchanged image through an extended exchange when the pacing would feel static. Split it or send it back.
Do not compensate for story gaps with filler narration.
Do not add music unless the user supplied it or explicitly requested it.
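
The wrong-aspect-frame rule can be checked mechanically before routing decisions are made. A minimal sketch, assuming a hypothetical helper `matches_aspect`, a locked ratio written as "W:H", and a 1% relative tolerance that is an illustrative choice rather than a value from the skill. It only compares dimensions; page margins, panel grids, and collage layouts still need a visual review.

```python
from fractions import Fraction


def matches_aspect(width, height, locked="9:16", tolerance=0.01):
    """True when width:height is within `tolerance` of the locked ratio."""
    ratio_w, ratio_h = (int(part) for part in locked.split(":"))
    target = Fraction(ratio_w, ratio_h)
    actual = Fraction(width, height)
    return abs(actual - target) / target <= tolerance
```

For example, a 1080x1920 asset passes against "9:16", while a 1024x1536 asset fails and would be routed back to manga_illustrator.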

Producer Standards

Voice continuity matters.
Timing should be audio-led, with the images paced around the real clip durations.
Favor clarity and emotional readability over flashy editing.
Export verification is part of delivery, not optional cleanup.

name	voice-video-producer
description	Turn the approved manga visual package into speech, subtitles, and a final motion-comic video.
