| slug | muapi-ai-fight-scene |
| name | muapi-ai-fight-scene |
| version | 1.0.0 |
| description | Generate a high-cut-density action / fight scene by first composing a 16-cell storyboard image, then driving Seedance 2.0 image-to-video off that storyboard. Stacks GPT-Image-2 (character sheet + storyboard), Nano-Banana-2 (environment concept), and Seedance 2.0 i2v. |
| acceptLicenseTerms | true |
AI Fight Scene Generator
Generate a high-cut-density action / fight scene by first composing a 16-cell storyboard image, then driving Seedance 2.0 image-to-video off that storyboard.
The core idea: action tension comes from cut density, not single-shot quality. Forcing the video model to follow a pre-drawn 4×4 storyboard grid gives you 16 distinct shots in a 15-second clip — landing punches, reverse angles, ECUs, whip-pans — that no t2v prompt could choreograph on its own.
Inputs
| Name | Type | Required | Default | Description |
|---|
character_description | text | yes | — | Full physical description of the fighter(s). Asymmetric details (eye colour, scar side, holster on left hip) help the model preserve identity across panels. |
environment_description | text | yes | — | The scene setting — e.g. "cyberpunk wet back-alley, neon kanji signage, Stray-game aesthetic, rain on chrome." |
action_script | text | yes | — | The action beat — prose or numbered beats. E.g. "Hero is cornered → blocks first punch → counter-elbow → throw opponent into trash cans → finisher." |
style_direction | text | no | cinematic action film, anamorphic lens, high contrast, motion blur on hits | Aesthetic / look tags applied to every frame. |
duration | int | no | 15 | Final video length in seconds. The storyboard's 16 cells map roughly 1 shot per second at default. |
aspect_ratio | text | no | 16:9 | Output aspect — 16:9 cinematic, 9:16 vertical, 1:1 square. |
Steps
Phase A — Character Sheet
Generate a clean turnaround-style character sheet using muapi image generate (model=gpt-image-2-text-to-image):
- Prompt:
Character reference sheet of {{character_description}}. Three views — front, 3/4, profile — on a neutral grey backdrop. Studio lighting, full body, no text overlays, photoreal. Asymmetric identifying details preserved on the correct side. {{style_direction}}.
- Aspect ratio:
3:2
Present the character sheet and confirm identity details look right before proceeding. This image becomes reference #1 for later phases.
Phase B — Environment Concept
Use muapi image generate (model=nano-banana-2) to design the scene/world:
- Prompt:
Wide establishing shot of {{environment_description}}. No characters in frame — environment only. Strong perspective lines, depth, atmospheric haze. {{style_direction}}. Production-design concept art.
- Aspect ratio:
{{aspect_ratio}}
Nano-Banana-2 is chosen here for its reasoning-driven composition — it's better than text-to-image-only models at producing locations with believable spatial logic (chokepoints, cover, sightlines) that an action scene can use. Present for approval. This becomes reference #2.
Phase C — 16-Cell Storyboard
Compose the action onto a single 4×4 storyboard image using muapi image edit (model=gpt-image-2-image-to-image):
- Reference Images: the character sheet from Phase A and the environment plate from Phase B.
- Prompt:
Compose a 4×4 storyboard grid (16 numbered cells) for the following action sequence:
{{action_script}}
CHARACTER (use reference image 1 identity throughout, asymmetric details preserved):
{{character_description}}
LOCATION (use reference image 2 spatial layout):
{{environment_description}}
Each cell labels: SHOT # (1–16) · SIZE (WIDE / MS / CU / ECU) · CAMERA-MOVE arrow (push, pull, whip, dolly, crash-zoom, handheld) · 1-word RHYTHM note (BEAT / IMPACT / RECOVERY / RESET).
Vary shot size aggressively — never two WIDEs in a row. Land every IMPACT on a CU or ECU.
Hand-drawn comic-book ink-and-wash style, monochrome with selective red accents on hits.
Numbered cells, clear gutters between panels.
Aesthetic: {{style_direction}}.
- Aspect ratio:
1:1 (square works best for a 4×4 grid)
Present the storyboard to the user. Confirm:
- The 16 shots read clearly
- Identity stays consistent cell-to-cell
- Cut density / shot-size variation looks aggressive enough
If a panel reads poorly, regenerate just the storyboard with that cell's note bolded ("CELL 7 must be an ECU on the right fist").
Phase D — Storyboard → Video (Seedance 2.0)
Hand the storyboard to muapi video from-image (model=seedance-v2.0-i2v):
- Reference Image: the 16-cell storyboard from Phase C.
- Prompt:
Generate a {{duration}}-second action sequence that strictly follows the 16-cell storyboard reference image, cell-by-cell, top-left to bottom-right.
- Honour each cell's labelled SHOT SIZE and CAMERA-MOVE — match cuts to the storyboard's rhythm notes.
- Strong cinematic feel and shot language. Exaggerated dynamics. Hits land hard with motion blur and impact frames.
- Camera language: anamorphic, handheld where the storyboard calls for it, locked-off where it doesn't.
- Native audio: impact sfx on every IMPACT cell, footsteps, fabric/Foley, restrained low score under the action.
Action being rendered: {{action_script}}.
Aesthetic: {{style_direction}}.
- Duration:
{{duration}} (default 15)
- Aspect ratio:
{{aspect_ratio}}
After generation, present the final video. If the cut density feels too low or shots don't match the storyboard, regenerate Phase D first (cheaper than rebuilding the storyboard) with the prompt emphasising "strict cell-by-cell adherence" more aggressively.
Notes
- Why the storyboard image and not a text storyboard? Seedance 2.0 i2v anchors its motion plan to the visual reference. A grid of 16 drawn cells gives it 16 visual targets to hit — text descriptions of shots get averaged into mush.
- Asymmetric character details matter. Without something like "scar over the right eyebrow" or "leather glove on the left hand only", identity drift between cells is the #1 failure mode.
- Use
seedance-2.0-i2v-480p to draft. Cheaper preview pass before committing to the full-res seedance-v2.0-i2v run.
- For longer fights, chain two runs: first run uses storyboard A (cells 1–16, beats 1–15s); second run uses storyboard B (cells 17–32, beats 15–30s) with the last cell of A as a continuity anchor in B's first cell.
- Language: Both English and Chinese prompts work in all four models, so the storyboard cell labels can be in either language.
Trigger Keywords
fight scene, action sequence, storyboard to video, cut density, cinematic action, combat choreography, seedance 2 storyboard
Pipeline at a Glance
character_description ──► [GPT-Image-2 t2i] ─► character sheet ──┐
│
environment_description ─► [Nano-Banana-2 t2i] ─► environment plate ┼─► [GPT-Image-2 i2i] ─► 16-cell storyboard ─► [Seedance 2.0 i2v] ─► 15s action video
│
action_script + style_direction ───────────────────────────────────►┘
Notes for the Executing Agent
- This recipe is LLM-orchestrated: read each phase, gather any missing inputs from the user, then call
muapi CLI commands. Use muapi auth configure first if MUAPI_API_KEY is unset.
- For model IDs without a CLI alias yet, fall back to the raw endpoint via
curl -X POST https://api.muapi.ai/api/v1/<endpoint> -H "x-api-key: $MUAPI_API_KEY" -H 'content-type: application/json' -d '{...}' and poll with muapi predict wait <request_id>.
- Phase C uses TWO reference images (character sheet + environment plate). When calling
gpt-image-2-image-to-image, pass them as a list under images_list (or the model's documented multi-ref field).
- Substitute
{{input_name}} placeholders with the user's actual inputs before issuing each call.