video-generation
Guide to video generation in MassGen. Use when creating videos from text prompts or images across Grok, Google Veo, and OpenAI Sora backends.
| name | video-generation |
| description | Guide to video generation in MassGen. Use when creating videos from text prompts or images across Grok, Google Veo, and OpenAI Sora backends. |
Generate videos using generate_media with mode="video". The system auto-selects the best backend based on available API keys.
# Simple text-to-video (auto-selects backend)
generate_media(prompt="A robot walking through a city", mode="video")
# Specify backend and duration
generate_media(prompt="Ocean waves crashing on rocks", mode="video",
backend_type="google", duration=8)
# With aspect ratio
generate_media(prompt="A timelapse of clouds", mode="video",
backend_type="grok", aspect_ratio="16:9", duration=10)
| Backend | Default Model | Duration Range | Default Duration | Resolutions | API Key |
|---|---|---|---|---|---|
| Grok (priority 1) | grok-imagine-video | 1-15s | 5s | 480p, 720p | XAI_API_KEY |
| Google Veo (priority 2) | veo-3.1-generate-preview | 4-8s | 8s | 720p, 1080p, 4K (use size); default 16:9 | GOOGLE_API_KEY |
| OpenAI Sora (priority 3) | sora-2 | 4, 8, or 12s (discrete) | 4s | Standard | OPENAI_API_KEY |
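
The priority column above can be read as a first-match rule over the available API keys. The following is a minimal sketch of that rule, not MassGen's actual selection code: `select_video_backend` and `BACKEND_PRIORITY` are hypothetical names, and only the priority order and key names come from the table.

```python
import os

# Priority order and environment variable names are taken from the backend table.
BACKEND_PRIORITY = [
    ("grok", "XAI_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def select_video_backend(env=None):
    """Hypothetical helper: return the first backend whose API key is set.

    Returns None when no key is available. An empty string counts as unset.
    """
    env = os.environ if env is None else env
    for backend, key_name in BACKEND_PRIORITY:
        if env.get(key_name):
            return backend
    return None


if __name__ == "__main__":
    # With only Google and OpenAI keys present, Google Veo wins on priority.
    print(select_video_backend({"GOOGLE_API_KEY": "x", "OPENAI_API_KEY": "y"}))
```

Passing `backend_type` explicitly to `generate_media` bypasses this kind of auto-selection.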
| Parameter | Description | Example |
|---|---|---|
| prompt | Text description of the video | "A drone flying over mountains" |
| backend_type | Force a specific backend | "grok", "google", "openai" |
| model | Override default model | "veo-3.1-generate-preview" |
| duration | Video length in seconds | 8 (clamped to backend limits) |
| aspect_ratio | Video aspect ratio | "16:9", "9:16", "1:1" |
| size | Resolution (Grok: 480p/720p; Veo: 720p/1080p/4k) | "720p", "1080p", "4k" |
| input_images | Source image for image-to-video | ["starting_frame.jpg"] |
| video_reference_images | Style/content guide images (Veo, up to 3) | ["ref1.png", "ref2.png"] |
| negative_prompt | What to exclude (Veo) | "blurry, low quality" |
Each backend has different duration constraints. generate_media automatically clamps the requested duration to the limits of the selected backend (see the Duration Range column in the backend table above).
A warning is logged if duration is adjusted.
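
As an illustration of the clamping behavior, here is a small sketch built from the duration ranges in the backend table. `clamp_duration` is a hypothetical helper, not the function MassGen uses, and snapping Sora requests to the nearest allowed value is an assumption; the source only says durations are clamped.

```python
import logging

logger = logging.getLogger(__name__)

# Ranges copied from the backend table (seconds).
CONTINUOUS_RANGES = {"grok": (1, 15), "google": (4, 8)}
DISCRETE_DURATIONS = {"openai": (4, 8, 12)}


def clamp_duration(backend_type, requested):
    """Hypothetical helper: fit a requested duration to a backend's limits."""
    if backend_type in DISCRETE_DURATIONS:
        allowed = DISCRETE_DURATIONS[backend_type]
        # Assumption: snap to the nearest allowed value; ties go to the shorter one.
        adjusted = min(allowed, key=lambda d: abs(d - requested))
    elif backend_type in CONTINUOUS_RANGES:
        low, high = CONTINUOUS_RANGES[backend_type]
        adjusted = max(low, min(high, requested))
    else:
        raise ValueError(f"unknown backend: {backend_type}")
    if adjusted != requested:
        logger.warning(
            "duration %ss adjusted to %ss for backend %s",
            requested, adjusted, backend_type,
        )
    return adjusted
```

For example, under these assumptions a 20-second request to Grok comes back as 15, and a 2-second request to Veo comes back as 4.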
All three video backends support starting video from an existing image via input_images:
generate_media(
prompt="Animate this scene with gentle movement",
mode="video",
input_images=["scene.jpg"],
duration=5
)
The first image in input_images is used; additional images are ignored.
Video generation is significantly slower than image generation. All backends poll for completion.
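
A generic polling loop looks roughly like the sketch below. It is illustrative only: `poll_until_done`, the `check_status` callable, and the status strings are hypothetical and do not reflect any backend's real API or MassGen's internals.

```python
import time


def poll_until_done(check_status, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() until it reports completion.

    check_status is a hypothetical callable returning "pending", "done",
    or "failed". Raises TimeoutError once the deadline has passed.
    """
    deadline = clock() + timeout
    while True:
        status = check_status()
        if status == "done":
            return True
        if status == "failed":
            raise RuntimeError("video generation failed")
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        # Wait before asking again so the backend is not hammered.
        sleep(interval)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised in tests without real waiting.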
Veo 3.1 generates audio (dialogue, SFX, ambient) automatically from prompt content. No extra parameter is needed; just describe the sounds:

- Dialogue (e.g., "Hello," she said.)
- Sound effects (e.g., tires screeching, engine roaring)
- Ambient sound (e.g., eerie hum resonates through the hallway)

When extending videos via continue_from with a veo_vid_* ID:
Current APIs cap at 15 seconds per clip (Grok), and most backends are limited to 4-8s, so a continuous video of 30 seconds or more cannot be generated in one call. The proven approach is to generate several short clips, running the generate_media calls in parallel with background=True, and then assemble them.

For visual continuity, use the same style anchor in every prompt (e.g., "BBC Earth documentary cinematography") and maintain consistent lighting/color descriptions.
For the full production guide, with examples, transition types, and duration strategy, see references/production.md.
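
The multi-clip approach can be planned before any generation call is made. The sketch below builds one request per shot with a shared style anchor; `clips_needed` and `plan_clip_requests` are hypothetical helpers, and passing `background=True` as a keyword to generate_media is an assumption based on the parameter name mentioned above.

```python
import math


def clips_needed(total_seconds, clip_seconds):
    """Number of clips of clip_seconds needed to cover total_seconds."""
    return math.ceil(total_seconds / clip_seconds)


def plan_clip_requests(shots, style_anchor, backend_type="google", duration=8):
    """Hypothetical helper: build keyword arguments for one call per shot.

    Every prompt ends with the same style anchor so the clips share a look.
    """
    return [
        {
            "prompt": f"{shot}. Style: {style_anchor}",
            "mode": "video",
            "backend_type": backend_type,
            "duration": duration,
            "background": True,  # assumption: run clips in parallel
        }
        for shot in shots
    ]


if __name__ == "__main__":
    shots = ["Wide shot of a misty forest", "Close-up of a fox", "Aerial river view"]
    for request in plan_clip_requests(shots, "BBC Earth documentary cinematography"):
        print(request["prompt"])
```

A 30-second video built from 8-second Veo clips needs `clips_needed(30, 8)` clips, which is four, so the budget can be estimated up front.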
The best results come from combining AI-generated footage with Remotion's programmatic animation — not choosing one or the other.
AI video generation produces photorealistic, cinematic footage that pure programmatic rendering cannot match. Remotion produces precise typography, motion graphics, overlays, and transitions that AI generation cannot reliably control. Use both together.

- Import AI-generated clips as <Video> or <OffthreadVideo> background layers

Every AI-generated clip costs real money and time. Do not abandon generated footage and fall back to purely programmatic rendering. This is a common failure mode: agents generate clips, notice minor artifacts (e.g., repeated patterns, slight distortion), then pivot entirely to OpenCV/PIL/moviepy rendering, wasting the entire generation budget.
Instead:
| Situation | Wrong Approach | Right Approach |
|---|---|---|
| Minor artifacts in generated clip | Discard clip, render from scratch with OpenCV | Use clip as background, mask artifacts with overlays/motion graphics |
| Generated clip doesn't match vision exactly | Regenerate or abandon | Composite typography/effects on top to guide the viewer's attention |
| Need precise text/logo placement | Skip AI generation, use pure programmatic | Generate atmospheric footage, overlay text in Remotion |
| Some shots need AI footage, others don't | Use one approach for everything | Mix: AI-backed shots + pure Remotion animation shots |
Each generate_media(mode="video") call is expensive. Plan before generating:
- Review the generated clips with read_media, then plan your Remotion composition around actual footage

Remotion is the default post-production tool for any video that needs editing beyond simple concatenation. This includes captions, titles, transitions, overlays, and motion graphics: essentially any video intended to look professional. Do not use raw ffmpeg drawtext or manual filter chains for these tasks; the results look amateur compared to what Remotion produces.
When you have video clips to assemble, load the Remotion skill and use it. This is not optional for professional output.
Load the skill to get detailed rules and code examples:
.agent/skills/remotion/SKILL.md

| Capability | Remotion | Raw ffmpeg |
|---|---|---|
| Styled animated captions | CSS-styled, word-level highlighting, animations | drawtext — ugly, painful escaping |
| Title cards / lower thirds | React components, any font/layout | Manual positioning, limited fonts |
| Scene transitions | Timing curves, spring animations, custom effects | Basic xfade (fade, wipe) |
| Motion graphics | Full React/CSS/Three.js/Lottie ecosystem | Not possible |
| Light leak / overlay effects | Built-in @remotion/light-leaks | Complex filter chains |
| Text animations | Typography effects, per-character animation | Not feasible |
| AI footage + overlays | Import clips as <Video>, layer React components on top | Not feasible at quality |
Only use ffmpeg without Remotion for:
- Simple concatenation of clips
- Color grading with a LUT (lut3d filter)

1. Generate AI clips with generate_media (parallel, background mode) for shots that need cinematic/photorealistic quality
2. Review the clips with read_media: assess what you have, plan composition around actual footage
3. Generate audio with generate_media(mode="audio")
4. Compose in Remotion: use clips as <Video> background layers, overlay typography/motion graphics/captions, add pure-animation segments for title cards and transitions

When working on a specific task, load the relevant rule files from the Remotion skill:
- rules/subtitles.md, rules/display-captions.md, rules/transcribe-captions.md
- rules/transitions.md
- rules/text-animations.md
- rules/light-leaks.md
- rules/audio.md, rules/audio-visualization.md
- rules/sequencing.md, rules/trimming.md
- rules/3d.md
- rules/animations.md, rules/timing.md