| name | minimax-multimodal-toolkit |
| description | MiniMax-native multimodal workflow for image, video, voice, music, and media-processing tasks. Use when the user asks to generate image/video/audio assets, wants MiniMax-specific media APIs, needs TTS or voice workflows, wants reproducible local media outputs, or needs FFmpeg-style processing around generated media.
|
| license | MIT |
| metadata | {"version":"1.0.0","category":"media-generation","sources":["MiniMax platform media capabilities","Current runtime tool surface","FFmpeg documentation"]} |
MiniMax Multimodal Toolkit
Use MiniMax-native media workflows without bloating the always-on prompt. Route the task to the smallest path that can honestly produce the requested artifact.
When to Use
- The user asks for image, video, voice, speech, music, or multimodal asset generation
- The user explicitly mentions MiniMax media capabilities or wants MiniMax API integration
- The user wants reproducible local media outputs rather than only in-chat prose
- The task involves media conversion, trimming, concatenation, or extraction around generated assets
For deeper routing notes, output conventions, and implementation details, also read reference.md in this skill directory.
Step 0: Determine the Real Goal
Classify the task before acting:
- Direct asset generation: user wants an image, clip, narration, or music artifact
- Product integration: user wants app code that calls MiniMax media APIs
- Media pipeline work: user already has files and needs processing, conversion, or stitching
- Capability research: user wants comparison, planning, or API guidance before building
Do not jump into API integration when a direct generation path is enough.
Step 1: Route to the Right Path
| User need | Primary path | Notes |
|---|
| One-off image asset | Use the runtime's direct image-generation tool if available | Fastest path for explicit image requests |
| Video, TTS, voice, music, or MiniMax-specific generation | Use current MiniMax docs and the repo/runtime tool surface | Check auth and output path first |
| Existing media needs editing | Use local tooling such as FFmpeg when available | Avoid re-generation unless needed |
| App feature using MiniMax media APIs | Implement integration code and verify with a focused request or fixture | Prefer smallest vertical slice |
| Planning or research only | Gather current docs and synthesize | Do not implement prematurely |
Step 2: Inspect Before Generating
Before any implementation or generation:
- Inspect the repo for existing media patterns, asset folders, env handling, and helper utilities
- Check the current runtime for direct generation tools before inventing scripts
- Check whether required MiniMax credentials or host configuration already exist
- Clarify only if the missing answer changes the route:
- output medium
- target format
- duration or size constraints
- whether the user wants direct generation or product integration
Core Rules
- Prefer the smallest path that produces the requested artifact honestly
- Use direct generation tools for explicit image requests when available
- Use MiniMax-specific API flows when the user asks for MiniMax integration, reproducibility, video, TTS, voice, or music
- Keep generated outputs in a predictable project-local folder rather than scattering temp files
- Never hardcode secrets; use environment variables and document the missing configuration
- Do not claim a generated asset exists until you have verified the file or response
- For integration work, verify one focused happy-path request before broadening the feature
Verification Expectations
Match proof to the task:
- Asset generation: verify the output file exists or the tool returned a concrete artifact
- API integration: verify one focused request, script, or runtime flow
- Media processing: verify the output file was created and matches the requested format or duration
- UI integration: verify at the user surface, not only by build success
If the artifact was designed but not generated, report it as changed and unverified, not complete.
Workflow
1. CLASSIFY -> direct asset, integration, processing, or research
2. ROUTE -> choose direct tool, MiniMax API path, or local media tooling
3. INSPECT -> repo patterns, runtime surface, env/auth, output constraints
4. EXECUTE -> make the smallest honest slice
5. VERIFY -> prove the artifact or integration at the relevant surface
Quick Reference
IMAGE -> direct image tool first when available
VIDEO/TTS -> MiniMax-specific workflow or integration path
MUSIC -> MiniMax-specific workflow or integration path
PROCESSING -> local media tooling, usually FFmpeg
INTEGRATE -> smallest API slice + focused verification
ALWAYS -> inspect runtime first, use env vars for secrets, verify outputs
NEVER -> hardcode keys, promise files that were not produced, skip surface proof