multimodal-llm

Vision, audio, video generation, and multimodal LLM integration patterns. Use when processing images, transcribing audio, generating speech, generating AI video (Kling v3, Sora 2, Veo 3.1 std/lite/fast, Runway Gen-4.5 via `gen4_turbo`), or building multimodal AI pipelines.

Stars: 189
Forks: 15
Updated: June 13, 2026 at 20:40