$pwd:

ag2-multimodal-input

// Send images, audio, video, or documents into an AG2 beta `Agent` alongside text by passing `ImageInput`, `AudioInput`, `VideoInput`, or `DocumentInput` as positional arguments to `agent.ask(...)`. Use when the user wants the agent to process non-text input: describe a photo, transcribe audio, summarise a PDF, or analyse a video. Covers the per-provider support matrix, the four ways to source data (URL / path / bytes / file_id), Gemini-specific YouTube + media-resolution + clipping, OpenAI image-detail, Anthropic prompt-caching on attachments, and `FilesAPI` for the upload lifecycle.

$ git log --oneline --stat
stars:240
forks:75
updated:April 30, 2026 23:37
SKILL.md