llava

LLaVA (Large Language and Vision Assistant). Enables visual instruction tuning and image-based conversations by combining a CLIP vision encoder with a Vicuna/LLaMA language model. Supports multi-turn image chat, visual question answering, and instruction following. Use it for vision-language chatbots and image-understanding tasks; it is best suited to conversational image analysis.
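As a rough illustration of how the CLIP encoder and the language model are combined, the sketch below projects per-patch vision features into the language model's embedding space and splices them into the text sequence where an image placeholder appears. It follows the design described in the LLaVA paper (a single linear projection in the original LLaVA; LLaVA-1.5 replaces it with a two-layer MLP). The toy dimensions, the `<image>` placeholder handling, the `project` and `build_input_embeddings` helpers, and the toy embedding table are hypothetical and are not this Skill's or any library's API.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Placeholder marking where visual tokens are inserted (assumed for this sketch).
IMAGE_TOKEN = "<image>"


def project(features: List[Vector], weight: Matrix) -> List[Vector]:
    """Apply a linear projection W @ z to every patch feature z.

    features: num_patches x vision_dim (stand-in for CLIP patch features)
    weight:   llm_dim x vision_dim
    returns:  num_patches x llm_dim (one "visual token" per patch)
    """
    return [
        [sum(w * z for w, z in zip(row, feat)) for row in weight]
        for feat in features
    ]


def build_input_embeddings(
    tokens: List[str],
    embed: Dict[str, Vector],  # hypothetical toy token -> embedding table
    visual_tokens: List[Vector],
) -> List[Vector]:
    """Expand the image placeholder into the projected visual tokens."""
    sequence: List[Vector] = []
    for tok in tokens:
        if tok == IMAGE_TOKEN:
            sequence.extend(visual_tokens)  # one embedding per image patch
        else:
            sequence.append(embed[tok])  # ordinary text token embedding
    return sequence


if __name__ == "__main__":
    vision_dim, llm_dim, num_patches = 3, 2, 4
    # Toy patch features; a real CLIP ViT-L/14 produces much larger vectors.
    patch_features = [[float(p + d) for d in range(vision_dim)] for p in range(num_patches)]
    W = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]  # llm_dim x vision_dim
    visual = project(patch_features, W)
    tokens = ["USER:", IMAGE_TOKEN, "What", "is", "this?", "ASSISTANT:"]
    embed = {t: [0.0] * llm_dim for t in tokens if t != IMAGE_TOKEN}
    seq = build_input_embeddings(tokens, embed, visual)
    print(len(seq))  # 5 text tokens + 4 visual tokens
```

In a real LLaVA model the patch features come from the CLIP ViT-L/14 encoder and the combined sequence is fed to the Vicuna/LLaMA decoder, which generates the answer autoregressively; only the projection layer (and, during instruction tuning, the language model) is trained.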

Stars: 189,253
Forks: 32,680
Updated: May 8, 2026, 21:27