$pwd:

llava

// Large Language and Vision Assistant (LLaVA). Enables visual instruction tuning and image-based conversations by combining a CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use it for vision-language chatbots and image understanding tasks; it is best suited to conversational image analysis.

$ git log --oneline --stat
stars:164,280
forks:26,957
updated:May 8, 2026 at 21:27
SKILL.md