
llava

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.
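
As a rough illustration of the multi-turn image chat mentioned above, the sketch below assembles a conversation prompt in plain Python. The `USER: <image>\n... ASSISTANT:` template is an assumption based on the format commonly documented for LLaVA-1.5 checkpoints; other checkpoints use different templates, and `build_prompt` and `IMAGE_TOKEN` are hypothetical names introduced here, not part of this skill.

```python
from typing import List, Optional, Tuple

# Assumed placeholder that marks where the image features are spliced in.
IMAGE_TOKEN = "<image>"


def build_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a multi-turn prompt from (user_message, assistant_reply) pairs.

    The final pair normally has assistant_reply=None, which leaves the
    prompt open for the model to generate the next answer.
    """
    parts = []
    for i, (user, assistant) in enumerate(turns):
        # The image placeholder is attached to the first user turn only.
        prefix = f"{IMAGE_TOKEN}\n" if i == 0 else ""
        parts.append(f"USER: {prefix}{user} ASSISTANT:")
        if assistant is not None:
            # Completed turns are closed with an end-of-sequence marker
            # (assumed to be "</s>" for LLaMA/Vicuna-style tokenizers).
            parts.append(f" {assistant}</s>")
    return "".join(parts)


if __name__ == "__main__":
    history = [
        ("What is shown in this image?", "A cat sitting on a sofa."),
        ("What color is the cat?", None),
    ]
    print(build_prompt(history))
```

The resulting string would be passed, together with the image, to whatever processor and model the chosen checkpoint provides; that step is omitted here because it depends on the specific checkpoint.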

Stars: 189,253
Forks: 32,680
Updated: May 8, 2026 at 21:27