llava

Large Language and Vision Assistant (LLaVA). Enables visual instruction tuning and image-based conversations. Combines a CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks; best suited to conversational image analysis.
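
To illustrate what "multi-turn image chat" can look like at the prompt level, here is a minimal sketch that assembles a conversation into a single prompt string. The `USER: ... ASSISTANT:` template and the `<image>` placeholder are assumptions modeled on the Vicuna-style format often used with LLaVA 1.5 checkpoints; `build_prompt` is a hypothetical helper, not part of any library, so check the template against the model card of the checkpoint you actually use.

```python
from typing import List, Optional, Tuple

# Assumed placeholder that the processor later replaces with image features.
IMAGE_TOKEN = "<image>"


def build_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a multi-turn prompt from (user_message, assistant_reply) pairs.

    Use None as the assistant reply for the final, still-unanswered turn.
    """
    parts = []
    for i, (user, assistant) in enumerate(turns):
        # Assumption: the image placeholder appears once, in the first user turn.
        text = f"{IMAGE_TOKEN}\n{user}" if i == 0 else user
        segment = f"USER: {text} ASSISTANT:"
        if assistant is not None:
            segment += f" {assistant}"
        parts.append(segment)
    return " ".join(parts)


if __name__ == "__main__":
    history = [
        ("What is shown in this picture?", "A cat sitting on a sofa."),
        ("What color is the cat?", None),  # pending turn for the model to answer
    ]
    print(build_prompt(history))
```

The resulting string would then be passed, together with the image, to whatever processor and model the checkpoint provides; that step is omitted here because it depends on the specific checkpoint and library version.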

Stars: 193,356
Forks: 33,795
Updated: May 8, 2026, 21:27