con un clic
ai-multimodal
Image/vision analysis, generation prompt crafting, and multimodal AI workflow orchestration
Menú
Image/vision analysis, generation prompt crafting, and multimodal AI workflow orchestration
Unified design foundations — design system architecture, tokens, component specs, visual principles, creative vision, figma integration, plus brand design system loader (66 real brands via DESIGN.md). Absorbs design, design-system, design-systems, design-principles, design-router, creative-vision, figma, design-md.
Render, summarize, and present markdown documents and structured content in multiple output modes
Ultra UI skill - combines Google's DESIGN.md spec (machine-readable design tokens) with the ui-ux-pro-max knowledge base (91 styles, 161 palettes, 73 font pairings, 161 products, 104 UX guidelines, 25 chart types). Generates lint-clean DESIGN.md files, validates token references and WCAG contrast, exports Tailwind/DTCG tokens, and diffs design systems version-over-version.
Initialize UltraThink capabilities in the current project directory
Advanced UI/UX design intelligence — InuVerse edition. 84 styles, 160 palettes, 73 font pairings, 161 product types, 98 UX guidelines, 25 chart types across 16 stacks. Absorbs ui, ui-design, ui-styling, ui-ux-pro, ui-design-pipeline, stitch, web-design-guidelines. Routes all UI structure/visual/interaction work.
Get Shit Done — spec-driven development pipeline with XML-structured plans, wave-based parallel execution with fresh-context agents, goal-backward verification, and lightweight ad-hoc quick mode with composable flags. Single skill containing plan/execute/verify/quick modes as sub-workflows.
| name | ai-multimodal |
| description | Image/vision analysis, generation prompt crafting, and multimodal AI workflow orchestration |
| layer | utility |
| category | ai-tools |
| triggers | ["analyze this image","describe this screenshot","generate an image","vision analysis","image prompt","multimodal","read this diagram"] |
| inputs | [{"image_path":"Path to an image file for analysis"},{"analysis_goal":"What to extract or understand from the image"},{"generation_prompt":"Description of image to generate"},{"style":"Visual style preferences for generation"}] |
| outputs | [{"analysis":"Structured description of image contents"},{"extracted_data":"Specific data pulled from the image (text, UI elements, diagrams)"},{"generation_prompt":"Optimized prompt for image generation APIs"},{"recommendations":"Suggestions based on visual analysis"}] |
| linksTo | ["ui-ux-pro","media-processing","chrome-devtools"] |
| linkedFrom | ["orchestrator","planner"] |
| preferredNextSkills | ["ui-ux-pro","media-processing"] |
| fallbackSkills | ["chrome-devtools"] |
| riskLevel | low |
| memoryReadPolicy | selective |
| memoryWritePolicy | selective |
| sideEffects | [] |
This skill handles all interactions involving visual content — analyzing screenshots, interpreting diagrams, extracting information from images, crafting image generation prompts, and orchestrating multimodal AI workflows. It bridges the gap between visual and textual reasoning.
| Mode | Use Case | Output |
|---|---|---|
| Descriptive | "What is in this image?" | Detailed natural language description |
| Extractive | "Read the text/data from this image" | Structured data extraction |
| Diagnostic | "What's wrong with this UI?" | Issue identification with recommendations |
| Comparative | "How do these two designs differ?" | Structured comparison |
| Interpretive | "What does this diagram mean?" | Semantic interpretation of visual information |
When analyzing any image, systematically assess:
LAYER 1 — COMPOSITION:
- Type: screenshot / photo / diagram / chart / illustration / icon
- Dimensions: aspect ratio and resolution implications
- Layout: grid / freeform / hierarchical / sequential
LAYER 2 — CONTENT:
- Primary subject(s): What dominates the image
- Text content: Any readable text (OCR-level extraction)
- Data content: Numbers, charts, graphs, tables
- UI elements: Buttons, forms, navigation, cards (if screenshot)
LAYER 3 — CONTEXT:
- Purpose: What this image is trying to communicate
- Audience: Who this is designed for
- Quality: Resolution, clarity, artifacts, compression
LAYER 4 — SEMANTICS:
- Meaning: What information does this convey beyond literal content
- Relationships: How elements relate to each other
- Flow: What sequence or hierarchy is implied
INPUT: Screenshot of a UI
STEP 1: Identify the application type
- Web app / mobile app / desktop app / CLI
- Platform: browser, iOS, Android, desktop OS
- Framework hints: React DevTools icon, specific component patterns
STEP 2: Catalog UI elements
- Navigation: header, sidebar, tabs, breadcrumbs
- Content: cards, lists, tables, forms
- Actions: buttons, links, toggles, dropdowns
- Feedback: alerts, toasts, loading states, empty states
STEP 3: Assess design quality
- Spacing: consistent padding and margins
- Typography: hierarchy, readability, font choices
- Color: contrast ratios, palette consistency, accessibility
- Alignment: grid adherence, visual balance
- Depth: shadows, elevation, layering
STEP 4: Identify issues
- Accessibility: contrast failures, missing labels, touch target sizes
- Usability: unclear CTAs, information overload, hidden actions
- Consistency: style deviations, mixed patterns
- Responsiveness: overflow, truncation, broken layouts
OUTPUT FORMAT:
SUMMARY: [1-2 sentence overview]
POSITIVE: [What works well]
ISSUES:
- [SEVERITY] [Issue description] → [Recommendation]
ACCESSIBILITY:
- [WCAG criterion] [Pass/Fail] [Details]
INPUT: Architecture diagram, flowchart, ER diagram, or sequence diagram
STEP 1: Identify diagram type
- Flowchart: process flow with decisions
- Sequence: temporal message passing between actors
- ER: entity relationships with cardinality
- Architecture: system components and connections
- Class: object-oriented structure
- State: state machine with transitions
STEP 2: Extract elements
- Nodes/entities: name, type, attributes
- Connections: direction, labels, cardinality
- Groupings: boundaries, clusters, swimlanes
- Annotations: notes, constraints, legends
STEP 3: Interpret semantics
- Data flow: where does data originate and terminate
- Control flow: what drives decisions and transitions
- Dependencies: what depends on what
- Bottlenecks: single points of failure, high-fan-in nodes
STEP 4: Generate machine-readable representation
- Convert to Mermaid syntax (hand off to mermaid skill)
- Or convert to structured text description
INPUT: Description of desired image
STEP 1: Structure the prompt
SUBJECT: [What is the main subject]
ACTION: [What is the subject doing]
SETTING: [Where / background]
STYLE: [Art style, medium, aesthetic]
MOOD: [Emotional tone, lighting, atmosphere]
TECHNICAL: [Aspect ratio, quality, camera angle]
STEP 2: Apply prompt engineering principles
- Front-load important elements (subject first)
- Use specific, concrete descriptors over vague ones
- Include negative prompts for unwanted elements
- Specify style references when possible
- Use weight/emphasis syntax for the target platform
STEP 3: Optimize for the target model
DALL-E 3:
- Natural language descriptions work best
- Be descriptive and narrative
- Specify "digital art", "photograph", "illustration" etc.
Stable Diffusion:
- Comma-separated tags work best
- Include quality boosters: "masterpiece, best quality, highly detailed"
- Use negative prompts extensively
- Specify model-specific tags (e.g., "8k uhd, dslr")
Midjourney:
- Use /imagine with concise, evocative language
- Append parameters: --ar 16:9 --v 6 --q 2
- Reference artists or styles with "in the style of"
- Use --no for negative prompts
INPUT: Image containing structured data (chart, table, form)
STEP 1: Identify data type
- Table: rows and columns of data
- Chart: bar, line, pie, scatter, etc.
- Form: filled form fields
- Document: structured text document
- Receipt/Invoice: financial data
STEP 2: Extract raw data
For tables:
| Header 1 | Header 2 | Header 3 |
|----------|----------|----------|
| value | value | value |
For charts:
CHART TYPE: [bar/line/pie/etc.]
X-AXIS: [label and unit]
Y-AXIS: [label and unit]
DATA POINTS: [extracted values]
TREND: [observed pattern]
For forms:
FIELD: [label] = [value]
STEP 3: Validate extraction
- Cross-check totals if available
- Verify units and scales
- Flag uncertain readings with confidence levels
- Note any obscured or illegible portions
OUTPUT FORMAT:
FORMAT: [table/csv/json — most appropriate for the data]
DATA: [Extracted data in chosen format]
CONFIDENCE: [high/medium/low for each data point]
NOTES: [Anything uncertain or partially readable]
Analyze this UI screenshot and provide:
1. A brief description of what the screen shows
2. UI element inventory (navigation, content areas, actions)
3. Design assessment (spacing, typography, color, alignment)
4. Accessibility issues (contrast, labels, touch targets)
5. Top 3 improvement recommendations with specific CSS/design fixes
Interpret this architecture diagram:
1. List all system components and their roles
2. Map all connections with direction and purpose
3. Identify the data flow from user request to response
4. Note any potential single points of failure
5. Convert to Mermaid syntax for version control
Analyze this error screenshot:
1. Read and transcribe the exact error message
2. Identify the error type (runtime, build, network, UI)
3. Identify the source (file path, line number if visible)
4. Suggest probable causes based on the error
5. Provide fix steps in order of likelihood