| name | zai-vision |
| description | Analyze images and videos using the Z.AI GLM-4V vision model. Use when the user needs to analyze screenshots or UI images, extract text from images (OCR), diagnose error screenshots, understand technical diagrams or architecture charts, analyze data visualizations, compare two UI screenshots, or generate code from a UI image. Run scripts/analyze.py via Bash. |
| compatibility | Requires Python 3 and the Z_AI_API_KEY environment variable. |
| metadata | {"author":"z-ai","version":"1.0","api-docs":"https://docs.bigmodel.cn/cn/coding-plan/mcp/vision-mcp-server"} |
Z.AI Vision Skill
Analyze images and videos by running scripts/analyze.py with Bash. No MCP configuration is needed: set Z_AI_API_KEY and call the script directly.
Prerequisites
export Z_AI_API_KEY=your_api_key
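A caller may want to confirm the key is present before invoking the script, so that a missing variable fails early with a clear message. The following is a minimal sketch; `require_api_key` is a hypothetical helper and is not part of scripts/analyze.py.

```python
import os


def require_api_key(env=None):
    """Return the Z_AI_API_KEY value, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative and not part of scripts/analyze.py.
    """
    env = os.environ if env is None else env
    key = env.get("Z_AI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Z_AI_API_KEY is not set; export it before running scripts/analyze.py"
        )
    return key
```

Calling `require_api_key()` with no arguments checks the current process environment.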
Usage
python3 scripts/analyze.py <image_path_or_url> [--mode <mode>] [--question "custom question"]
Modes
| Mode | --mode flag | When to use |
|---|---|---|
| General analysis | analyze (default) | Any image, no specific task |
| OCR / text extraction | ocr | Extract text content from a screenshot |
| UI to code | ui-to-code | Convert UI screenshot to HTML/CSS code |
| Error diagnosis | diagnose | Parse error dialogs and suggest fixes |
| Technical diagram | diagram | Explain architecture/UML/ER diagrams |
| Chart analysis | chart | Extract insights from graphs and dashboards |
| UI diff | diff | Compare two screenshots for differences |
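The table above can be read as a lookup from the user's request to a `--mode` value. The sketch below shows one rough way to do that with keyword matching. Only the mode names come from the table; the keyword lists and the `pick_mode` helper are assumptions made for illustration.

```python
# Keyword hints per mode. Mode names come from the table above;
# the keywords themselves are illustrative guesses, not part of the skill.
MODE_HINTS = [
    ("diff", ("compare", "difference", "before and after")),
    ("ocr", ("extract text", "ocr", "transcribe")),
    ("ui-to-code", ("html", "css", "generate code", "to code")),
    ("diagnose", ("error", "exception", "stack trace", "crash")),
    ("diagram", ("architecture", "uml", "er diagram", "flowchart")),
    ("chart", ("chart", "graph", "dashboard", "plot")),
]


def pick_mode(request: str) -> str:
    """Return the first mode whose keyword appears in the request.

    Falls back to "analyze", the default mode, when nothing matches.
    """
    text = request.lower()
    for mode, hints in MODE_HINTS:
        if any(hint in text for hint in hints):
            return mode
    return "analyze"
```

Substring matching is crude and will misfire on some phrasings, so treat the result as a starting guess rather than a decision.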
Examples
python3 scripts/analyze.py screenshot.png
python3 scripts/analyze.py screenshot.png --mode ocr
python3 scripts/analyze.py design.png --mode ui-to-code
python3 scripts/analyze.py error.png --mode diagnose
python3 scripts/analyze.py arch.png --mode diagram
python3 scripts/analyze.py chart.png --mode chart
python3 scripts/analyze.py before.png after.png --mode diff
python3 scripts/analyze.py image.png --question "What color is the button?"
python3 scripts/analyze.py https://example.com/image.png --mode ocr
Instructions for the Agent
- Identify which mode fits the user's request using the table above.
- Locate the image file path (or URL) from the user's message or working directory.
- Run the script with Bash using the appropriate --mode.
- Present the output to the user.
For UI diff, pass two image paths: python3 scripts/analyze.py img1.png img2.png --mode diff
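The steps above can be sketched as a small command builder that follows the usage line and enforces the two-image rule for diff. `build_command` is a hypothetical helper. The restriction to exactly one image for non-diff modes is an assumption, since the source only states that diff takes two.

```python
# Mode names as listed in the Modes table.
VALID_MODES = {"analyze", "ocr", "ui-to-code", "diagnose", "diagram", "chart", "diff"}


def build_command(images, mode="analyze", question=None):
    """Build the argv list for scripts/analyze.py.

    Assumption: diff takes exactly two images and every other mode takes one.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    expected = 2 if mode == "diff" else 1
    if len(images) != expected:
        raise ValueError(f"mode {mode!r} expects {expected} image(s), got {len(images)}")
    cmd = ["python3", "scripts/analyze.py", *images]
    if mode != "analyze":  # analyze is the default, so the flag can be omitted
        cmd += ["--mode", mode]
    if question:
        cmd += ["--question", question]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`, which avoids shell quoting problems when the question contains spaces or quotes.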