Official skill for generating high-quality images from text prompts using the ZhiPu GLM-Image API. Excellent at scientific illustrations, high-quality portraits, social media graphics, and commercial posters. Supports multiple aspect ratios, HD quality, and watermark control. Use this skill when the user wants to generate images, create AI art, perform text-to-image generation, or convert text descriptions into visual content.
Documentation-only master skill for GLM ecosystem discovery and installation. This skill does not execute scripts or subprocess commands. It provides a curated list of official GLM skills, install methods, and source links.
Official skill for recognizing and extracting mathematical formulas from images and PDFs into LaTeX format using the ZhiPu GLM-OCR API. Supports complex equations, inline formulas, and formula blocks. Use this skill when the user wants to extract formulas, convert formula images to LaTeX, or OCR mathematical expressions.
Official skill for recognizing handwritten text from images using the ZhiPu GLM-OCR API. Supports various handwriting styles, languages, and mixed handwritten/printed content. Use this skill when the user wants to read handwritten notes, convert handwriting to text, or OCR handwritten documents.
Trigger when: (1) the user wants to extract text, tables, formulas, or structured data from images, PDFs, or scanned documents; (2) the user mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing); (3) the user has a document (screenshot, scanned page, invoice, paper, whiteboard photo) and needs its content in structured form; (4) the user asks to parse, digitize, or extract content from a visual document. Invokes the GLM-OCR SDK (pip install glmocr) to parse documents via Zhipu's cloud API. No GPU required. Returns structured JSON (regions with labels and bounding boxes) and Markdown. The agent can operate entirely via the CLI; no YAML files are needed. NOT for: real-time camera feeds, audio transcription, or non-document images (photos, illustrations).
Extract text from images using the GLM-OCR API. Supports images and PDFs with high-accuracy OCR, table recognition, formula extraction, and handwriting recognition. Use this skill whenever the user wants to extract text from images, perform OCR on pictures, scan documents, convert images to text, or process any image file to get its textual content.
Official skill for recognizing and extracting tables from images and PDFs into Markdown format using the ZhiPu GLM-OCR API. Supports complex tables, merged cells, and multi-page documents. Use this skill when the user wants to extract tables, recognize spreadsheets, or convert table images to an editable format.
Generate captions (descriptions) for images, videos, and documents using the ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single or multiple inputs supplied as URLs, local paths, or base64 (base64 for images only).