glmocr-sdk

Name: Glmocr Sdk
Author: zai-org



stars:398

forks:31

updated: 15 April 2026, 12:28



package.json

"author": "zai-org"

"repository": "zai-org/GLM-skills"


name	glmocr-sdk
description	Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别", "文档解析", (3) User has a document (screenshot, scanned page, invoice, paper, whiteboard photo) and needs its content in structured form, (4) User asks to parse, digitize, or extract content from a visual document. Invokes the GLM-OCR SDK (pip install glmocr) to parse documents via Zhipu's cloud API. No GPU required. Returns structured JSON (regions with labels + bounding boxes) and Markdown. Agent can operate entirely via CLI — no YAML files needed. NOT for: real-time camera feeds, audio transcription, or non-document images (photos, illustrations).
metadata	{"openclaw":{"requires":{"env":["ZHIPU_API_KEY"]},"primaryEnv":"ZHIPU_API_KEY","emoji":"📄","homepage":"https://github.com/zai-org/GLM-OCR/tree/main/skills/sdk"}}

OpenClaw Skill: glmocr

Parses documents (images, PDFs, scans) via the GLM-OCR SDK.

📌 On-demand: This skill requires only ZHIPU_API_KEY in the environment. No YAML config files or GPU needed.

⚡ Quick Start

# Install (shell)
pip install glmocr

# Set API key (once)
export ZHIPU_API_KEY=sk-xxx
# or add to .env file in working directory:
echo "ZHIPU_API_KEY=sk-xxx" >> .env

# One-liner (Python)
import glmocr
result = glmocr.parse("document.pdf")
print(result.markdown_result)
print(result.to_dict())

# CLI (shell): pass the API key directly (no env setup needed)
glmocr parse image.png --api-key sk-xxx

# Or load from a specific .env file
glmocr parse image.png --env-file /path/to/.env

# Or rely on env var / auto-discovered .env (set once, then omit)
glmocr parse image.png
glmocr parse ./scans/ --output ./output/ --stdout

Configuration Priority

Constructor kwargs  >  os.environ  >  .env file  >  config.yaml  >  built-in defaults

Agents override everything via constructor kwargs or env vars — no YAML editing needed.
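
The precedence chain above can be illustrated with a small standalone sketch. It does not use the SDK's own loader: `resolve_settings` and the layer dictionaries are hypothetical names, and the values are illustrative only.

```python
from collections import ChainMap


def resolve_settings(kwargs, environ, dotenv, yaml_cfg, defaults):
    """Merge configuration layers; earlier layers win over later ones.

    Hypothetical helper that mimics the documented priority order:
    constructor kwargs > os.environ > .env file > config.yaml > defaults.
    """
    # Drop None values so an unset kwarg does not mask a lower layer.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (kwargs, environ, dotenv, yaml_cfg, defaults)
    ]
    # ChainMap returns the value from the first layer that defines the key.
    return dict(ChainMap(*layers))


settings = resolve_settings(
    kwargs={"timeout": 1200, "model": None},           # explicit override
    environ={"model": "glm-ocr", "log_level": "DEBUG"},
    dotenv={"log_level": "WARNING"},                   # masked by environ
    yaml_cfg={},
    defaults={"timeout": 600, "log_level": "INFO", "enable_layout": True},
)
print(settings)
```

Here `timeout` comes from the kwargs layer, `model` and `log_level` from the environment layer, and `enable_layout` falls through to the defaults.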

Key Environment Variables

| Variable | Description | Example |
|---|---|---|
| `ZHIPU_API_KEY` | API key (required for MaaS) | `sk-abc123` |
| `GLMOCR_MODEL` | Model name | `glm-ocr` |
| `GLMOCR_TIMEOUT` | Request timeout (seconds) | `600` |
| `GLMOCR_ENABLE_LAYOUT` | Layout detection on/off | `true` |
| `GLMOCR_LOG_LEVEL` | `DEBUG` / `INFO` / `WARNING` / `ERROR` | `INFO` |

Python API

Convenience function (single call)

import glmocr

# Single file → PipelineResult
result = glmocr.parse("invoice.png")

# Multiple files → list[PipelineResult]
results = glmocr.parse(["page1.png", "page2.png", "report.pdf"])

Class-based (multiple calls / resource reuse)

from glmocr import GlmOcr

# Preferred: use as a context manager so resources are released automatically.
# Passing api_key auto-sets the mode to "maas".
with GlmOcr(api_key="sk-xxx") as parser:
    result = parser.parse("document.png")
    print(result.markdown_result)

# Alternative: read ZHIPU_API_KEY from env and close explicitly.
parser = GlmOcr(mode="maas")
try:
    result = parser.parse("document.png")
finally:
    parser.close()   # required when not using `with`

Constructor Parameters

| Parameter | Type | Description |
|---|---|---|
| `api_key` | `str` | API key. Providing this auto-enables MaaS mode. |
| `api_url` | `str` | Override MaaS endpoint URL |
| `model` | `str` | Model name override |
| `timeout` | `int` | Request timeout in seconds (default: 600) |
| `enable_layout` | `bool` | Enable layout detection |
| `log_level` | `str` | Logging level |

Working with `PipelineResult`

Fields

result.markdown_result    # str — full document as Markdown
result.json_result        # list[list[dict]] — structured regions per page
result.original_images    # list[str] — absolute paths of input images

`json_result` structure

List of pages → list of regions per page:

[
  [
    {
      "index": 0,
      "label": "title",
      "content": "Annual Report 2024",
      "bbox_2d": [100, 50, 900, 120]
    },
    {
      "index": 1,
      "label": "table",
      "content": "| Q1 | Q2 |\n|---|---|\n| 120 | 145 |",
      "bbox_2d": [100, 140, 900, 400]
    }
  ]
]

Bounding boxes (bbox_2d): [x1, y1, x2, y2] normalised to 0–1000 scale.

Region labels: title, text, table, figure, formula, header, footer, page_number, reference, seal
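
To draw or crop a region, the normalised box has to be mapped back to pixels. The sketch below assumes that x values scale with the image width and y values with the image height, which the text above does not state explicitly. `bbox_to_pixels` is a hypothetical helper, not an SDK function.

```python
def bbox_to_pixels(bbox_2d, image_width, image_height):
    """Scale an [x1, y1, x2, y2] box from the 0-1000 range to pixel coordinates.

    Assumption: each axis is normalised independently against its own dimension.
    """
    x1, y1, x2, y2 = bbox_2d
    return (
        round(x1 * image_width / 1000),
        round(y1 * image_height / 1000),
        round(x2 * image_width / 1000),
        round(y2 * image_height / 1000),
    )


# The title region from the example above, on a hypothetical 2000x3000 px page.
title_box = bbox_to_pixels([100, 50, 900, 120], 2000, 3000)
print(title_box)  # (200, 150, 1800, 360)
```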

Serialization

# Dict (JSON-serializable, for passing to other tools)
d = result.to_dict()
# Keys: json_result, markdown_result, original_images, usage (MaaS), data_info (MaaS)

# JSON string
json_str = result.to_json()                 # pretty-printed, ensure_ascii=False
json_str = result.to_json(indent=None)      # compact single line

# Save to disk: writes <stem>/<stem>.json, <stem>/<stem>.md and <stem>/layout_vis/ under output_dir
result.save(output_dir="./output")
result.save(output_dir="./output", save_layout_visualization=False)

Error Handling

The SDK does not raise on MaaS errors — check to_dict() for an "error" key:

# `parser` is a GlmOcr instance (see the class-based usage above)
result = parser.parse("image.png")
d = result.to_dict()
if "error" in d:
    # Handle failure
    print("OCR failed:", d["error"])
else:
    print(d["markdown_result"])
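
Because failures are reported in the dictionary rather than raised, callers that prefer exceptions can wrap the check. This is a minimal sketch operating on plain dictionaries shaped like `to_dict()` output. `OcrError` and `markdown_or_raise` are hypothetical names, not part of the SDK.

```python
class OcrError(RuntimeError):
    """Hypothetical exception type for failed OCR calls (not defined by the SDK)."""


def markdown_or_raise(result_dict):
    """Return the Markdown text, or raise if the dict carries an "error" key."""
    if "error" in result_dict:
        raise OcrError(str(result_dict["error"]))
    return result_dict["markdown_result"]


# Sample dictionaries standing in for result.to_dict(); the values are made up.
ok = {"markdown_result": "# Title", "json_result": [[]]}
failed = {"error": "request timed out"}

print(markdown_or_raise(ok))
try:
    markdown_or_raise(failed)
except OcrError as exc:
    print("OCR failed:", exc)
```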

CLI Reference

Agent-preferred interface: use the CLI for most operations. Set ZHIPU_API_KEY in env once, then invoke as needed.

Supported input formats: .jpg, .jpeg, .png, .bmp, .gif, .webp, .pdf

Basic usage

# Parse a single file → saves to ./output/<stem>/
# MaaS mode is the default; ZHIPU_API_KEY must be set (or use --api-key)
glmocr parse image.png

# Pass API key directly without any env setup
glmocr parse image.png --api-key sk-xxx

# Parse a directory → saves each file to ./output/<stem>/
glmocr parse ./scans/

# Use self-hosted vLLM/SGLang instead of cloud
glmocr parse image.png --mode selfhosted

# Specify output directory
glmocr parse image.png --output ./results/

Read results in the terminal (agent-friendly)

# Print Markdown + JSON to stdout (and still save to disk)
glmocr parse image.png --stdout

# Print to stdout ONLY — do not write any files
glmocr parse image.png --stdout --no-save

# JSON only (no Markdown output)
glmocr parse image.png --stdout --json-only

# Pipe JSON into jq for structured extraction
glmocr parse image.png --stdout --json-only --no-save | jq '.[0] | map(select(.label=="table"))'

Save control

# Skip layout visualization images (faster, smaller output)
glmocr parse image.png --no-layout-vis

# Parse and save only JSON + Markdown, skip layout vis
glmocr parse image.png --no-layout-vis --output ./results/

Batch processing

# All images in a folder
glmocr parse ./invoice_scans/ --output ./parsed/ --no-layout-vis

# With progress visible in logs
glmocr parse ./docs/ --output ./parsed/ --log-level INFO

Debugging

glmocr parse image.png --log-level DEBUG

Full flag reference

| Flag | Default | Description |
|---|---|---|
| `--api-key / -k` | env var | API key for MaaS mode (overrides `ZHIPU_API_KEY`) |
| `--mode` | `maas` | `maas` (cloud, default) or `selfhosted` (local GPU) |
| `--env-file` | auto | Path to `.env` file (default: auto-discover from cwd) |
| `--output / -o` | `./output` | Output directory |
| `--stdout` | off | Print JSON + Markdown to stdout |
| `--no-save` | off | Skip writing files (use with `--stdout`) |
| `--json-only` | off | stdout JSON only, no Markdown |
| `--no-layout-vis` | off | Skip layout visualization images |
| `--config / -c` | none | Path to YAML config override |
| `--log-level` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
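
For agents that call the CLI from Python, the flags in the table above can be assembled into an argument list. This is a minimal sketch: `build_parse_command` is a hypothetical helper, not part of the SDK, and it covers only a subset of the listed flags.

```python
import shlex


def build_parse_command(path, output=None, api_key=None, stdout=False,
                        no_save=False, json_only=False, no_layout_vis=False):
    """Build an argv list for `glmocr parse` from the documented flags."""
    cmd = ["glmocr", "parse", str(path)]
    if api_key:
        cmd += ["--api-key", api_key]
    if output:
        cmd += ["--output", str(output)]
    # Boolean switches are appended only when enabled.
    for enabled, flag in (
        (stdout, "--stdout"),
        (no_save, "--no-save"),
        (json_only, "--json-only"),
        (no_layout_vis, "--no-layout-vis"),
    ):
        if enabled:
            cmd.append(flag)
    return cmd


cmd = build_parse_command("image.png", stdout=True, no_save=True, json_only=True)
print(shlex.join(cmd))  # glmocr parse image.png --stdout --no-save --json-only
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Using a list rather than a shell string avoids quoting problems with paths that contain spaces.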

Typical Agent Workflow

receive document path / URL
       │
       ▼
glmocr.parse(path)            ← single call, handles PDF/image
       │
       ▼
result.to_dict()              ← safe to pass as tool output
       │
       ├── markdown_result    → hand to LLM for reading / summarization
       └── json_result        → structured extraction (tables, formulas, regions by label)
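
The structured-extraction step often means turning a table region's Markdown content into rows. The sketch below assumes the content is a simple pipe table with a separator on the second line, as in the `json_result` example earlier. `parse_markdown_table` is a hypothetical helper and does not handle escaped pipes or multi-line cells.

```python
def parse_markdown_table(content):
    """Turn a Markdown pipe table into a list of row dicts keyed by header."""
    lines = [line.strip() for line in content.splitlines() if line.strip()]
    rows = [[cell.strip() for cell in line.strip("|").split("|")] for line in lines]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


# Content of the table region from the json_result example.
table_rows = parse_markdown_table("| Q1 | Q2 |\n|---|---|\n| 120 | 145 |")
print(table_rows)  # [{'Q1': '120', 'Q2': '145'}]
```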

Filter by label

import glmocr

result = glmocr.parse("report.png")
regions = result.json_result[0]  # first page

tables = [r for r in regions if r["label"] == "table"]
formulas = [r for r in regions if r["label"] == "formula"]
body_text = [r for r in regions if r["label"] == "text"]
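
The same filtering can be done in one pass over all pages. The following sketch groups regions by label and records the page each came from. `group_by_label` is a hypothetical helper, and the sample data extends the earlier `json_result` example with a made-up second page.

```python
from collections import defaultdict


def group_by_label(json_result):
    """Group regions from every page by label, tagging each with its page index."""
    grouped = defaultdict(list)
    for page_idx, page_regions in enumerate(json_result):
        for region in page_regions:
            grouped[region["label"]].append({**region, "page": page_idx})
    return dict(grouped)


# Sample data in the documented shape: list of pages, each a list of regions.
pages = [
    [
        {"index": 0, "label": "title", "content": "Annual Report 2024",
         "bbox_2d": [100, 50, 900, 120]},
        {"index": 1, "label": "table", "content": "| Q1 | Q2 |\n|---|---|\n| 120 | 145 |",
         "bbox_2d": [100, 140, 900, 400]},
    ],
    [
        {"index": 0, "label": "table", "content": "| Q3 | Q4 |\n|---|---|\n| 130 | 160 |",
         "bbox_2d": [100, 60, 900, 300]},
    ],
]

grouped = group_by_label(pages)
print({label: len(regions) for label, regions in grouped.items()})
```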

Multi-page PDF → iterate pages

from glmocr import GlmOcr

with GlmOcr(api_key="sk-xxx") as parser:
    result = parser.parse("document.pdf")   # all pages in one PipelineResult
    for page_idx, page_regions in enumerate(result.json_result):
        print(f"Page {page_idx + 1}: {len(page_regions)} regions")
        for region in page_regions:
            print(f"  [{region['label']}] {region['content'][:60]}")

Programmatic config (no env vars)

from glmocr.config import GlmOcrConfig

cfg = GlmOcrConfig.from_env(
    api_key="sk-xxx",
    mode="maas",
    timeout=600,
    log_level="DEBUG",
)

Output Directory Layout

After result.save(output_dir):

output_dir/
  <image_stem>/
    <image_stem>.json         ← structured regions
    <image_stem>.md           ← full Markdown (with cropped figure images)
    imgs/                     ← cropped figures referenced in Markdown
    layout_vis/               ← layout detection overlay images (if enabled)
      <image_stem>.jpg
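
Downstream tools sometimes need to locate these files without listing the directory. The sketch below derives the expected paths from the layout shown above, assuming the stem is the input file name without its extension. `expected_outputs` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path


def expected_outputs(output_dir, input_path):
    """Derive the paths that the documented layout implies for one input file."""
    stem = Path(input_path).stem  # assumption: stem = file name minus extension
    base = Path(output_dir) / stem
    return {
        "json": base / f"{stem}.json",
        "markdown": base / f"{stem}.md",
        "figures": base / "imgs",
        "layout_vis": base / "layout_vis",
    }


paths = expected_outputs("./output", "scans/invoice.png")
for name, path in paths.items():
    print(f"{name}: {path}")
```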

Common Pitfalls

ZHIPU_API_KEY not set: The SDK defaults to MaaS mode, so without a key parse() fails with an error message that includes quick-fix instructions. Set the key via export ZHIPU_API_KEY=sk-xxx, add it to a .env file, or pass --api-key sk-xxx to the CLI.
Large PDFs: The default timeout is 600 s. For very long documents, raise it, for example with timeout=1200 in the constructor or GLMOCR_TIMEOUT=1200 in the environment.
result.json_result is a string: This happens when the model returns malformed JSON. The SDK preserves the raw string, so check the type before indexing into it, then parse or log the string manually.
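
The last pitfall can be guarded against before any code indexes into `json_result`. This is a minimal sketch of one defensive approach. `coerce_regions` is a hypothetical helper, and returning `None` for unparseable input is a design choice for the example, not SDK behaviour.

```python
import json


def coerce_regions(json_result):
    """Return the list of pages if possible, or None for unparseable raw strings."""
    if isinstance(json_result, list):
        return json_result  # already structured
    if isinstance(json_result, str):
        try:
            parsed = json.loads(json_result)
        except json.JSONDecodeError:
            return None  # malformed: caller should log the raw string
        return parsed if isinstance(parsed, list) else None
    return None


structured = coerce_regions([[{"label": "text", "content": "hello"}]])
recovered = coerce_regions('[[{"label": "text", "content": "hello"}]]')
broken = coerce_regions('[[{"label": "text", "content": "hel')
print(structured == recovered, broken)
```

A caller can then branch on `None` and fall back to `markdown_result`, which is a plain string and does not depend on the region JSON being valid.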
