image-ocr
Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats.
| name | image-ocr |
| description | Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats. |
| compatibility | Requires Python 3 and pytesseract + Pillow (pip install pytesseract Pillow). Also requires Tesseract OCR engine installed on the system. |
Extract text from images using Tesseract OCR via Python.
| Script | Purpose | Dependencies |
|---|---|---|
| ocr_extract.py | Extract text from images with multiple options | pytesseract, Pillow |
Install the Python packages:
pip install pytesseract Pillow
Install Tesseract OCR engine:
- macOS (Homebrew): brew install tesseract
- Debian/Ubuntu: sudo apt install tesseract-ocr
- Fedora: sudo dnf install tesseract

For additional language support:
sudo apt install tesseract-ocr-chi-sim (Chinese Simplified), tesseract-ocr-jpn (Japanese), etc.

CRITICAL: Dependency Error Recovery. If the script fails with an ImportError or a "tesseract not found" error, install the missing dependencies using the commands above, then re-run the EXACT SAME script command that failed.
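Before re-running a failed command, it can help to confirm which dependency is actually missing. The following is a minimal sketch using only the standard library; the function name `missing_dependencies` is hypothetical and is not part of ocr_extract.py.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return human-readable names of OCR dependencies that cannot be found."""
    missing = []
    # Python packages: (importable module name, pip package name).
    for module, package in (("pytesseract", "pytesseract"), ("PIL", "Pillow")):
        if importlib.util.find_spec(module) is None:
            missing.append(f"Python package {package}")
    # The Tesseract engine is a separate system binary that must be on PATH.
    if shutil.which("tesseract") is None:
        missing.append("tesseract binary")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All OCR dependencies found.")
```

An empty list means both Python packages are importable and the `tesseract` executable is on the PATH.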
python scripts/ocr_extract.py "IMAGE_PATH"
Options:
- --lang LANG: OCR language (default: eng). Use chi_sim for Chinese, jpn for Japanese, eng+chi_sim for multiple.
- --save OUTPUT_PATH: save extracted text to a file
- --preprocess MODE: image preprocessing, one of none (default), grayscale, threshold, blur
- --dpi DPI: set image DPI for better accuracy (default: auto-detect)
- --psm MODE: Tesseract page segmentation mode (0-13, default: 3 = auto)

Examples:
# Basic text extraction
python scripts/ocr_extract.py "screenshot.png"
# Chinese text extraction
python scripts/ocr_extract.py "document.jpg" --lang chi_sim
# Mixed English and Chinese
python scripts/ocr_extract.py "mixed.png" --lang eng+chi_sim
# Preprocess noisy image for better accuracy
python scripts/ocr_extract.py "noisy_scan.png" --preprocess threshold
# Save output to file
python scripts/ocr_extract.py "scan.tiff" --save output.txt
# Single line of text (e.g., license plate, serial number)
python scripts/ocr_extract.py "plate.jpg" --psm 7
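The documented options map naturally onto an argparse interface. The source of ocr_extract.py is not shown here, so the sketch below is an assumption about how such a command line could be declared: the function name `build_parser` and the help strings are hypothetical, while the flag names and defaults follow the option list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Extract text from an image with Tesseract OCR."
    )
    parser.add_argument("image", help="path to the input image")
    parser.add_argument("--lang", default="eng",
                        help="OCR language, e.g. eng, chi_sim, eng+chi_sim")
    parser.add_argument("--save", metavar="OUTPUT_PATH", default=None,
                        help="write the extracted text to this file")
    parser.add_argument("--preprocess", default="none",
                        choices=["none", "grayscale", "threshold", "blur"],
                        help="image preprocessing mode")
    parser.add_argument("--dpi", type=int, default=None,
                        help="image DPI (default: auto-detect)")
    parser.add_argument("--psm", type=int, default=3, metavar="MODE",
                        choices=range(0, 14),
                        help="Tesseract page segmentation mode (0-13)")
    return parser


# Parse the same arguments as the license plate example above.
args = build_parser().parse_args(["plate.jpg", "--psm", "7"])
print(args.lang, args.psm, args.preprocess)  # eng 7 none
```

Restricting `--preprocess` and `--psm` with `choices` makes argparse reject unsupported values before any OCR work starts.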
| Mode | Description | Use Case |
|---|---|---|
| 3 | Fully automatic (default) | General documents |
| 4 | Assume single column | Single-column text |
| 6 | Assume single block | Uniform text block |
| 7 | Single line | One line of text |
| 8 | Single word | One word |
| 11 | Sparse text | Text scattered on image |
| 13 | Raw line | Single line, bypassing Tesseract-specific layout processing |
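A page segmentation mode reaches Tesseract as a command-line style option. As a rough sketch, assuming the script forwards `--psm` and `--dpi` to the engine through a config string (the helper name `tesseract_config` is hypothetical), the string could be assembled like this:

```python
def tesseract_config(psm=3, dpi=None):
    """Build a Tesseract option string from a PSM mode and an optional DPI."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    parts = [f"--psm {psm}"]
    if dpi is not None:
        # Only pass a DPI when the caller overrides auto-detection.
        parts.append(f"--dpi {dpi}")
    return " ".join(parts)


print(tesseract_config(psm=7))           # --psm 7
print(tesseract_config(psm=6, dpi=300))  # --psm 6 --dpi 300
```

With pytesseract, a string like this can be passed as the `config` argument of `pytesseract.image_to_string`, alongside `lang`.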
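- For noisy images, use --preprocess threshold or --preprocess blur to improve results.
- For low-resolution images, set --dpi 300 or higher.
- To recognize several languages at once, join language codes with +, e.g., --lang eng+chi_sim+jpn.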
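To illustrate what the preprocessing modes might do, here is a minimal Pillow sketch. It is an assumption rather than the script's actual implementation: the function name `preprocess`, the fixed cutoff of 128, and the Gaussian blur radius of 1 are all hypothetical choices.

```python
from PIL import Image, ImageFilter


def preprocess(image, mode="none", cutoff=128):
    """Return a preprocessed copy of a PIL image (illustrative modes only)."""
    if mode == "none":
        return image
    gray = image.convert("L")  # 8-bit grayscale
    if mode == "grayscale":
        return gray
    if mode == "threshold":
        # Binarize: pixels brighter than the cutoff become white, the rest black.
        return gray.point(lambda p: 255 if p > cutoff else 0)
    if mode == "blur":
        # A light blur smooths speckle noise before recognition.
        return gray.filter(ImageFilter.GaussianBlur(radius=1))
    raise ValueError(f"unknown preprocess mode: {mode}")


# Small synthetic image so the example runs without any input file.
sample = Image.new("RGB", (4, 4), (200, 200, 200))
print(preprocess(sample, "threshold").getpixel((0, 0)))  # 255
```

Thresholding tends to help with uneven backgrounds, while blurring trades some sharpness for less speckle; which one works better depends on the image.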