Run any Skill in Manus with one click

$pwd:

image-ocr

Name: Image Ocr
Author: AIDotNet

// Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats.

Run Skill in Manus

$ git log --oneline --stat

stars:462

forks:91

updated:March 20, 2026 at 09:48

File Explorer

2 files

SKILL.md

readonly

name	image-ocr
description	Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats.
compatibility	Requires Python 3 and pytesseract + Pillow (pip install pytesseract Pillow). Also requires Tesseract OCR engine installed on the system.

Image OCR

Extract text from images using Tesseract OCR via Python.

When to use this skill

User asks to read or extract text from an image
User has a screenshot with text they want to process
User has scanned documents that need text extraction
User wants to digitize text from photos

Scripts overview

Script	Purpose	Dependencies
`ocr_extract.py`	Extract text from images with multiple options	`pytesseract`, `Pillow`

Steps

1. Install dependencies (first time only)

Install the Python packages:

pip install pytesseract Pillow

Install Tesseract OCR engine:

Windows: Download installer from https://github.com/UB-Mannheim/tesseract/wiki
macOS: brew install tesseract
Linux (Ubuntu/Debian): sudo apt install tesseract-ocr
Linux (Fedora): sudo dnf install tesseract

For additional language support:

Windows: Select languages during installation
Linux: sudo apt install tesseract-ocr-chi-sim (Chinese Simplified), tesseract-ocr-jpn (Japanese), etc.

CRITICAL — Dependency Error Recovery: If the script fails with an ImportError or "tesseract not found" error, install the missing dependencies using the commands above, then re-run the EXACT SAME script command that failed.

2. Extract text from an image

python scripts/ocr_extract.py "IMAGE_PATH"

Options:

--lang LANG — OCR language (default: eng). Use chi_sim for Chinese, jpn for Japanese, eng+chi_sim for multiple.
--save OUTPUT_PATH — Save extracted text to a file
--preprocess MODE — Image preprocessing: none (default), grayscale, threshold, blur
--dpi DPI — Set image DPI for better accuracy (default: auto-detect)
--psm MODE — Tesseract page segmentation mode (0-13, default: 3 = auto)

Examples:

# Basic text extraction
python scripts/ocr_extract.py "screenshot.png"

# Chinese text extraction
python scripts/ocr_extract.py "document.jpg" --lang chi_sim

# Mixed English and Chinese
python scripts/ocr_extract.py "mixed.png" --lang eng+chi_sim

# Preprocess noisy image for better accuracy
python scripts/ocr_extract.py "noisy_scan.png" --preprocess threshold

# Save output to file
python scripts/ocr_extract.py "scan.tiff" --save output.txt

# Single line of text (e.g., license plate, serial number)
python scripts/ocr_extract.py "plate.jpg" --psm 7

Page Segmentation Modes (PSM)

Mode	Description	Use Case
3	Fully automatic (default)	General documents
4	Assume single column	Single-column text
6	Assume single block	Uniform text block
7	Single line	One line of text
8	Single word	One word
11	Sparse text	Text scattered on image
13	Raw line	Single line, no OSD

Edge cases

Low quality images: Use --preprocess threshold or --preprocess blur to improve results
Rotated text: Tesseract handles slight rotation; for heavily rotated images, rotate first
Very small text: Increase DPI with --dpi 300 or higher
Mixed languages: Combine with +, e.g., --lang eng+chi_sim+jpn
Empty results: Try different PSM modes or preprocessing options

Scripts

ocr_extract.py — Extract text from images using Tesseract OCR

related-skills.json

same repository

docx.md

from "AIDotNet/OpenCowork"

Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When GLM needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks

2026-05-12462

csv-pipeline.md

from "AIDotNet/OpenCowork"

Process, transform, analyze, and report on CSV and JSON data files. Use when the user needs to filter rows, join datasets, compute aggregates, convert formats, deduplicate, or generate summary reports from tabular data. Works with any CSV, TSV, or JSON Lines file.

2026-03-20462

email-drafter.md

from "AIDotNet/OpenCowork"

Generate professional email drafts using Python templates. Use when the user needs to compose business emails, follow-ups, introductions, meeting requests, or other professional correspondence. Supports multiple tones, languages, and email types with structured output.

2026-03-20462

excel-processor.md

from "AIDotNet/OpenCowork"

Read, write, analyze, and format Excel spreadsheets (.xlsx). Use when the user needs to create Excel files, extract data from spreadsheets, apply formulas, format cells, or generate Excel reports from data. Supports multiple sheets, charts, and conditional formatting.

2026-03-20462

pdf.md

from "AIDotNet/OpenCowork"

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When GLM needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

2026-03-20462

web-scraper.md

from "AIDotNet/OpenCowork"

Scrape web pages, search the internet, and extract structured content using Python. Use when the user wants to fetch a webpage, search for information online, extract links, or crawl JavaScript-rendered dynamic pages.

2026-03-20462

package.json

"author": "AIDotNet"

"repository": "AIDotNet/OpenCowork"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Computer Occupations, All OtherComputer and Mathematical Occupations15-1299L4

name	image-ocr
description	Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats.
compatibility	Requires Python 3 and pytesseract + Pillow (pip install pytesseract Pillow). Also requires Tesseract OCR engine installed on the system.

Image OCR

Extract text from images using Tesseract OCR via Python.

When to use this skill

User asks to read or extract text from an image
User has a screenshot with text they want to process
User has scanned documents that need text extraction
User wants to digitize text from photos

Scripts overview

Script	Purpose	Dependencies
`ocr_extract.py`	Extract text from images with multiple options	`pytesseract`, `Pillow`

Steps

1. Install dependencies (first time only)

Install the Python packages:

pip install pytesseract Pillow

Install Tesseract OCR engine:

Windows: Download installer from https://github.com/UB-Mannheim/tesseract/wiki
macOS: brew install tesseract
Linux (Ubuntu/Debian): sudo apt install tesseract-ocr
Linux (Fedora): sudo dnf install tesseract

For additional language support:

Windows: Select languages during installation
Linux: sudo apt install tesseract-ocr-chi-sim (Chinese Simplified), tesseract-ocr-jpn (Japanese), etc.

CRITICAL — Dependency Error Recovery: If the script fails with an ImportError or "tesseract not found" error, install the missing dependencies using the commands above, then re-run the EXACT SAME script command that failed.

2. Extract text from an image

python scripts/ocr_extract.py "IMAGE_PATH"

Options:

--lang LANG — OCR language (default: eng). Use chi_sim for Chinese, jpn for Japanese, eng+chi_sim for multiple.
--save OUTPUT_PATH — Save extracted text to a file
--preprocess MODE — Image preprocessing: none (default), grayscale, threshold, blur
--dpi DPI — Set image DPI for better accuracy (default: auto-detect)
--psm MODE — Tesseract page segmentation mode (0-13, default: 3 = auto)

Examples:

# Basic text extraction
python scripts/ocr_extract.py "screenshot.png"

# Chinese text extraction
python scripts/ocr_extract.py "document.jpg" --lang chi_sim

# Mixed English and Chinese
python scripts/ocr_extract.py "mixed.png" --lang eng+chi_sim

# Preprocess noisy image for better accuracy
python scripts/ocr_extract.py "noisy_scan.png" --preprocess threshold

# Save output to file
python scripts/ocr_extract.py "scan.tiff" --save output.txt

# Single line of text (e.g., license plate, serial number)
python scripts/ocr_extract.py "plate.jpg" --psm 7

Page Segmentation Modes (PSM)

Mode	Description	Use Case
3	Fully automatic (default)	General documents
4	Assume single column	Single-column text
6	Assume single block	Uniform text block
7	Single line	One line of text
8	Single word	One word
11	Sparse text	Text scattered on image
13	Raw line	Single line, no OSD

Edge cases

Low quality images: Use --preprocess threshold or --preprocess blur to improve results
Rotated text: Tesseract handles slight rotation; for heavily rotated images, rotate first
Very small text: Increase DPI with --dpi 300 or higher
Mixed languages: Combine with +, e.g., --lang eng+chi_sim+jpn
Empty results: Try different PSM modes or preprocessing options

Scripts

ocr_extract.py — Extract text from images using Tesseract OCR

image-ocr

Image OCR

When to use this skill

Scripts overview

Steps

1. Install dependencies (first time only)

2. Extract text from an image

Page Segmentation Modes (PSM)

Edge cases

Scripts

More from this repository

More from this repository

Image OCR

When to use this skill

Scripts overview

Steps

1. Install dependencies (first time only)

2. Extract text from an image

Page Segmentation Modes (PSM)

Edge cases

Scripts