tesseract

You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.

Overview

Install command

npx skills add https://github.com/TerminalSkills/skills --skill tesseract

Copy and paste this command into Claude Code to install the skill

Source

TerminalSkills/skills

Stars: 62

Forks: 6

Updated: May 13, 2026 at 16:57

name	tesseract
description	You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.
license	Apache-2.0
compatibility
metadata	{"author":"terminal-skills","version":"1.0.0","category":"AI & Machine Learning","tags":["ocr","text-recognition","document-processing","image-to-text","open-source"]}

Tesseract — Open-Source OCR Engine

Core Capabilities

Basic Usage

# pip install pytesseract Pillow pdf2image
# (pdf2image also needs the Poppler utilities installed on the system)
import pytesseract
from PIL import Image

# Simple text extraction
text = pytesseract.image_to_string(Image.open("document.png"))
print(text)

# With language specification
text_de = pytesseract.image_to_string(Image.open("german_doc.png"), lang="deu")

# Multiple languages
text_multi = pytesseract.image_to_string(Image.open("mixed.png"), lang="eng+fra+deu")

# Get bounding boxes for each word
data = pytesseract.image_to_data(Image.open("invoice.png"), output_type=pytesseract.Output.DICT)
for i, word in enumerate(data["text"]):
    if word.strip():
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        # conf can be an int, a float, or a numeric string depending on versions
        conf = float(data["conf"][i])
        print(f"'{word}' at ({x},{y},{w},{h}) confidence: {conf:.0f}%")

# PDF to text: render each page to an image at 300 DPI, then OCR it
from pdf2image import convert_from_path
pages = convert_from_path("document.pdf", dpi=300)
full_text = "\n\n".join(pytesseract.image_to_string(page) for page in pages)
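
The per-word confidences returned by image_to_data can be used to drop unreliable words before further processing. The following is a minimal sketch, assuming the dictionary layout produced by output_type=pytesseract.Output.DICT. The function name filter_confident_words, the threshold of 60, and the sample dictionary are illustrative assumptions, not part of pytesseract.

```python
def filter_confident_words(data: dict, min_conf: float = 60.0) -> list[str]:
    """Return recognized words whose confidence is at least min_conf.

    `data` is expected to have the shape of the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    words = []
    for word, conf in zip(data["text"], data["conf"]):
        # conf may arrive as int, float, or numeric string depending on versions;
        # rows that are not words (blocks, lines) carry a confidence of -1.
        if word.strip() and float(conf) >= min_conf:
            words.append(word)
    return words


# Hand-written sample in the same shape as the OCR output (hypothetical values),
# so the sketch runs without Tesseract installed.
sample = {
    "text": ["", "Invoice", "N0.", "12345", " "],
    "conf": [-1, 96, 41.5, "88.2", -1],
}
print(filter_confident_words(sample))  # ['Invoice', '12345']
```

A suitable threshold depends on the material; it is worth checking a few pages by hand before fixing a value.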

Image Preprocessing for Better Accuracy

import cv2
import numpy as np
import pytesseract

def preprocess_for_ocr(image_path: str) -> np.ndarray:
    """Preprocess an image to improve OCR accuracy.

    Steps: grayscale → denoise → threshold → deskew → resize
    """
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None instead of raising when the file is unreadable
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Denoise
    denoised = cv2.fastNlMeansDenoising(gray, h=10)

    # Adaptive threshold (handles uneven lighting); text ends up black on white
    thresh = cv2.adaptiveThreshold(
        denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2,
    )

    # Deskew (fix rotated scans): fit a rotated rectangle around the text pixels,
    # which are the black (0) pixels after THRESH_BINARY
    coords = np.column_stack(np.where(thresh == 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1] if len(coords) else 0.0
    # Fold the angle into [-45, 45) so the correction does not depend on which
    # angle range the installed OpenCV version reports, then negate it
    angle = -((angle + 45.0) % 90.0 - 45.0)

    (h, w) = thresh.shape
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(thresh, M, (w, h), flags=cv2.INTER_CUBIC,
                              borderMode=cv2.BORDER_REPLICATE)

    # Scale up if too small (Tesseract works best at 300+ DPI)
    if w < 1000:
        scale = 2.0
        rotated = cv2.resize(rotated, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)

    return rotated

# Use preprocessed image
processed = preprocess_for_ocr("scan.jpg")
text = pytesseract.image_to_string(processed, config="--psm 6")
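
The deskew step depends on how the angle from cv2.minAreaRect is interpreted, and the range in which that angle is reported has differed between OpenCV releases. One way to sidestep this, sketched here as an assumption rather than as the skill's own method, is to fold whatever angle is reported into [-45, 45), since a rectangle's orientation repeats every 90 degrees. The helper name fold_skew_angle is hypothetical.

```python
def fold_skew_angle(angle: float) -> float:
    """Fold a rectangle angle in degrees into the range [-45, 45)."""
    return (angle + 45.0) % 90.0 - 45.0


# The same physical skew can be reported as -80 or as 10 degrees,
# depending on the convention; both fold to the same value.
for reported in (-80.0, 10.0, -10.0, 85.0, 90.0):
    print(reported, "->", fold_skew_angle(reported))
```

The rotation passed to cv2.getRotationMatrix2D would then be the negative of the folded value, which matches the sign used in preprocess_for_ocr.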

Page Segmentation Modes

import pytesseract
from PIL import Image

img = Image.open("receipt.png")  # placeholder path; use your own image

# Page segmentation modes (PSM) control how Tesseract analyzes page layout
# --psm 0: Orientation and script detection (OSD) only
# --psm 1: Automatic page segmentation with OSD
# --psm 3: Fully automatic page segmentation, no OSD (default)
# --psm 4: Assume a single column of text of variable sizes
# --psm 6: Assume a single uniform block of text
# --psm 7: Treat the image as a single text line
# --psm 8: Treat the image as a single word
# --psm 11: Sparse text, in no particular order
# --psm 13: Raw line (single text line, no layout analysis)

# For receipts/invoices (structured single column)
text = pytesseract.image_to_string(img, config="--psm 4")

# For single line (serial numbers, license plates)
text = pytesseract.image_to_string(img, config="--psm 7")

# For scattered text (signs in a photo)
text = pytesseract.image_to_string(img, config="--psm 11")

# Whitelist specific characters (not honored by the LSTM engine in Tesseract 4.0;
# supported again from 4.1)
text = pytesseract.image_to_string(img, config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEF")
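
Config strings like the ones above are easy to mistype when assembled by hand. A small helper can build and validate them; this is a sketch under stated assumptions, and the name build_tesseract_config is hypothetical, not part of pytesseract. It only covers the two options shown in this section.

```python
from typing import Optional


def build_tesseract_config(psm: int = 3, whitelist: Optional[str] = None) -> str:
    """Build a string for the `config` argument of pytesseract functions."""
    # Tesseract defines page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    parts = [f"--psm {psm}"]
    if whitelist:
        # Whitespace would need shell-style quoting, which this sketch skips.
        if any(ch.isspace() for ch in whitelist):
            raise ValueError("whitespace in the whitelist is not handled here")
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(build_tesseract_config(7, whitelist="0123456789ABCDEF"))
# --psm 7 -c tessedit_char_whitelist=0123456789ABCDEF
```

The returned string can be passed directly, for example pytesseract.image_to_string(img, config=build_tesseract_config(7)).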

Installation

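# System package
brew install tesseract                            # macOS
apt install tesseract-ocr                         # Ubuntu/Debian
apt install tesseract-ocr-deu tesseract-ocr-fra   # Additional languages

# Python binding and the libraries used in the examples above
pip install pytesseract Pillow opencv-python numpy pdf2image

# pdf2image relies on the Poppler utilities
brew install poppler                              # macOS
apt install poppler-utils                         # Ubuntu/Debian

# Node.js
npm install tesseract.js                          # Pure JS (runs in browser too)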
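
After installation it can help to confirm that every language in a lang string actually has traineddata installed, since a missing language otherwise only shows up as an error at OCR time. A minimal sketch follows; the helper name missing_languages is hypothetical, and the installed list is hand-written so the example runs without Tesseract. In practice that list would typically come from pytesseract.get_languages(), which recent pytesseract releases provide.

```python
def missing_languages(requested: str, installed: list[str]) -> list[str]:
    """Return the codes from an 'eng+fra' style string that are not installed."""
    codes = [code for code in requested.split("+") if code]
    return [code for code in codes if code not in installed]


# Hypothetical installation with English, OSD data, and German only.
installed = ["eng", "osd", "deu"]
print(missing_languages("eng+fra+deu", installed))  # ['fra']
```

An empty result means the lang string can be used as is.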

Best Practices

Preprocess images: Grayscale → denoise → threshold → deskew before OCR; on noisy, unevenly lit, or skewed scans this often improves accuracy considerably
300 DPI minimum: Tesseract works best at 300 DPI or higher; scale up small images before processing
PSM selection: Choose the page segmentation mode that matches the layout; PSM 6 for a uniform block of text, PSM 7 for single lines, PSM 11 for sparse text
Language data: Install the traineddata file for each language you need; use lang="eng+deu" for multilingual documents
Whitelist characters: For known formats (serial numbers, dates), restrict the character set for higher accuracy
Confidence filtering: Use image_to_data to get per-word confidence and filter out low-confidence results
Tesseract.js for browser: Use the JavaScript version for client-side OCR; no server is needed and recognition runs in Web Workers
LSTM engine: Tesseract 4+ uses an LSTM neural network engine by default; it is generally more accurate than the legacy character-pattern engine of Tesseract 3
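
The 300 DPI guideline can be turned into a concrete scale factor when the resolution of the source image is known. The sketch below assumes the current DPI is available, for example from scanner settings or from image metadata when the file stores it; the names upscale_factor and scaled_size are hypothetical.

```python
def upscale_factor(current_dpi: float, target_dpi: float = 300.0) -> float:
    """Return the factor needed to reach target_dpi, never scaling down."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def scaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """Return the pixel dimensions after applying the scale factor."""
    return round(width * factor), round(height * factor)


factor = upscale_factor(150)
print(factor, scaled_size(800, 600, factor))  # 2.0 (1600, 1200)
```

The factor could be passed as fx and fy to cv2.resize in place of a fixed value such as 2.0.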
