| name | tesseract |
| description | You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy. |
| license | Apache-2.0 |
| compatibility | |
| metadata | {"author":"terminal-skills","version":"1.0.0","category":"AI & Machine Learning","tags":["ocr","text-recognition","document-processing","image-to-text","open-source"]} |
Tesseract — Open-Source OCR Engine
You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.
Core Capabilities
Basic Usage
import pytesseract
from PIL import Image
import cv2
text = pytesseract.image_to_string(Image.open("document.png"))
print(text)
text_de = pytesseract.image_to_string(Image.open("german_doc.png"), lang="deu")
text_multi = pytesseract.image_to_string(Image.open("mixed.png"), lang="eng+fra+deu")
data = pytesseract.image_to_data(Image.open("invoice.png"), output_type=pytesseract.Output.DICT)
for i, word in enumerate(data["text"]):
if word.strip():
x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
conf = int(data["conf"][i])
print(f"'{word}' at ({x},{y},{w},{h}) confidence: {conf}%")
from pdf2image import convert_from_path
pages = convert_from_path("document.pdf", dpi=300)
full_text = ""
for page in pages:
full_text += pytesseract.image_to_string(page) + "\n\n"
Image Preprocessing for Better Accuracy
import cv2
import numpy as np
def preprocess_for_ocr(image_path: str) -> np.ndarray:
"""Preprocess image for optimal OCR accuracy.
Steps: grayscale → denoise → threshold → deskew → resize
"""
img = cv2.imread(image_path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
denoised = cv2.fastNlMeansDenoising(gray, h=10)
thresh = cv2.adaptiveThreshold(
denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 11, 2,
)
coords = np.column_stack(np.where(thresh > 0))
angle = cv2.minAreaRect(coords)[-1]
if angle < -45:
angle = -(90 + angle)
else:
angle = -angle
(h, w) = thresh.shape
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(thresh, M, (w, h), flags=cv2.INTER_CUBIC,
borderMode=cv2.BORDER_REPLICATE)
if w < 1000:
scale = 2.0
rotated = cv2.resize(rotated, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
return rotated
processed = preprocess_for_ocr("scan.jpg")
text = pytesseract.image_to_string(processed, config="--psm 6")
Page Segmentation Modes
text = pytesseract.image_to_string(img, config="--psm 4")
text = pytesseract.image_to_string(img, config="--psm 7")
text = pytesseract.image_to_string(img, config="--psm 11")
text = pytesseract.image_to_string(img, config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEF")
Installation
brew install tesseract
apt install tesseract-ocr
apt install tesseract-ocr-deu tesseract-ocr-fra
pip install pytesseract Pillow
npm install tesseract.js
Best Practices
- Preprocess images — Grayscale → denoise → threshold → deskew before OCR; preprocessing improves accuracy 30-50%
- 300 DPI minimum — Tesseract works best at 300+ DPI; scale up small images before processing
- PSM selection — Choose the right page segmentation mode; PSM 6 for documents, PSM 7 for single lines, PSM 11 for sparse
- Language data — Install language-specific traineddata files; use
lang="eng+deu" for multilingual documents
- Whitelist characters — For known formats (serial numbers, dates), restrict character set for higher accuracy
- Confidence filtering — Use
image_to_data to get per-word confidence; filter out low-confidence results
- Tesseract.js for browser — Use the JavaScript version for client-side OCR; no server needed, runs in Web Workers
- LSTM engine — Tesseract 4+ uses LSTM neural networks by default; much more accurate than Tesseract 3's pattern matching