ocr-document

Stars: 526

Forks: 19

Last updated: June 13, 2026 at 14:21

Extract text from PDFs, images, and scanned documents. Uses pymupdf (local) or optional cloud OCR APIs.

Source

fuyuxiang

fuyuxiang/echo-agent

name: ocr-document
description: Extract text from PDFs, images, and scanned documents. Uses pymupdf (local) or optional cloud OCR APIs.
version: 1.0.0
metadata: {"echo":{"tags":["OCR","PDF","Document","Extract","Text"]}}

OCR & Document Processing

Extract text from PDFs, scanned images, and documents.

PDF Text Extraction (PyMuPDF)

Best choice for text-based PDFs:

pip install pymupdf

import pymupdf

doc = pymupdf.open("file.pdf")
for page in doc:
    text = page.get_text()
    print(text)

# All pages at once
full_text = "\n".join(page.get_text() for page in doc)
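Some PDFs mix text-based pages with scanned ones, and `get_text()` returns little or nothing for the latter. The following is a minimal sketch for spotting such pages. The helper names `has_text_layer` and `pages_without_text` and the 20-character threshold are assumptions made for illustration, not part of PyMuPDF or of this skill.

```python
def has_text_layer(page_text: str, min_chars: int = 20) -> bool:
    """Heuristic (an assumption, not a PyMuPDF feature): treat a page as
    text-based when it yields at least `min_chars` non-whitespace characters."""
    return len("".join(page_text.split())) >= min_chars


def pages_without_text(pdf_path: str, min_chars: int = 20) -> list[int]:
    """Return the 1-based numbers of pages whose extracted text is too short."""
    import pymupdf  # imported lazily so the heuristic can be used on its own

    with pymupdf.open(pdf_path) as doc:
        return [
            number
            for number, page in enumerate(doc, start=1)
            if not has_text_layer(page.get_text(), min_chars)
        ]
```

Pages reported by `pages_without_text("file.pdf")` are candidates for the OCR tools described in the sections that follow. The threshold may need tuning for documents with very short pages.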

PDF → Markdown (marker-pdf)

High-quality conversion preserving structure:

pip install marker-pdf
marker_single file.pdf --output_dir output_dir/ --output_format markdown

Image OCR

Surya OCR (Modern ML-based, best for Chinese)

pip install surya-ocr
surya_ocr image.png --langs zh,en

Pytesseract (Traditional, widely available)

# Install the Tesseract engine and language data first
brew install tesseract tesseract-lang  # macOS
apt install tesseract-ocr tesseract-ocr-chi-sim  # Debian/Ubuntu (may need sudo)
pip install pytesseract Pillow

import pytesseract
from PIL import Image

text = pytesseract.image_to_string(
    Image.open("scan.png"),
    lang="chi_sim+eng"
)

Script

python3 scripts/extract_document.py document.pdf
python3 scripts/extract_document.py scan.png
python3 scripts/extract_document.py report.pdf --output extracted.txt

The script picks a backend from the file extension: PDF → pymupdf, DOCX → python-docx, image → pytesseract. The OCR language comes from the system Tesseract configuration (for example, a chi_sim+eng default).
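The source of `scripts/extract_document.py` is not shown here, so the following is only a sketch of how dispatch by extension could look. The function names `detect_backend` and `extract_text`, the set of image extensions, and the `lang` default are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def detect_backend(path: str) -> str:
    """Map a file extension to the name of the library that would handle it."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pymupdf"
    if suffix == ".docx":
        return "python-docx"
    if suffix in IMAGE_EXTENSIONS:
        return "pytesseract"
    raise ValueError(f"Unsupported file type: {suffix or path}")


def extract_text(path: str, lang: str = "chi_sim+eng") -> str:
    """Extract text with the backend chosen by detect_backend().

    Each library is imported only when needed, so a missing optional
    dependency does not break the other formats.
    """
    backend = detect_backend(path)
    if backend == "pymupdf":
        import pymupdf

        with pymupdf.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if backend == "python-docx":
        import docx  # provided by the python-docx package

        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling `extract_text("report.pdf")` would return the PDF text as one string. Unsupported extensions raise `ValueError` instead of guessing a format.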

Tips

For scanned PDFs, extract the page images first, then OCR each page.
Preprocessing (deskew, contrast adjustment) improves OCR accuracy.
For Chinese text, surya-ocr is more accurate than pytesseract.
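The first two tips can be combined in a short sketch. It renders each PDF page to a bitmap with PyMuPDF (instead of pulling out embedded image objects), applies a simple grayscale and autocontrast step with Pillow, and passes the result to pytesseract. Deskewing is not attempted. The function names, the 300 dpi setting, and the page-separator format are assumptions made for illustration.

```python
import io


def preprocess(image):
    """Convert to grayscale and stretch contrast; deskewing is left out."""
    from PIL import ImageOps

    return ImageOps.autocontrast(ImageOps.grayscale(image))


def ocr_scanned_pdf(pdf_path: str, lang: str = "chi_sim+eng", dpi: int = 300) -> list[str]:
    """Render every page to an image and OCR it, returning one string per page."""
    import pymupdf
    import pytesseract
    from PIL import Image

    texts = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page
            image = Image.open(io.BytesIO(pix.tobytes("png")))
            texts.append(pytesseract.image_to_string(preprocess(image), lang=lang))
    return texts


def join_pages(texts: list[str]) -> str:
    """Join per-page text with a visible separator (format is an assumption)."""
    return "\n\n".join(
        f"--- page {number} ---\n{text.strip()}"
        for number, text in enumerate(texts, start=1)
    )
```

For example, `join_pages(ocr_scanned_pdf("scan.pdf"))` would produce a single string with one labeled section per page. A higher `dpi` tends to help with small print at the cost of speed and memory.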