pdf-processing-pro

Updated October 17, 2025 at 21:27

Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation.

Source: davila7/claude-code-templates

name	PDF Processing Pro
description	Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation.

PDF Processing Pro

Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.

Quick start

Extract text from PDF

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)

Analyze PDF form (using included script)

python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions

Fill PDF form with validation

python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting

Extract tables from PDF

python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection

Features

✅ Production-ready scripts

All scripts include:

Error handling: Graceful failures with detailed error messages
Validation: Input validation and type checking
Logging: Configurable logging with timestamps
Type hints: Full type annotations for IDE support
CLI interface: --help flag for all scripts
Exit codes: Proper exit codes for automation
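
As a rough illustration of how these conventions fit together, here is a minimal sketch of a script skeleton. It is not the source of any bundled script: the logger name `example_tool`, the `process` placeholder, and the `.pdf` suffix check are hypothetical. The exit codes follow the list in the Error handling section.

```python
"""Hypothetical skeleton showing the conventions listed above; not a bundled script."""
import argparse
import logging
from pathlib import Path
from typing import Optional, Sequence

# Exit codes as documented in the "Error handling" section.
EXIT_OK = 0
EXIT_FILE_NOT_FOUND = 1
EXIT_INVALID_INPUT = 2
EXIT_PROCESSING_ERROR = 3

logger = logging.getLogger("example_tool")


def process(pdf_path: Path) -> int:
    """Placeholder for real work: returns the file size in bytes."""
    return pdf_path.stat().st_size


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Example PDF tool (illustrative only)")
    parser.add_argument("input", type=Path, help="path to the input PDF")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    args = parser.parse_args(argv)

    # Timestamped log lines; --verbose switches the level to DEBUG.
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    # Validate inputs early so failures are cheap and clearly reported.
    if not args.input.is_file():
        logger.error("File not found: %s", args.input)
        return EXIT_FILE_NOT_FOUND
    if args.input.suffix.lower() != ".pdf":
        logger.error("Expected a .pdf file, got: %s", args.input)
        return EXIT_INVALID_INPUT

    try:
        size = process(args.input)
    except OSError as exc:
        logger.error("Processing failed: %s", exc)
        return EXIT_PROCESSING_ERROR

    logger.info("Processed %s (%d bytes)", args.input, size)
    return EXIT_OK

# A real script would end with: raise SystemExit(main())
```

Returning the code from `main` rather than calling `sys.exit` inside it keeps the function easy to call from tests and from other Python code.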

✅ Comprehensive workflows

PDF Forms: Complete form processing pipeline
Table Extraction: Advanced table detection and extraction
OCR Processing: Scanned PDF text extraction
Batch Operations: Process multiple PDFs efficiently
Validation: Pre- and post-processing validation

Advanced topics

PDF Form Processing

For complete form workflows including:

Field analysis and detection
Dynamic form filling
Validation rules
Multi-page forms
Checkbox and radio button handling

See FORMS.md

Table Extraction

For complex table extraction:

Multi-page tables
Merged cells
Nested tables
Custom table detection
Export to CSV/Excel

See TABLES.md

OCR Processing

For scanned PDFs and image-based documents:

Tesseract integration
Language support
Image preprocessing
Confidence scoring
Batch OCR

See OCR.md
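
To make the confidence-scoring idea concrete, the sketch below averages Tesseract's per-word confidence for one page. It is an assumption about how such a step could look, not the approach documented in OCR.md: the function names `mean_confidence` and `ocr_page` are hypothetical, it assumes the Tesseract binary is installed, and it assumes pdfplumber can render the page to an image in your environment.

```python
from typing import Dict, List, Tuple


def mean_confidence(ocr_data: Dict[str, List]) -> float:
    """Average word confidence (0-100), ignoring boxes that contain no word."""
    scores = []
    for conf, text in zip(ocr_data["conf"], ocr_data["text"]):
        value = float(conf)  # may arrive as int, float, or str depending on version
        # Tesseract reports -1 for layout boxes that hold no recognised word.
        if value >= 0 and str(text).strip():
            scores.append(value)
    return sum(scores) / len(scores) if scores else 0.0


def ocr_page(pdf_path: str, page_number: int = 0, lang: str = "eng") -> Tuple[str, float]:
    """OCR one page of a scanned PDF; returns (text, mean confidence)."""
    # Third-party imports are lazy so mean_confidence can be used on its own.
    import pdfplumber
    import pytesseract

    with pdfplumber.open(pdf_path) as pdf:
        # Render the page to a PIL image; 300 DPI is a common OCR starting point.
        image = pdf.pages[page_number].to_image(resolution=300).original
    data = pytesseract.image_to_data(
        image, lang=lang, output_type=pytesseract.Output.DICT
    )
    words = [word for word in data["text"] if word.strip()]
    return " ".join(words), mean_confidence(data)
```

A low average is a reasonable signal to retry with image preprocessing or to route the page to manual review; the threshold to use is something to tune on your own documents.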

Included scripts

Form processing

analyze_form.py - Extract form field information

python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose]

fill_form.py - Fill PDF forms with data

python scripts/fill_form.py input.pdf data.json output.pdf [--validate]

validate_form.py - Validate form data before filling

python scripts/validate_form.py data.json schema.json
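
The kind of check this step performs can be sketched in a few lines. Everything about the schema layout below is an assumption made for illustration (a top-level `fields` mapping with `type` and `required` keys); the schema actually produced by analyze_form.py may look different, and `validate_form_data` is a hypothetical name.

```python
import json
from typing import Any, Dict, List

# ASSUMPTION: this schema layout is invented for illustration; the real
# schema.json written by analyze_form.py may be structured differently.
EXPECTED_PYTHON_TYPES = {"text": str, "checkbox": bool, "number": (int, float)}


def validate_form_data(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    fields = schema.get("fields", {})

    for name, spec in fields.items():
        if name not in data:
            if spec.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        expected = EXPECTED_PYTHON_TYPES.get(spec.get("type", "text"), str)
        value = data[name]
        # bool is a subclass of int, so reject it explicitly for non-checkbox fields.
        if isinstance(value, bool) and expected is not bool:
            errors.append(f"wrong type for {name}: got bool")
        elif not isinstance(value, expected):
            errors.append(f"wrong type for {name}: got {type(value).__name__}")

    # Flag keys that the form does not define, which usually means a typo.
    for name in data:
        if name not in fields:
            errors.append(f"unknown field: {name}")
    return errors


schema = json.loads(
    '{"fields": {"full_name": {"type": "text", "required": true},'
    ' "subscribe": {"type": "checkbox"}}}'
)
print(validate_form_data({"full_name": "Ada Lovelace", "subscribe": True}, schema))
print(validate_form_data({"subscribe": "yes", "extra": 1}, schema))
```

Collecting every problem before reporting, instead of stopping at the first one, lets a caller fix a submission in a single pass.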

Table extraction

extract_tables.py - Extract tables to CSV/Excel

python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv|excel]
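
For custom scripts, the same idea can be approximated directly with pdfplumber. This is a minimal sketch and not the implementation of extract_tables.py: the helper names are hypothetical, and writing every table into one CSV separated by blank rows is just one possible layout.

```python
import csv
from typing import List, Optional

Table = List[List[Optional[str]]]


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect every table pdfplumber detects, page by page."""
    import pdfplumber  # third-party; imported lazily so the CSV helper works without it

    tables: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


def write_tables_csv(tables: List[Table], output_path: str) -> int:
    """Write all tables to one CSV, separated by blank rows; return data rows written."""
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for index, table in enumerate(tables):
            if index > 0:
                writer.writerow([])  # blank separator between tables
            for row in table:
                # pdfplumber uses None for empty cells; normalise and trim.
                writer.writerow([(cell or "").strip() for cell in row])
                rows_written += 1
    return rows_written
```

Typical use would be `write_tables_csv(extract_tables("report.pdf"), "tables.csv")`.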

Text extraction

extract_text.py - Extract text with formatting preservation

python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting]

Utilities

merge_pdfs.py - Merge multiple PDFs

python scripts/merge_pdfs.py file1.pdf file2.pdf file3.pdf --output merged.pdf

split_pdf.py - Split PDF into individual pages

python scripts/split_pdf.py input.pdf --output-dir pages/

validate_pdf.py - Validate PDF integrity

python scripts/validate_pdf.py input.pdf
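
Before handing a file to any of these utilities, a cheap structural pre-check can catch truncated uploads and mislabeled files. The sketch below is an assumption about what such a check might look like, not a description of validate_pdf.py: `looks_like_pdf` is a hypothetical helper that only looks for the `%PDF-` header and an `%%EOF` marker near the end of the file.

```python
from pathlib import Path


def looks_like_pdf(path: str, window: int = 1024) -> bool:
    """Cheap check: '%PDF-' near the start and '%%EOF' near the end of the file."""
    file = Path(path)
    if not file.is_file():
        return False
    size = file.stat().st_size
    if size == 0:
        return False
    with file.open("rb") as handle:
        header = handle.read(window)
        handle.seek(max(0, size - window))
        tail = handle.read()
    return b"%PDF-" in header and b"%%EOF" in tail
```

Passing this check does not prove the file is valid; it only rules out the most common failure cases without parsing the document.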

Common workflows

Workflow 1: Process form submissions

# 1. Analyze form structure
python scripts/analyze_form.py template.pdf --output schema.json

# 2. Validate submission data
python scripts/validate_form.py submission.json schema.json

# 3. Fill form
python scripts/fill_form.py template.pdf submission.json completed.pdf

# 4. Validate output
python scripts/validate_pdf.py completed.pdf
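
When this workflow runs unattended, each step should only start if the previous one succeeded. The following sketch chains the four commands from Python and stops at the first failure; `run_steps` and `form_submission_steps` are hypothetical helpers, and the script paths and arguments are taken from the commands above.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> int:
    """Run commands in order; return 0, or the first non-zero exit code."""
    for command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"Step failed ({result.returncode}): {' '.join(command)}", file=sys.stderr)
            print(result.stderr.strip(), file=sys.stderr)
            return result.returncode
    return 0


def form_submission_steps(template: str, submission: str, output: str,
                          schema: str = "schema.json") -> List[List[str]]:
    """Build the four commands of Workflow 1, using the current interpreter."""
    py = sys.executable
    return [
        [py, "scripts/analyze_form.py", template, "--output", schema],
        [py, "scripts/validate_form.py", submission, schema],
        [py, "scripts/fill_form.py", template, submission, output],
        [py, "scripts/validate_pdf.py", output],
    ]
```

A caller could then run `run_steps(form_submission_steps("template.pdf", "submission.json", "completed.pdf"))` and pass the result on as its own exit code.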

Workflow 2: Extract data from reports

# 1. Extract tables
python scripts/extract_tables.py monthly_report.pdf --output data.csv

# 2. Extract text for analysis
python scripts/extract_text.py monthly_report.pdf --output report.txt

Workflow 3: Batch processing

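import glob
import subprocess
import sys
from pathlib import Path

# Create the output directory up front; writing into a missing directory fails
output_dir = Path("processed")
output_dir.mkdir(exist_ok=True)

# Process all PDFs in directory
for pdf_file in glob.glob("invoices/*.pdf"):
    # Extracted text goes to a .txt file named after the source PDF
    output_file = output_dir / Path(pdf_file).with_suffix(".txt").name

    # sys.executable reuses the current interpreter (and its installed packages);
    # text=True decodes stderr so failures print as readable text, not bytes
    result = subprocess.run([
        sys.executable, "scripts/extract_text.py",
        pdf_file,
        "--output", str(output_file)
    ], capture_output=True, text=True)

    if result.returncode == 0:
        print(f"✓ Processed: {pdf_file}")
    else:
        print(f"✗ Failed: {pdf_file} - {result.stderr.strip()}")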

Error handling

All scripts follow consistent error patterns:

# Exit codes
# 0 - Success
# 1 - File not found
# 2 - Invalid input
# 3 - Processing error
# 4 - Validation error

# Example usage in automation
import subprocess

# "..." stands for the script's arguments; replace it before running
result = subprocess.run(["python", "scripts/fill_form.py", ...])

if result.returncode == 0:
    print("Success")
elif result.returncode == 4:
    print("Validation failed - check input data")
else:
    print(f"Error occurred: {result.returncode}")

Dependencies

All scripts require:

pip install pdfplumber pypdf pillow pytesseract pandas

Optional, for OCR only (pytesseract is a wrapper and needs the Tesseract binary):

# Install tesseract-ocr system package
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: Download from GitHub releases

Performance tips

Use batch processing for multiple PDFs
Enable multiprocessing with --parallel flag (where supported)
Cache extracted data to avoid re-processing
Validate inputs early to fail fast
Use streaming for large PDFs (>50MB)
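
Where a script has no `--parallel` flag, parallelism and caching can be added from the calling side. The sketch below is one possible approach under stated assumptions: `process_all`, `is_fresh`, and `extract_text_command` are hypothetical helpers, and "cached" simply means the output file already exists and is at least as new as its source.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def is_fresh(source: Path, target: Path) -> bool:
    """True when target exists and is at least as new as source (cache hit)."""
    return target.exists() and target.stat().st_mtime >= source.stat().st_mtime


def process_all(sources: List[Path], out_dir: Path,
                build_command: Callable[[Path, Path], List[str]],
                workers: int = 4) -> Dict[str, int]:
    """Run one command per source in parallel; map each source path to an exit code."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def handle(source: Path) -> int:
        target = out_dir / (source.stem + ".txt")
        if is_fresh(source, target):
            return 0  # cached result is still valid, skip the work
        return subprocess.run(build_command(source, target)).returncode

    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(handle, sources))
    return {str(src): code for src, code in zip(sources, codes)}


def extract_text_command(source: Path, target: Path) -> List[str]:
    """Command line for the bundled text extractor, as shown under Included scripts."""
    return [sys.executable, "scripts/extract_text.py", str(source), "--output", str(target)]
```

For example, `process_all(sorted(Path("invoices").glob("*.pdf")), Path("processed"), extract_text_command)` returns a mapping that can be scanned for non-zero exit codes afterwards.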

Best practices

Always validate inputs before processing
Use try-except in custom scripts
Log all operations for debugging
Test with sample PDFs before production
Set timeouts for long-running operations
Check exit codes in automation
Backup originals before modification
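
Two of these practices, backing up originals and setting timeouts, are easy to wrap in small helpers. This is a minimal sketch using only the standard library; the names `backup_original` and `run_with_timeout` are hypothetical and not part of the bundled scripts.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def backup_original(path: Path, backup_dir: Path) -> Path:
    """Copy the file (with its metadata) into backup_dir before it is modified."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    destination = backup_dir / path.name
    shutil.copy2(path, destination)
    return destination


def run_with_timeout(command: List[str], timeout_seconds: float) -> Optional[int]:
    """Return the exit code, or None if the command exceeded the timeout."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        return subprocess.run(command, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

Returning `None` for a timeout keeps it distinct from the documented exit codes, so automation can decide separately whether to retry.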

Troubleshooting

Common issues

"Module not found" errors:

pip install -r requirements.txt

Tesseract not found:

# Install tesseract system package (see Dependencies)

Memory errors with large PDFs:

# Process page by page instead of loading entire PDF
import pdfplumber

with pdfplumber.open("large.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        # Process page immediately

Permission errors:

chmod +x scripts/*.py

Getting help

All scripts support --help:

python scripts/analyze_form.py --help
python scripts/extract_tables.py --help

For detailed documentation on specific topics, see:

FORMS.md - Complete form processing guide
TABLES.md - Advanced table extraction
OCR.md - Scanned PDF processing