| name | pdf |
| description | Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill. |
| license | Proprietary. LICENSE.txt has complete terms |
| metadata | {"dependencies":["pypdf","pdfplumber","reportlab","pypdfium2"]} |
PDF Skill — Action Routing
CRITICAL: Decide Your Action FIRST
Before doing anything, classify the user's request and follow the MANDATORY action:
| User wants to... | MANDATORY action |
|---|
| Convert .md to .pdf / Generate PDF from markdown / 把 md 转成 pdf | MUST use bash to run md_to_pdf.py script (see below) |
| Create a new PDF from scratch | Use bash to run a Python script with reportlab |
| Read/extract text from a PDF | Use read_file or bash with pdfplumber |
| Merge/split/rotate/encrypt PDFs | Use bash with pypdf |
| Extract tables from a PDF | Use bash with pdfplumber |
| Fill a PDF form | Read FORMS.md first |
| OCR / read text from scanned PDF | Convert to images → python_repl with vision LLM (see OCR below) |
Markdown → PDF Conversion (MOST COMMON)
ALWAYS use the bash tool to run the built-in script. NEVER use read_file for this task.
bash: python <absolute_path_to_skill>/scripts/md_to_pdf.py <input.md> <output.pdf>
The script path will be listed under "Available Scripts" in the prompt. Use that absolute path directly.
Example:
bash: python /path/to/skills/pdf/scripts/md_to_pdf.py /workspace/report.md /workspace/report.pdf
Features: CJK support, headings, lists, code blocks, tables, bold/italic, blockquotes, horizontal rules.
If reportlab is not installed: bash: pip install reportlab
Reading/Extracting from Existing PDFs
Extract text
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
for page in pdf.pages:
print(page.extract_text())
Extract tables
with pdfplumber.open("document.pdf") as pdf:
for page in pdf.pages:
for table in page.extract_tables():
for row in table:
print(row)
Merge / Split / Rotate
Merge
from pypdf import PdfWriter, PdfReader
writer = PdfWriter()
for f in ["a.pdf", "b.pdf"]:
for page in PdfReader(f).pages:
writer.add_page(page)
with open("merged.pdf", "wb") as out:
writer.write(out)
Split
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
w = PdfWriter()
w.add_page(page)
with open(f"page_{i+1}.pdf", "wb") as out:
w.write(out)
Rotate
reader = PdfReader("input.pdf")
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90)
writer.add_page(page)
with open("rotated.pdf", "wb") as out:
writer.write(out)
Password Protection
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
writer.encrypt("userpassword", "ownerpassword")
with open("encrypted.pdf", "wb") as out:
writer.write(out)
Create PDF from Scratch (reportlab)
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("output.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = [Paragraph("Title", styles['Title']), Spacer(1, 12), Paragraph("Body text", styles['Normal'])]
doc.build(story)
IMPORTANT: Never use Unicode subscript/superscript characters in ReportLab. Use <sub> and <super> tags instead.
OCR Scanned PDFs (via Vision LLM)
When pdfplumber.extract_text() returns empty or garbled text, the PDF is likely scanned/image-based. Use python_repl with a vision-capable LLM to analyze the converted images:
Step 1: Convert all PDF pages to images:
bash: python <skill_path>/scripts/convert_pdf_to_images.py <input.pdf> <output_dir>
Step 2: Use python_repl to call the vision LLM for OCR on the images:
import base64, os, json
image_dir = "<output_dir>"
images = sorted([f for f in os.listdir(image_dir) if f.lower().endswith(('.png','.jpg','.jpeg'))])
results = []
for img_file in images:
path = os.path.join(image_dir, img_file)
with open(path, "rb") as f:
b64 = base64.b64encode(f.read()).decode()
results.append({"file": img_file, "text": extracted_text})
print(json.dumps(results, ensure_ascii=False, indent=2))
Images are batched into minimal LLM calls automatically (up to 8 per call).
Quick Reference
| Task | Tool | Method |
|---|
| Markdown → PDF | bash + md_to_pdf.py | python scripts/md_to_pdf.py in.md out.pdf |
| Extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Merge/split/rotate | pypdf | PdfReader + PdfWriter |
| Create from scratch | reportlab | SimpleDocTemplate |
| Fill forms | see FORMS.md | — |
| OCR scanned | python_repl + vision LLM | Convert pages to images → call vision LLM via python_repl |
References
- FORMS.md — PDF form filling
- REFERENCE.md — Advanced features, JS libraries, troubleshooting