pdf-to-md

Name: Pdf To Md
Author: pedrohcgs

// Convert a PDF (or any document) to clean markdown so Claude can read it efficiently. Tries Microsoft MarkItDown first; falls back to OpenDataLoader-PDF for complex layouts (multi-column papers, dense tables, math). Use when user says "convert this PDF", "make this document readable", "turn paper.pdf into markdown", "preprocess this document", or before reading any PDF for analysis.

stars:5

forks:3

updated:May 1, 2026 at 20:10

SKILL.md


name	pdf-to-md
description	Convert a PDF (or any document) to clean markdown so Claude can read it efficiently. Tries Microsoft MarkItDown first; falls back to OpenDataLoader-PDF for complex layouts (multi-column papers, dense tables, math). Use when user says "convert this PDF", "make this document readable", "turn paper.pdf into markdown", "preprocess this document", or before reading any PDF for analysis.
argument-hint	[input-file] [optional: output-file]
allowed-tools	["Bash","Read","Write"]

/pdf-to-md — Convert any document to clean markdown

Markdown is Claude's native format. PDFs cost 2–3× more tokens and bury content under formatting artifacts. Converting first cuts token usage 50–70% with no quality loss, and improves comprehension on multi-column or table-heavy PDFs.

Inputs

$1 — input file path. Supported: .pdf, .docx, .pptx, .xlsx, .html, .epub, .csv, .xml.
$2 — output path (optional). If omitted, writes alongside the input with the .md extension.
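The default for $2 falls out of shell parameter expansion. A small sketch of that behavior, where `default_out` is a hypothetical helper name and the sample paths are made up for illustration:

```shell
# Hypothetical helper: derive the default output path for an input file.
# ${1%.*} strips the shortest trailing ".something", so only the final
# extension is replaced and any directory prefix is preserved.
default_out() {
  printf '%s\n' "${1%.*}.md"
}

default_out "paper.pdf"            # prints paper.md
default_out "docs/report.v2.docx"  # prints docs/report.v2.md
default_out "/tmp/slides.PPTX"     # prints /tmp/slides.md
```

This assumes the input has an extension. A path such as a.b/file, with a dot in the directory name and none in the file name, would be cut at the wrong dot.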

Steps

1. Validate input

Check that $1 exists and identify the format from the extension.

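# Abort early if the input file is missing; send the error to stderr.
test -f "$1" || { echo "File not found: $1" >&2; exit 1; }
# Lower-cased extension, so that .PDF and .pdf are treated alike.
EXT="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
# Default output: the input path with its extension replaced by .md.
OUT="${2:-${1%.*}.md}"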

2. Quick first pass with MarkItDown

markitdown handles every supported format in one call. Run it first.

markitdown "$1" > "$OUT"

If the input is not a PDF, jump to step 4 (verify). MarkItDown's output is sufficient for the other supported formats (DOCX, PPTX, XLSX, HTML, EPUB, CSV, XML).
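The branch between step 3 and step 4 could be sketched as below. `is_pdf` is a hypothetical helper, not part of either tool, and the `${1:-}` default is only there so the snippet does not fail when run without arguments.

```shell
# Hypothetical helper: succeed (exit 0) only when the path ends in .pdf,
# ignoring case, so that "Scan.PDF" is triaged like "scan.pdf".
is_pdf() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *.pdf) return 0 ;;
    *)     return 1 ;;
  esac
}

if is_pdf "${1:-}"; then
  echo "PDF input: continue to step 3 (triage)"
else
  echo "Non-PDF input: skip to step 4 (verify)"
fi
```

Matching on the full lower-cased path, rather than on the text after the last dot, keeps a file literally named pdf from being mistaken for a PDF.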

3. PDF triage — decide whether to fall back to OpenDataLoader-PDF

PDFs are the only format where layout matters. Inspect the MarkItDown output for failure signals:

Output is suspiciously short for a multi-page input (e.g., < 50 words for a 5-page PDF) — extraction failed.
Tables collapsed to single lines — column structure was lost.
Lines from different columns interleaved — multi-column reading order broken.
Math/equations rendered as □□□ or random characters — no semantic capture.

If any trigger fires, rerun with OpenDataLoader-PDF (PDF specialist with XY-cut++ reading order):

opendataloader "$1" -o "$OUT" --format markdown

Otherwise, the MarkItDown pass is fine — proceed to step 4.
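Two of the four failure signals can be checked mechanically; collapsed tables and interleaved columns still need a read of the output. A rough sketch, assuming a hypothetical `needs_fallback` helper, a page count supplied by the caller, and a threshold of 10 words per page extrapolated from the "< 50 words for a 5-page PDF" example:

```shell
# Hypothetical helper: flag two of the failure signals mechanically.
# $1 = markdown file from the MarkItDown pass
# $2 = page count of the source PDF (supplied by the caller; this sketch
#      does not compute it)
# Assumed threshold: fewer than 10 words per page counts as "too short".
needs_fallback() {
  md=$1
  pages=$2
  words=$(wc -w < "$md" | tr -d '[:space:]')
  if [ "$words" -lt $((pages * 10)) ]; then
    echo "too-short"
    return 0
  fi
  # A run of U+25A1 (white square) suggests glyphs that failed to map to text.
  if grep -qF '□□□' "$md"; then
    echo "garbled-math"
    return 0
  fi
  # Neither mechanical signal fired.
  return 1
}
```

A caller might write `if trigger=$(needs_fallback "$OUT" 5); then ...; fi` and record `$trigger` for the report in step 5. The threshold is a guess and should be tuned for the documents at hand.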

4. Verify

Read the first 30 lines of $OUT. Confirm:

Headers preserved as # / ## / ###
Tables visible as markdown pipe tables (where applicable)
No raw XML, PDF metadata, or <embed> tags leaked through
Body text reads as continuous prose, not fragmented column-by-column

If verification fails on a PDF that already used OpenDataLoader-PDF, escalate to the user — the PDF may be scanned-only and need OCR (Tesseract, Cloud Vision) before any markdown conversion will work.
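The header and leaked-markup items on the checklist lend themselves to a quick automated pass before the manual read. A minimal sketch, where `verify_head` is a hypothetical helper and the patterns are illustrative rather than exhaustive:

```shell
# Hypothetical helper: mechanical version of two checklist items, applied
# to the first 30 lines of the markdown file passed as $1.
verify_head() {
  sample=$(head -n 30 "$1")
  # At least one ATX heading (#, ##, or ###) should survive conversion.
  if ! printf '%s\n' "$sample" | grep -qE '^#{1,3} '; then
    echo "no-headings"
    return 1
  fi
  # Leaked markup: XML declarations, <embed> tags, or a raw PDF header.
  if printf '%s\n' "$sample" | grep -qiE '<\?xml|<embed|%PDF-'; then
    echo "leaked-markup"
    return 1
  fi
  echo "ok"
}
```

Table rendering and reading order are harder to judge with patterns, so those two items are left to inspection.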

5. Report

Tell the user:

Output path ($OUT)
Word count (run wc -w "$OUT")
Which tool produced the final output and why (note the trigger if you fell back)
Any sections that looked suspect
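The first three report items can be assembled with a few lines of shell; the fourth (suspect sections) depends on judgment from step 4. One possible shape, with `report` as a hypothetical helper and the field labels chosen for illustration:

```shell
# Hypothetical helper: print the report fields for a finished conversion.
# $1 = output path, $2 = tool that produced it, $3 = reason or trigger.
report() {
  # tr strips the padding some wc implementations add around the count.
  words=$(wc -w < "$1" | tr -d '[:space:]')
  printf 'Output: %s\n' "$1"
  printf 'Words: %s\n' "$words"
  printf 'Tool: %s (%s)\n' "$2" "$3"
}
```

For example, `report "$OUT" MarkItDown "no triggers fired"` would print the path, the word count, and the tool line in that order.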

Install (one-time)

# The [all] extra pulls in the optional converters, including PDF support.
pip install 'markitdown[all]'
# OpenDataLoader-PDF — see https://github.com/opendataloader-project/opendataloader-pdf for install instructions
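A quick preflight check can confirm both commands are on PATH before the skill runs. This is a sketch: `missing_tools` is a hypothetical helper, and the command name `opendataloader` is taken from step 3 as written, so the installed binary may be named differently.

```shell
# Hypothetical helper: print each given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Empty output means both tools were found.
missing_tools markitdown opendataloader
```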

When NOT to use this skill

Source is already .md / .qmd / .tex — don't bother converting.
You need bounding-box source citations (RAG with click-to-source) — call OpenDataLoader-PDF directly with its full feature set; this skill simplifies the output.
The PDF is scanned-only with no embedded text layer — neither tool helps. Run OCR first (Tesseract, Cloud Vision, etc.).

Why this skill exists

A recurring pattern: a user drops a PDF into master_supporting_docs/, asks Claude to read it, and the response uses up to 3× the tokens the same content would need as markdown, sometimes with garbled multi-column reading order. This skill closes that loop. Convert once at intake; feed Claude the clean version from then on.

References

microsoft/markitdown — Microsoft's multi-format converter.
opendataloader-project/opendataloader-pdf — PDF specialist with structured output.
Claude API PDF support — for the cases where you genuinely want PDF-as-input (e.g., signed contracts where layout matters).

related-skills.json

same repository

audit-reproducibility.md

from "pedrohcgs/Claude-Mini"

Enforce the replication-protocol.md rule by cross-checking numeric claims in a manuscript against the actual R / Stata / Python outputs. Report PASS/FAIL per claim against tolerance thresholds. Use before submission and before releasing a replication package.

2026-05-01 · 5

qa-quarto.md

from "pedrohcgs/Claude-Mini"

Adversarial QA workflow comparing Quarto HTML against Beamer PDF benchmark. Iterates between critic (finds issues) and fixer (applies fixes) until APPROVED or max iterations reached.

2026-05-01 · 5

slide-excellence.md

from "pedrohcgs/Claude-Mini"

Comprehensive slide excellence review combining visual audit, pedagogical review, and proofreading. Produces three reports and a combined summary.

2026-05-01 · 5

pedagogy-review.md

from "pedrohcgs/Claude-Mini"

Run holistic pedagogical review on lecture slides. Checks narrative arc, PhD student prerequisites, worked examples, notation clarity, and deck pacing.

2026-04-29 · 5

visual-audit.md

from "pedrohcgs/Claude-Mini"

Perform adversarial visual audit of Quarto or Beamer slides checking for overflow, font consistency, box fatigue, and layout issues.

2026-04-29 · 5

package.json

"author": "pedrohcgs"

"repository": "pedrohcgs/Claude-Mini"
