pdf-to-html

Converts a PDF into one self-contained, readable HTML file that preserves images, tables, charts and reading order, optionally translating it into another language while keeping every figure. Uses structured extraction (PyMuPDF), font-size-driven layout, compressed base64-inlined images (a single portable file), and mandatory headless-Chrome visual verification. Use whenever someone wants to read a PDF as a web page or clean document, turn a PDF into HTML, or translate a PDF into another language while keeping its images, tables and charts intact, e.g. "PDF to HTML", "convert this PDF into a Chinese web page", "make this report readable", "translate this PDF but don't lose the charts", "I just want to read this PDF on my phone". Distinct from doc-to-markdown (plain Markdown text) and pdf-creator (Markdown→PDF): this one produces a styled, image-faithful HTML reading experience.

Stars: 1,169

Forks: 179

Updated: June 6, 2026 at 17:31

Source

daymade

daymade/claude-code-skills

PDF to HTML

Turn a PDF into a single, self-contained, readable HTML file — images, tables, charts and reading order preserved — and optionally translate it, keeping every figure in place.

The pipeline is extract → look → (translate) → build → verify. The middle "look" and final "verify" steps are where faithfulness actually comes from: a PDF is a layout, not just a text stream, so you read the rendered pages before building and the rendered HTML before delivering.

This skill runs inline (no context: fork): translation orchestrates a Dynamic Workflow, and a subagent cannot spawn one.

When to use / not use

Use when the goal is to read a PDF as HTML/web page, to convert a PDF to a styled HTML document, or to translate a PDF into another language while keeping its figures and tables.
Use doc-to-markdown instead if the user wants plain Markdown text (no styling, figures optional).
Use pdf-creator instead for the reverse direction (Markdown → PDF).

What it does NOT do

Scanned/image-only PDFs (no text layer): OCR first (e.g. ocrmypdf), then use this.
Complex multi-column tables: cell text is preserved and readable, but column alignment can flatten into a text flow — PyMuPDF reads a table as text blocks, not a grid, so the grid lines are gone. Tables that are images in the PDF survive as images. If the table's grid structure is essential, use doc-to-markdown (pandoc rebuilds real tables) or convert that page separately.
Pixel-perfect facsimile: output is a clean re-flow that keeps images and reading order, not a 1:1 copy of the original page layout.
Rewriting: it translates and re-lays-out; it does not summarize, add a TL;DR, or editorialize. Faithfulness is the point (see Fidelity below).

Dependencies

uv (runs Python with inline deps), Google Chrome or Chromium (visual verification). Python packages come via uv run --with: PyMuPDF, Pillow, numpy. Nothing to pre-install beyond Chrome and uv.
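A quick preflight check for the two external tools can save a failed run later. The sketch below is only an illustration: the helper names and the list of Chrome executable names are assumptions, not part of the skill, and on macOS Chrome usually lives in an app bundle that is not on PATH, so a "not found" result there is not conclusive.

```python
import shutil

# Candidate executable names are assumptions; adjust them for your platform.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_tools(which=shutil.which):
    """Return a mapping of tool name -> resolved path (None if not on PATH)."""
    chrome = next((path for path in map(which, CHROME_CANDIDATES) if path), None)
    return {"uv": which("uv"), "chrome": chrome}


def missing_tools(tools):
    """List the tools whose path could not be resolved."""
    return [name for name, path in tools.items() if path is None]


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

The `which` parameter exists only so the lookup can be swapped out in a test; in normal use the default `shutil.which` is what you want.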

Workflow

Copy this checklist and tick as you go:

- [ ] 1. Extract structure + render pages   (extract_pdf.py)
- [ ] 2. Read pages/*.png — SEE the layout, find content vs decorative images
- [ ] 3. (only if translating) run the translation workflow
- [ ] 4. Build the single-file HTML          (build_html.py)
- [ ] 5. Verify visually                      (verify_render.py → Read every segment)
- [ ] 6. Deliver the .html

1. Extract

uv run --with pymupdf python scripts/extract_pdf.py input.pdf

Writes input-build/ with structure.json (text blocks with font sizes + image blocks flagged decorative), images/, and pages/ (one PNG per page).

2. Look before you build

Read input-build/pages/*.png. This is not optional: you need to see the real layout, confirm which images are content vs decoration, and spot tables/charts. Read every page, even for a long PDF; for a short one it is quick. This is also where you understand the document well enough to translate it well.

3. Translate (optional)

Only if the user asked for another language. Read references/translation_workflow.md and follow it: a Dynamic Workflow translates pages in parallel, captions data charts, and reconciles terminology. It produces two overlay files (units.json, caps.json) that step 4 consumes. Do not hand-translate inline for anything longer than a page — the workflow keeps terminology consistent and is far faster.

4. Build

# original-language HTML
uv run --with Pillow python scripts/build_html.py input-build/structure.json --out output.html

# translated HTML (overlays from step 3)
uv run --with Pillow python scripts/build_html.py input-build/structure.json --out output.html \
    --translation input-build/units.json --captions input-build/caps.json --lang zh-CN

build_html.py is data-driven: it infers heading levels from font size (most common size = body; larger steps up to h3/h2/h1), drops decorative images, and inlines content images as compressed base64 → one portable file. It is not hand-tuned to any document. If a particular PDF has an unusual structure (e.g. multi-column, sidebars, a figure the size heuristic misreads), read the script and adjust — it's short and meant to be edited per document.
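The font-size heuristic described above can be sketched in a few lines. This is not the code in build_html.py, just one plausible reading of it: the function name is hypothetical, and the ranking rule (largest size becomes h1, the next h2, any other size above body becomes h3) is an assumption about how "larger steps up" is resolved.

```python
from collections import Counter


def infer_heading_levels(font_sizes):
    """Map each distinct font size to 'p', 'h3', 'h2' or 'h1'.

    The most common size is treated as body text. Sizes larger than body
    are ranked from the largest down: h1, then h2, then h3 for the rest.
    Sizes at or below body stay 'p' (footnotes, captions, body itself).
    """
    # Round to one decimal so float noise does not split one size into many.
    sizes = [round(size, 1) for size in font_sizes]
    if not sizes:
        return {}
    body = Counter(sizes).most_common(1)[0][0]
    larger = sorted({size for size in sizes if size > body}, reverse=True)
    tags = {size: "p" for size in set(sizes)}
    for rank, size in enumerate(larger):
        tags[size] = ("h1", "h2", "h3")[min(rank, 2)]
    return tags


if __name__ == "__main__":
    # One entry per text span: mostly 10pt body, a few larger headings.
    spans = [10.0] * 50 + [14.0] * 3 + [18.0] * 2 + [24.0] + [8.0] * 5
    for size, tag in sorted(infer_heading_levels(spans).items()):
        print(size, tag)
```

A document whose headings are bold but the same size as the body would defeat this rule, which is one case where editing the script per document becomes necessary.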

5. Verify visually (mandatory)

uv run --with Pillow --with numpy python scripts/verify_render.py output.html

Then Read every seg-*.png and check: fonts render (no tofu boxes), no clipped tables/figures, headings/lists look right, all expected images present. Text being correct does not mean the render is correct (failure_cases #7). Fix and re-verify until it's clean.
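The reason for segments is that one very tall full-page screenshot is hard to read at a useful zoom. As a rough illustration of the slicing arithmetic only (the function name, default heights and overlap are all assumptions, and verify_render.py may well do this differently), the crop ranges can be computed like this:

```python
def segment_boxes(total_height, segment_height=1600, overlap=80):
    """Return (top, bottom) pixel ranges that cover 0..total_height.

    Consecutive ranges share `overlap` pixels so that a figure or table
    cut by one boundary is still visible whole in a neighbouring segment,
    provided it is shorter than the overlap-adjusted segment.
    """
    if total_height <= 0:
        return []
    if not 0 <= overlap < segment_height:
        raise ValueError("overlap must be >= 0 and smaller than segment_height")
    boxes = []
    top = 0
    while True:
        bottom = min(top + segment_height, total_height)
        boxes.append((top, bottom))
        if bottom == total_height:
            return boxes
        # Step back by the overlap before starting the next segment.
        top = bottom - overlap


if __name__ == "__main__":
    for top, bottom in segment_boxes(4000, segment_height=1600, overlap=100):
        print(top, bottom)
```

Each pair would then be used as the vertical bounds of one crop of the screenshot.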

A quick structural cross-check is fine too, but count occurrences correctly: grep -o '<figure>' output.html | wc -l — not grep -c (failure_cases #1).
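The difference matters because generated HTML often puts several elements on one line: `grep -c` counts matching lines, while `grep -o ... | wc -l` counts individual matches. The small sketch below mimics both counts in Python; the function names and the sample HTML are illustrative only.

```python
import re


def count_matching_lines(text, pattern):
    """Like `grep -c`: the number of lines that contain at least one match."""
    return sum(1 for line in text.splitlines() if re.search(pattern, line))


def count_occurrences(text, pattern):
    """Like `grep -o PATTERN | wc -l`: the number of individual matches."""
    return sum(len(re.findall(pattern, line)) for line in text.splitlines())


if __name__ == "__main__":
    # Two figures share the first line, a third sits on its own line.
    html = (
        "<figure><img></figure><figure><img></figure>\n"
        "<p>text</p>\n"
        "<figure><img></figure>\n"
    )
    print("lines with a match:", count_matching_lines(html, "<figure>"))
    print("total occurrences:", count_occurrences(html, "<figure>"))
```

With the sample above the line count under-reports the number of figures, which is the mistake the cross-check is meant to avoid.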

6. Deliver

Hand over the single .html. It's self-contained (images inlined), so it opens with a double-click and nothing can go missing.
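If you want a mechanical check that the file really is self-contained, one option is to scan it for resource references that are not data: URIs. The following is a minimal sketch using only the standard library; the class and function names are hypothetical, the set of tag/attribute pairs is an assumption, and it does not inspect `url(...)` references inside CSS.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect src/href values on resource tags that are not data: URIs."""

    # Tag/attribute pairs treated as resource references (an assumption).
    RESOURCE_ATTRS = {
        ("img", "src"),
        ("script", "src"),
        ("source", "src"),
        ("link", "href"),
    }

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.RESOURCE_ATTRS and value and not value.startswith("data:"):
                self.external.append((tag, value))


def external_refs(html_text):
    """Return a list of (tag, url) pairs that point outside the file."""
    finder = ExternalRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external


if __name__ == "__main__":
    with open("output.html", encoding="utf-8") as handle:
        leftovers = external_refs(handle.read())
    print("self-contained" if not leftovers else leftovers)
```

An empty result is a necessary condition rather than proof; the visual verification in step 5 remains the real check.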

Scripts

| Script | Run with | Purpose |
| --- | --- | --- |
| `scripts/extract_pdf.py` | `uv run --with pymupdf` | PDF → structure.json + images/ + page renders |
| `scripts/build_html.py` | `uv run --with Pillow` | structure.json (+ optional translation/captions) → single-file HTML |
| `scripts/verify_render.py` | `uv run --with Pillow --with numpy` | headless-Chrome render → readable PNG segments |

Fidelity (read before translating)

The deliverable looks authoritative, so wrong content is worse than ugly content. The non-negotiable rules — and the specific ways this has gone wrong before — are in references/failure_cases.md. The one that bites hardest: never give a real person an inferred translated name, and copy every number/proper-noun verbatim (failure_cases #6). Read that file before any translation run; skim it before any run.

Next Step

After producing the HTML, suggest the natural follow-up:

Conversion complete: output.html (single self-contained file).

Options:
A) Make a PDF of it: run /daymade-docs:pdf-creator if you want a print/share copy (recommended if you need to send it)
B) Extract the text as Markdown instead: run /daymade-docs:doc-to-markdown (if you wanted editable text, not a reading page)
C) No thanks, the HTML is what I wanted
