| name | html-to-image |
| description | Render hand-coded HTML/CSS into crisp PNG images for architecture diagrams, blog cover art, technical posters, and any case where precise typography, layout, and brand control matter more than what a text-to-image model can deliver. Use this skill whenever the user wants a diagram, infographic, cover image, poster, or visual asset built from code (not generated by an image model), or whenever they reference past work like "another Compass-style architecture diagram" or "a header image for my blog post." Pairs with the frontend-design skill — frontend-design picks the aesthetic, this skill handles the HTML→PNG pipeline. |
HTML to image
A workflow for turning a hand-coded HTML file into a high-resolution PNG. The skill captures the mechanical parts — viewport, DPR, font-load timing, crop, dual delivery — so the conversation can stay focused on visual decisions.
Setup
scripts/render.py requires Python 3.10+ (it uses X | Y union syntax and builtin generics such as tuple[int, int]), the Playwright Python package, and the Chromium browser binaries that Playwright manages itself.
Quick check (this both imports the module and launches Chromium — if it prints ready, you're done):
python3 -c "from playwright.sync_api import sync_playwright; p=sync_playwright().start(); b=p.chromium.launch(); b.close(); p.stop(); print('ready')"
If it fails, install:
pip install playwright
python3 -m playwright install chromium
Always invoke via python3 -m playwright, not the bare playwright CLI. On systems with a global Node install, the playwright command on PATH (check with which playwright) often resolves to the JavaScript Playwright CLI, which downloads browsers to a different location than the one Python's Playwright looks in. As a result, playwright install chromium appears to succeed (exit 0, no output) while the Python module still can't find Chromium. The python3 -m form routes through the Python package and writes to the right place.
Chromium binaries land in ~/.cache/ms-playwright/ by default (or $PLAYWRIGHT_BROWSERS_PATH if set — e.g. some sandboxes use /opt/pw-browsers/). The Python module knows where to look; you generally shouldn't need to touch the path yourself.
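The lookup described above can be sketched in a few lines. This is an illustration only, not Playwright's own resolution code: the function name playwright_browsers_dir is hypothetical, the default is the Linux path named in the text, and special values of the environment variable are not handled.

```python
import os
from pathlib import Path


def playwright_browsers_dir(env=None, home=None):
    """Return the directory where the browsers are expected to be (sketch)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if override:
        # An explicit override wins, e.g. /opt/pw-browsers/ in some sandboxes.
        return Path(override)
    # Default from the text above; other platforms use a different cache directory.
    return home / ".cache" / "ms-playwright"
```

Printing playwright_browsers_dir() is a quick way to see which directory to inspect when an install seems to have gone missing.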
If the install happens during an interactive, user-facing skill run (not a smoke test), confirm with the user before downloading: Chromium plus the headless shell is about 280 MB.
When to reach for this skill
- "Draw an architecture diagram for X" / "make an infographic about Y"
- "I need a cover image for my blog post about Z"
- "Generate a poster / explainer / one-pager"
- Anything where the user previously praised a code-rendered image and wants more in that style
- Anything where text-to-image would mangle the typography or get the labels wrong
If the user only wants a live web component (no PNG export), skip this skill — use frontend-design alone.
Pair with frontend-design
This skill is the pipeline. It says nothing about visual style. Before writing HTML, consult the frontend-design skill for the aesthetic direction (typography, color, composition, avoiding generic AI looks). The split:
- frontend-design → what the image should look like
- html-to-image → how to render it crisply and deliver it
Both apply on every job.
Workflow
1. Clarify intent before drawing
For non-trivial diagrams, ask up to three clarifying questions before writing any HTML. Typical ones:
- Is the audience technical, leadership, or mixed?
- Are there parts that are "ours" vs. "ecosystem / external" that need visual separation?
- Any brand color or logo to anchor on, or stay neutral?
If the user says "just go" or doesn't answer, commit to a default and state it in one line ("Going with neutral blue-gray + a single accent; will swap if you prefer something else"). Don't stall on confirmation.
2. Set up the working directory
Iterate on the HTML inside a scratch directory (e.g. /tmp/html-to-image-work/ or the project's working area). Only copy the final files to the user's intended location at delivery. This keeps half-finished artifacts out of the way.
3. Write the HTML
A few conventions the render script depends on:
- Wrap the printable content in a .canvas (or .container, or main) element. The render script crops to this by default, stripping browser margin / body padding so the PNG has no whitespace around the artwork.
- Load web fonts via <link> in <head>, not @import inside CSS. Faster, and networkidle will actually wait for them.
- Inline SVG for icons. No external icon fonts (FontAwesome etc.) — they're a render-timing hazard.
- All CSS inline in <style>. Keeps the artifact single-file and trivially shareable.
If the design uses many components, write a Python build script with f-string templates rather than one monolithic HTML — it makes iteration much faster.
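A minimal sketch of such a build script, assuming a page made of simple titled cards. The names card and build_page, the CSS values, and the font URL passed in are all placeholders for illustration; the actual look should come from frontend-design.

```python
from html import escape


def card(title, body):
    """One reusable component, rendered from an f-string template."""
    return f'<section class="card"><h2>{escape(title)}</h2><p>{escape(body)}</p></section>'


def build_page(title, cards, font_css_url, font_family):
    """Assemble a single-file HTML page that follows the conventions above."""
    card_html = "\n".join(card(t, b) for t, b in cards)
    return f"""<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{escape(title)}</title>
<link rel="stylesheet" href="{escape(font_css_url)}">
<style>
  body {{ margin: 0; background: #fff; }}
  .canvas {{ width: 1360px; padding: 48px; box-sizing: border-box;
             font-family: "{escape(font_family)}", sans-serif; }}
  .card {{ border: 1px solid #d0d7de; border-radius: 8px; padding: 16px; margin-top: 16px; }}
</style>
</head>
<body>
<main class="canvas">
<h1>{escape(title)}</h1>
{card_html}
</main>
</body>
</html>
"""
```

Writing the result with Path("diagram.html").write_text(build_page(...)) gives a file ready for the render step. Changing one component then means editing one function rather than hunting through a monolithic file.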
4. Render
Use scripts/render.py. The script handles 2x DPR, networkidle wait, the extra 2-second wait for web fonts, and cropping.
python3 scripts/render.py diagram.html --preset architecture
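For orientation, the steps the script is described as handling could look roughly like the sketch below, written against Playwright's sync API. It is not the actual scripts/render.py: the function and parameter names are assumptions, and so is the fallback to a plain viewport screenshot when no crop target is found.

```python
from pathlib import Path

# Crop targets named in the HTML conventions, tried in this order.
CROP_SELECTORS = (".canvas", ".container", "main")


def file_url(html_path):
    """Turn a local HTML path into a file:// URL that Chromium can open."""
    return Path(html_path).resolve().as_uri()


def render(html_path, png_path, width=1360, height=1080,
           selector=None, full_page=False, font_wait_ms=2000):
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(
            viewport={"width": width, "height": height},
            device_scale_factor=2,  # 2x DPR for a crisp PNG
        )
        page.goto(file_url(html_path), wait_until="networkidle")
        page.wait_for_timeout(font_wait_ms)  # extra wait so web fonts can paint
        if full_page:
            page.screenshot(path=str(png_path), full_page=True)
        else:
            candidates = (selector,) if selector else CROP_SELECTORS
            for css in candidates:
                matches = page.locator(css)
                if matches.count() > 0:
                    matches.first.screenshot(path=str(png_path))  # crop to element
                    break
            else:
                # Assumed fallback: nothing matched, so capture the viewport.
                page.screenshot(path=str(png_path))
        browser.close()
```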
Presets (viewport in CSS pixels; PNG is 2x):
| Preset | Viewport | When to use |
|---|---|---|
| architecture | 1360×1080 | Dense layered diagrams for leadership reviews |
| architecture-s | 1280×980 | Side-by-side architecture, narrower aspect |
| blog-cover | 1200×675 | 16:9 banner / hero for blog posts |
| blog-square | 1200×1200 | Single-concept square for social or inline use |
| poster | 1440×1800 | Tall poster, conference handout, explainer |
Or pass --width W --height H for custom dimensions. --full-page disables cropping. --selector ".my-class" overrides the crop target.
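The preset table and the 2x rule reduce to a small lookup plus one multiplication. The sketch below mirrors the table; the names PRESETS, viewport_for, and png_size are illustrative and may not match what the script uses internally. The computed size is the nominal one for artwork that fills the viewport exactly, since a cropped PNG follows the size of the crop element instead.

```python
# Viewports in CSS pixels, copied from the preset table.
PRESETS = {
    "architecture": (1360, 1080),
    "architecture-s": (1280, 980),
    "blog-cover": (1200, 675),
    "blog-square": (1200, 1200),
    "poster": (1440, 1800),
}
DEVICE_SCALE_FACTOR = 2


def viewport_for(preset=None, width=None, height=None):
    """Explicit width and height win; otherwise look the preset up."""
    if width is not None and height is not None:
        return (width, height)
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return PRESETS[preset]


def png_size(viewport, scale=DEVICE_SCALE_FACTOR):
    """Nominal PNG size in device pixels for a given CSS viewport."""
    css_width, css_height = viewport
    return (css_width * scale, css_height * scale)
```

For example, png_size(viewport_for("architecture")) gives (2720, 2160), and a custom 800 by 600 viewport gives (1600, 1200).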
5. Visual self-check
Always view the rendered PNG before reporting back. Common failures the eye catches but the script doesn't:
- Fonts didn't load (text falls back to system font and looks wrong)
- Icon SVGs misaligned or too small at print size
- Content overflows the .canvas and gets clipped
- Whitespace asymmetry, awkward grid breaks
- Colors that look fine on screen but lose contrast at 2x
If any issue: edit HTML, re-render, re-view. Two or three iterations is normal.
6. Deliver both files
Hand over both the PNG (high-res, drop straight into slides / blog / docs) and the HTML (editable source for the next round of changes). Mention this dual delivery in the wrap-up so the user knows the HTML is theirs to keep tweaking.
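One way to make the dual delivery hard to forget is a small helper that copies both files out of the scratch directory together and refuses to run if either is missing. This is a sketch under assumptions: the deliver function and the shared-stem naming (diagram.png next to diagram.html) are illustrative, not part of the skill's scripts.

```python
import shutil
from pathlib import Path


def deliver(work_dir, dest_dir, stem):
    """Copy <stem>.png and <stem>.html from the scratch dir to the destination."""
    work, dest = Path(work_dir), Path(dest_dir)
    sources = [work / f"{stem}.png", work / f"{stem}.html"]
    missing = [src.name for src in sources if not src.is_file()]
    if missing:
        # Refuse a partial hand-off: both files travel together.
        raise FileNotFoundError(f"not found in work dir: {', '.join(missing)}")
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(src, dest / src.name)) for src in sources]
```

The returned paths can be quoted directly in the wrap-up message to the user.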
Common gotchas
Fonts paint after networkidle. networkidle fires once the network is quiet, but text set in Google Fonts often paints a frame or two later. The script waits an extra 2000ms. If a render still shows fallback fonts, increase the wait rather than removing it.
full_page=True includes body padding. This is the wrong default for diagrams. Always crop to .canvas / .container / main (the script does this automatically). Only use --full-page for full-bleed designs.
1x screenshots look soft. Always use device_scale_factor=2 (baked into the script). The PNG ends up at 2× CSS dimensions — e.g., a 1360-wide viewport produces a 2720-wide PNG. That's what makes it look crisp when embedded in slides at 100% zoom or viewed on retina.
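The 2× arithmetic can be checked mechanically by reading the width and height from the PNG header, using only the standard library. This is a minimal sketch; the names png_dimensions and is_2x are illustrative. It assumes an uncropped render, since a cropped PNG is compared against the CSS size of the crop element rather than the viewport.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", data[16:24])


def is_2x(data, css_width, css_height):
    """True if the PNG is exactly twice the given CSS dimensions."""
    return png_dimensions(data) == (css_width * 2, css_height * 2)
```

Typical use would be is_2x(Path("diagram.png").read_bytes()[:24], 1360, 1080) with pathlib's Path. A False result on an uncropped render suggests the scale factor was not applied.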
SVG icons drift at small sizes. Use inline SVG with viewBox="0 0 N N" and set explicit width/height in CSS rather than through size attributes. Stroke widths can look thicker at 2x than they did during design, so verify in the PNG, not the browser.
Background color of the page matters. If the design is light, set body { background: #fff } (or the canvas's intended outer color). When cropped to .canvas, the body color won't show, but it prevents transparent edges if you ever switch to --full-page.
Style guidance lives in frontend-design
This skill intentionally has no opinions about color palettes, typography pairings, or aesthetic direction. Those decisions are case-by-case and the frontend-design skill covers them well. Resist the urge to copy the visual style of a previous diagram unless the user explicitly asks for it — diagrams that all look the same are a sign the skill has been over-prescribed.