html-get
Retrieve normalized HTML from URLs with fetch or headless prerender for JS pages, absolute URL rewriting, and metadata extraction pipelines.
| name | html-get |
| description | Retrieve normalized HTML from URLs with fetch or headless prerender for JS pages, absolute URL rewriting, and metadata extraction pipelines. |
html-get returns reliable HTML for a URL, choosing fetch or prerender depending on page needs.
Install:
npm install html-get browserless puppeteer
Minimal usage:
const createBrowserless = require('browserless')
const getHTML = require('html-get')
const browser = createBrowserless()
const context = browser.createContext()
const result = await getHTML('https://example.com', {
getBrowserless: () => context
})
console.log(result.html)
// createContext() returns a promise, so resolve it before destroying the context
const browserless = await context
await browserless.destroyContext()
await browser.close()
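Context cleanup is easy to skip when getHTML rejects. The following is a minimal sketch of a wrapper that always destroys the context, assuming createContext() resolves to an object exposing destroyContext(). The name withContext is hypothetical and is not part of html-get or browserless.

```javascript
// Hypothetical helper: run `task` with a fresh browser context and always
// destroy that context afterwards, even when `task` throws.
// Assumes `browser.createContext()` resolves to an object with `destroyContext()`.
async function withContext (browser, task) {
  const context = await browser.createContext()
  try {
    // `task` receives a getter shaped like the getBrowserless option
    return await task(() => Promise.resolve(context))
  } finally {
    await context.destroyContext()
  }
}
```

A call might look like `await withContext(browser, (getBrowserless) => getHTML(url, { getBrowserless }))`, which keeps the cleanup in one place.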
Recommended settings:
- prerender: 'auto' to let html-get choose between fetch and prerender.
- prerender: false for static pages when speed is the priority.
- rewriteUrls: true when downstream parsing needs absolute links.
- rewriteHtml: true when source pages have broken meta tags.
One-off usage:
npx -y html-get https://example.com
Debug output with mode, timing, and headers:
npx -y html-get https://example.com --debug
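The mode and timing printed by --debug also appear on the resolved result as stats: { mode, timing }. Here is a small sketch of logging them from code. formatResult is a hypothetical helper, and the timing value is printed as-is because its unit is not stated here.

```javascript
// Hypothetical helper: build a one-line summary from a getHTML result.
// Field names (url, statusCode, stats.mode, stats.timing) follow the result
// shape described in this document; no unit is assumed for timing.
function formatResult (result) {
  const { url, statusCode, stats = {} } = result
  return `${statusCode} ${url} mode=${stats.mode} timing=${stats.timing}`
}
```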
Options:
- getBrowserless (function): required unless prerender: false.
- prerender ('auto' | true | false): mode selector.
- rewriteUrls (boolean): rewrite relative HTML/CSS URLs to absolute.
- rewriteHtml (boolean): normalize common meta-tag mistakes.
- headers (object): request headers for fetch/prerender.
- gotOpts (object): extra options for got in fetch mode.
- puppeteerOpts (object): options passed to browserless evaluate flow.
- serializeHtml (function): custom output serializer from Cheerio instance.
- encoding (string): output encoding, default utf-8.
getHTML(url, opts) resolves to:
- html: serialized HTML (or custom serializer output fields).
- url: final URL.
- statusCode: HTTP status.
- headers: response headers.
- redirects: redirect chain.
- stats: { mode, timing }.
Force fast fetch mode for known static targets:
const result = await getHTML(url, {
prerender: false,
rewriteUrls: true
})
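The option guidance in this document can be folded into one small helper. This is only a sketch: buildOptions and its input flags (isStatic, needsAbsoluteLinks, hasBrokenMeta) are hypothetical names, and only the returned keys are html-get options.

```javascript
// Hypothetical helper that maps the guidance in this document onto an
// html-get options object. The input flag names are illustrative only.
function buildOptions ({
  isStatic = false,
  needsAbsoluteLinks = false,
  hasBrokenMeta = false,
  getBrowserless
} = {}) {
  const opts = {
    // static pages skip the browser; otherwise let html-get decide
    prerender: isStatic ? false : 'auto',
    rewriteUrls: needsAbsoluteLinks,
    rewriteHtml: hasBrokenMeta
  }
  // a browserless getter is only needed when prerender is not false
  if (opts.prerender !== false) opts.getBrowserless = getBrowserless
  return opts
}
```

For a known static target, `buildOptions({ isStatic: true, needsAbsoluteLinks: true })` yields the same options as the fetch-mode snippet above, plus rewriteHtml: false.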
Prepare HTML for metadata extraction:
const page = await getHTML(url, {
getBrowserless,
rewriteUrls: true,
rewriteHtml: true
})
const metadata = await metascraper({ url: page.url, html: page.html })
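The snippet above uses metascraper without showing where it comes from. The sketch below wraps the same two steps in a function that receives its dependencies as arguments. extractMetadata is a hypothetical name, and metascraper is assumed to be an already initialized instance built from a rules array.

```javascript
// Sketch of the fetch-then-extract pipeline with dependencies passed in.
// `getHTML` is html-get; `metascraper` is assumed to be an initialized
// instance, e.g. require('metascraper')([require('metascraper-title')()]).
async function extractMetadata (url, { getHTML, metascraper, getBrowserless }) {
  const page = await getHTML(url, {
    getBrowserless,
    rewriteUrls: true, // absolute links for downstream parsing
    rewriteHtml: true // normalize common meta-tag mistakes
  })
  // pass the final URL from the result rather than the requested one
  const metadata = await metascraper({ url: page.url, html: page.html })
  return { ...metadata, statusCode: page.statusCode }
}
```

Passing the dependencies in keeps the function easy to exercise with stubs, without launching a browser.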
Custom serializer (avoid returning full HTML):
const result = await getHTML(url, {
getBrowserless,
serializeHtml: ($) => ({
html: $.html(),
title: $('title').first().text()
})
})
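A serializer can also return only the fields a caller needs. The sketch below drops the html field entirely. Whether html-get accepts a serializer result without html is an assumption here, and serializeSummary is a hypothetical name.

```javascript
// Serializer sketch: keep a few fields instead of the full document.
// `$` is the Cheerio instance passed to serializeHtml.
// Assumption: html-get does not require an `html` key in the returned object.
const serializeSummary = ($) => ({
  title: $('title').first().text().trim(),
  description: $('meta[name="description"]').attr('content') || null
})
```

It would be passed as `serializeHtml: serializeSummary` in the options object.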
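Notes:
- If getBrowserless is missing and prerender is not false, html-get throws.
- PDF URLs are converted to HTML with mutool when available.
- Media URLs are wrapped in HTML elements (img, video, audio) for consistent downstream parsing.
- When using html-get, always clean up browser contexts.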
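The getBrowserless requirement can be checked before any network work starts. The sketch below is a local pre-flight check that mirrors the rule as stated in this document. It is not html-get's own implementation, and assertPrerenderOptions is a hypothetical name.

```javascript
// Hypothetical pre-flight check: a browserless getter is needed unless
// prerender is explicitly false. Returns the options unchanged when valid.
function assertPrerenderOptions (opts = {}) {
  if (opts.prerender !== false && typeof opts.getBrowserless !== 'function') {
    throw new TypeError('getBrowserless must be a function unless prerender is false')
  }
  return opts
}
```

Calling it as `getHTML(url, assertPrerenderOptions(opts))` surfaces a misconfiguration with a local, descriptive error.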