Name: Html Get
Author: microlinkhq

Retrieve normalized HTML from URLs with fetch or headless prerender for JS pages, absolute URL rewriting, and metadata extraction pipelines.

stars:1

forks:0

updated:May 26, 2026 at 13:26

SKILL.md

name	html-get
description	Retrieve normalized HTML from URLs with fetch or headless prerender for JS pages, absolute URL rewriting, and metadata extraction pipelines.

html-get

html-get returns reliable HTML for a URL, choosing fetch or prerender depending on page needs.

Quick Start

Install:

npm install html-get browserless puppeteer

Minimal usage:

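const createBrowserless = require('browserless')
const getHTML = require('html-get')

async function main () {
  // Spawn one browser process, then create an isolated context for this request.
  const browser = createBrowserless()
  const context = browser.createContext()

  try {
    const result = await getHTML('https://example.com', {
      // html-get calls this to obtain the browserless context when it prerenders.
      getBrowserless: () => context
    })
    console.log(result.html)
  } finally {
    // Await the context before destroying it, then stop the browser process.
    const browserless = await context
    await browserless.destroyContext()
    await browser.close()
  }
}

main()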

Recommended Workflow

Start with the default, prerender: 'auto'.
Set prerender: false for static pages when speed is the priority.
Enable rewriteUrls: true when downstream parsing needs absolute links.
Enable rewriteHtml: true when source pages have broken meta tags.
Reuse one browser process and create/destroy contexts per request.
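
The workflow above can be collapsed into a small options builder. This is only a sketch: `buildOptions` and its input flags (`isStatic`, `needsAbsoluteLinks`, `hasBrokenMeta`) are hypothetical names introduced for illustration; only the returned keys (`prerender`, `rewriteUrls`, `rewriteHtml`, `getBrowserless`) come from the options described in this document.

```javascript
// Hypothetical helper: maps what you know about a target page to html-get options.
function buildOptions ({
  isStatic = false,
  needsAbsoluteLinks = false,
  hasBrokenMeta = false,
  getBrowserless
} = {}) {
  const opts = {
    // 'auto' is the documented default; false skips the browser for static pages.
    prerender: isStatic ? false : 'auto',
    rewriteUrls: needsAbsoluteLinks,
    rewriteHtml: hasBrokenMeta
  }
  // getBrowserless is only attached when prerendering may happen.
  if (opts.prerender !== false) opts.getBrowserless = getBrowserless
  return opts
}
```

A call such as `getHTML(url, buildOptions({ isStatic: true, needsAbsoluteLinks: true }))` would then request fetch mode with URL rewriting enabled.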

CLI

One-off usage:

npx -y html-get https://example.com

Debug output with mode, timing, and headers:

npx -y html-get https://example.com --debug

Core Options

getBrowserless (function): required unless prerender: false.
prerender ('auto' | true | false): mode selector.
rewriteUrls (boolean): rewrite relative HTML/CSS URLs to absolute.
rewriteHtml (boolean): normalize common meta-tag mistakes.
headers (object): request headers for fetch/prerender.
gotOpts (object): extra options for got in fetch mode.
puppeteerOpts (object): options passed to browserless evaluate flow.
serializeHtml (function): custom output serializer from Cheerio instance.
encoding (string): output encoding, default utf-8.
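
The rule that getBrowserless is required unless prerender is false can also be checked before calling the library, which gives an earlier and more specific error in batch code. The sketch below is an assumption, not part of html-get: `checkOptions` is a hypothetical helper, and it only encodes the option rules listed above.

```javascript
// Hypothetical pre-flight check mirroring the documented rules:
// prerender is 'auto' | true | false, and getBrowserless is required
// unless prerender is exactly false.
function checkOptions (opts = {}) {
  const prerender = opts.prerender === undefined ? 'auto' : opts.prerender
  if (!['auto', true, false].includes(prerender)) {
    throw new TypeError("prerender must be 'auto', true, or false")
  }
  if (prerender !== false && typeof opts.getBrowserless !== 'function') {
    throw new TypeError('getBrowserless is required unless prerender is false')
  }
  // Return a copy with the default made explicit.
  return { ...opts, prerender }
}
```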

Output Shape

getHTML(url, opts) resolves to:

html: serialized HTML (or custom serializer output fields).
url: final URL.
statusCode: HTTP status.
headers: response headers.
redirects: redirect chain.
stats: { mode, timing }.
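
As a rough illustration of consuming these fields, the sketch below condenses a result into one log line. `describeResult` is a hypothetical helper, the sample object is hand-written rather than real output, and the code assumes that `redirects` is an array and that `stats.timing` is a number of milliseconds, neither of which is stated above.

```javascript
// Hypothetical helper: condenses an html-get style result into one log line.
// Assumes redirects is an array and stats.timing is a number of milliseconds.
function describeResult (result) {
  const { url, statusCode, redirects = [], stats = {} } = result
  const hops = redirects.length
  const plural = hops === 1 ? '' : 's'
  return `${statusCode} ${url} via ${stats.mode} in ${Math.round(stats.timing)}ms (${hops} redirect${plural})`
}

// Hand-written object shaped like the fields listed above.
const sample = {
  html: '<html></html>',
  url: 'https://example.com/',
  statusCode: 200,
  headers: {},
  redirects: [],
  stats: { mode: 'fetch', timing: 120.4 }
}

console.log(describeResult(sample))
// → 200 https://example.com/ via fetch in 120ms (0 redirects)
```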

Common Patterns

Force fast fetch mode for known static targets:

const result = await getHTML(url, {
  prerender: false,
  rewriteUrls: true
})
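
To make the effect of rewriteUrls concrete, the sketch below shows what resolving a relative reference against a page URL means. It uses the standard WHATWG `URL` class built into Node.js and is not html-get's internal implementation; `toAbsolute` is a hypothetical name.

```javascript
// Resolve a relative reference against the page URL, the way a browser would.
function toAbsolute (reference, pageUrl) {
  return new URL(reference, pageUrl).href
}

console.log(toAbsolute('/img/logo.png', 'https://example.com/blog/post'))
// → https://example.com/img/logo.png
console.log(toAbsolute('style.css', 'https://example.com/blog/post'))
// → https://example.com/blog/style.css
```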

Prepare HTML for metadata extraction:

const page = await getHTML(url, {
  getBrowserless,
  rewriteUrls: true,
  rewriteHtml: true
})

const metadata = await metascraper({ url: page.url, html: page.html })

Custom serializer (choose which fields are returned; this one adds title alongside the full HTML):

const result = await getHTML(url, {
  getBrowserless,
  serializeHtml: ($) => ({
    html: $.html(),
    title: $('title').first().text()
  })
})
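
If the full document is not needed downstream, a serializer can return only the fields of interest. The sketch below assumes, as the Output Shape section suggests, that the serializer's return value supplies the output fields; `titleOnly` is a hypothetical name, and `$` is expected to behave like a Cheerio instance.

```javascript
// Hypothetical serializer: returns only the fields the caller needs,
// leaving the full HTML out of the result.
const titleOnly = ($) => ({
  title: $('title').first().text().trim()
})

// Would be passed as: getHTML(url, { getBrowserless, serializeHtml: titleOnly })
```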

Reliability Notes

If getBrowserless is missing and prerender is not false, html-get throws.
PDF URLs are fetched and can be converted via mutool when available.
Media URLs are normalized to HTML wrappers (img, video, audio) for consistent downstream parsing.
For large batch jobs, control concurrency outside html-get and always clean up browser contexts.
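
One way to control concurrency outside html-get is a small worker pool. The sketch below is a generic pattern rather than anything shipped with the library: `mapWithLimit` and `fetchOne` are hypothetical names, and `fetchOne` stands for any async function that handles a single URL (for example, one that creates a context, calls getHTML, and destroys the context in a finally block).

```javascript
// Hypothetical batch runner: at most `limit` calls to fetchOne are in flight at once.
async function mapWithLimit (urls, limit, fetchOne) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array(urls.length)
  let next = 0

  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker () {
    while (next < urls.length) {
      const index = next++
      try {
        results[index] = { ok: true, value: await fetchOne(urls[index]) }
      } catch (error) {
        // Record the failure instead of aborting the whole batch.
        results[index] = { ok: false, error }
      }
    }
  }

  const workers = Array.from({ length: Math.min(limit, urls.length) }, () => worker())
  await Promise.all(workers)
  return results
}
```

Results keep the order of the input URLs, and a failing URL is reported in its slot without stopping the rest of the batch.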

Related skills (same repository, "microlinkhq/skills")

browserless.md
Automate browserless/Puppeteer headless Chrome for screenshots, PDFs, HTML/text extraction, status checks, Lighthouse audits, and browser pipelines.
2026-05-26

create-local-skill.md
Create project-local skills for Cursor and Claude Code when users ask to create, add, or update reusable repo instructions.
2026-05-26

k8s-hpa-cost-tuning.md
Tune Kubernetes HPA, topology spread, requests, and scale-down behavior for cluster cost audits, incidents, replica/node issues, and over-reservation.
2026-05-26

keyvhq.md
Build @keyvhq/core key-value caches with TTL, namespaces, memoization, cache-aside patterns, and Redis/Mongo/MySQL/PostgreSQL/SQLite adapters.
2026-05-26

metascraper.md
Extract metadata from HTML with metascraper rules for link previews, Open Graph, Twitter Cards, JSON-LD, titles, images, authors, and custom parsers.
2026-05-26

microlink-api.md
Use Microlink API/MQL to extract URL metadata, build link previews, capture screenshots/PDFs, scrape CSS-selected data, and avoid browser infrastructure.
2026-05-26

package.json

"author": "microlinkhq"

"repository": "microlinkhq/skills"
