with one click
metascraper
// Extract metadata from HTML with metascraper rules for link previews, Open Graph, Twitter Cards, JSON-LD, titles, images, authors, and custom parsers.
// Extract metadata from HTML with metascraper rules for link previews, Open Graph, Twitter Cards, JSON-LD, titles, images, authors, and custom parsers.
Automate browserless/Puppeteer headless Chrome for screenshots, PDFs, HTML/text extraction, status checks, Lighthouse audits, and browser pipelines.
Create project-local skills for Cursor and Claude Code when users ask to create, add, or update reusable repo instructions.
Retrieve normalized HTML from URLs with fetch or headless prerender for JS pages, absolute URL rewriting, and metadata extraction pipelines.
Tune Kubernetes HPA, topology spread, requests, and scale-down behavior for cluster cost audits, incidents, replica/node issues, and over-reservation.
Build @keyvhq/core key-value caches with TTL, namespaces, memoization, cache-aside patterns, and Redis/Mongo/MySQL/PostgreSQL/SQLite adapters.
Use Microlink API/MQL to extract URL metadata, build link previews, capture screenshots/PDFs, scrape CSS-selected data, and avoid browser infrastructure.
| name | metascraper |
| description | Extract metadata from HTML with metascraper rules for link previews, Open Graph, Twitter Cards, JSON-LD, titles, images, authors, and custom parsers. |
metascraper extracts normalized metadata from the input HTML you provide, using ordered rule bundles.
Extraction quality depends on the accuracy of that input HTML.
Install only what you need:
npm install metascraper \
metascraper-author \
metascraper-date \
metascraper-description \
metascraper-image \
metascraper-logo \
metascraper-publisher \
metascraper-title \
metascraper-url
Minimal extraction:
const metascraper = require('metascraper')([
require('metascraper-author')(),
require('metascraper-date')(),
require('metascraper-description')(),
require('metascraper-image')(),
require('metascraper-logo')(),
require('metascraper-publisher')(),
require('metascraper-title')(),
require('metascraper-url')()
])
const metadata = await metascraper({
url: 'https://example.com/article',
html: '<html>...</html>'
})
metascraper({ url, html }).pickPropNames when speed matters.metascraper needs both url and html.
html-get).For accurate rendered markup, use the local html-get skill before running metascraper.
Create an instance:
const metascraper = require('metascraper')([/* rule bundles */])
Run extraction:
const metadata = await metascraper({
url,
html,
htmlDom, // optional pre-parsed DOM
rules, // optional extra rules at runtime
pickPropNames, // optional Set of fields to compute
omitPropNames, // optional Set of fields to skip
validateUrl // default true
})
Compute only selected fields:
const metadata = await metascraper({
url,
html,
pickPropNames: new Set(['title', 'description', 'image'])
})
Skip expensive or unwanted fields:
const metadata = await metascraper({
url,
html,
omitPropNames: new Set(['logo'])
})
When both are present, pickPropNames takes precedence.
Example with bundle options:
const metascraper = require('metascraper')([
require('metascraper-logo')({
filter: imageUrl => imageUrl.endsWith('.png')
})
])
Core article fields:
metascraper-authormetascraper-datemetascraper-descriptionmetascraper-imagemetascraper-logometascraper-publishermetascraper-titlemetascraper-urlMedia and enrichment:
metascraper-audiometascraper-videometascraper-langmetascraper-iframemetascraper-readabilityProvider-specific examples:
metascraper-youtubemetascraper-spotifymetascraper-soundcloudmetascraper-xmetascraper-instagrammetascraper-tiktokurl is absolute and matches the fetched page.pickPropNames to compute only needed fields.validateUrl: false only when input is intentionally non-standard.Common fields include:
titledescriptionauthorpublisherdateimagelogourllangaudiovideoThe returned shape depends on loaded rule bundles.