
mapping-urls

Updated June 25, 2026 at 12:51

Use when the user wants the list of URLs on a site rather than the page content — sitemap analysis, link planning, or seeding another tool. Covers `crawlberg map <url>` with `--limit`, `--search`, robots, output, and how it differs from a full crawl.


Source: xberg-io/plugins



Mapping URLs

crawlberg map <url> discovers the URLs a site exposes without rendering or extracting any page content. It reads sitemap.xml (including nested sitemaps) and, when the sitemap is missing or thin, falls back to extracting links from the seed page. Use it to plan a crawl, audit a site's surface, or feed a URL list into another tool.

Quick recipe

crawlberg map https://example.com --limit 500 --search docs --format markdown

Markdown output prints one URL per line — convenient to pipe into a file or a follow-up crawl. JSON output (default) returns a structured MapResult.

Flag surface

| Flag | Default | Purpose |
| --- | --- | --- |
| `--limit` | (none) | Maximum number of URLs to return. Unbounded if unset. |
| `--search` | (none) | Case-insensitive substring filter on discovered URLs. |
| `--respect-robots-txt` | off | Honour `robots.txt`. Pass it for any third-party host. |
| `--format` | `json` | `json` (full `MapResult`) or `markdown` (one URL per line). |
| `--timeout` | `30000` | Per-request timeout in ms. |
| `--browser-mode` | `auto` | `auto`, `always`, or `never`; see the headless-fallback skill. |
| `--browser-endpoint` | (none) | External CDP `ws://` URL. |
| `--config` | (none) | Inline JSON or `@file.json` for the full `CrawlConfig`. |

map takes a single seed URL positionally. There is no --depth or --max-pages here — those bound a crawl, not a map. Scope is the seed host's sitemaps plus links found on the seed page; bound the result with --limit and narrow it with --search.

How discovery works

1. Fetch and parse sitemap.xml, following nested <sitemapindex> entries.
2. If there is no sitemap, or only a thin one, extract links from the seed page's HTML.
3. Apply the --search substring filter (case-insensitive), then --limit.
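
The order in the last step matters: because the filter runs before the limit, --limit caps the number of matches rather than the raw discovery list. The sketch below mimics that ordering on a plain URL list with grep and head. It does not call crawlberg, and the function name filter_then_limit and the sample URLs are illustrative assumptions rather than part of the CLI.

```shell
#!/bin/sh
# Sketch only: imitates "filter first, then limit" on a list of URLs.
# filter_then_limit is a hypothetical helper, not a crawlberg command.

# Usage: filter_then_limit SEARCH LIMIT   (URLs on stdin, one per line)
filter_then_limit() {
  search=$1
  limit=$2
  # -i: case-insensitive; -F: treat SEARCH as a fixed substring, not a regex
  grep -iF -- "$search" | head -n "$limit"
}

printf '%s\n' \
  'https://example.com/' \
  'https://example.com/Docs/' \
  'https://example.com/docs/getting-started' \
  'https://example.com/blog/post-one' \
  'https://example.com/docs/reference' |
  filter_then_limit docs 2
```

Had the limit been applied first, the two-URL cut would have kept only one docs URL; filtering first keeps two.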

No page bodies are rendered, so a map of hundreds of URLs returns in seconds — far cheaper than crawling. In --browser-mode auto the seed fetch still falls back to headless Chrome if the seed page is a JS shell that hides its links; pass --browser-mode never to keep it static-only.

Output

Markdown mode

https://example.com/
https://example.com/docs/
https://example.com/docs/getting-started
https://example.com/blog/post-one
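
Because the Markdown output is one URL per line, standard text tools can summarise it. As a rough sketch of a surface audit, the helper below counts URLs by their first path segment. The name count_by_section is a hypothetical helper, not a crawlberg feature, and it assumes absolute http(s) URLs without query strings in the first segment.

```shell
#!/bin/sh
# Sketch only: count URLs per top-level section of a site.
# count_by_section is a hypothetical helper, not part of crawlberg.

# Reads URLs on stdin, prints "<count> <section>", highest count first.
count_by_section() {
  # Splitting on "/" gives: $1="https:", $2="", $3=host, $4=first path segment
  awk -F/ '{ seg = ($4 == "" ? "(root)" : $4); n[seg]++ }
           END { for (s in n) print n[s], s }' |
    LC_ALL=C sort -k1,1nr -k2,2   # C locale keeps tie ordering stable
}

count_by_section <<'EOF'
https://example.com/
https://example.com/docs/
https://example.com/docs/getting-started
https://example.com/blog/post-one
EOF
```

In practice the input would come from a file written by a map run in Markdown format, for example `count_by_section < urls.txt`.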

JSON mode

The top-level MapResult contains a urls array; each entry carries the discovered url. When scripting, read result.urls[i].url to get each URL string.

crawlberg map https://example.com --format json | jq -r '.urls[].url'

Common patterns

Discover then crawl a subsection

crawlberg map https://example.com --search /docs/ --format markdown > urls.txt

Feed the filtered list into a bounded crawl, or scrape individual entries.
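
One way to scrape individual entries is a small loop over the saved list. This is a minimal sketch: it assumes urls.txt was written by the map command above, and it passes nothing to `crawlberg scrape` except the URL, since any further flags would be guesses here. The function name scrape_list is hypothetical.

```shell
#!/bin/sh
# Sketch only: run `crawlberg scrape` once per URL in a file.
# scrape_list is a hypothetical helper name.

# Usage: scrape_list FILE   (FILE holds one URL per line)
scrape_list() {
  list=$1
  while IFS= read -r url; do
    # Skip blank lines so an empty line never triggers an empty call.
    [ -n "$url" ] || continue
    # </dev/null stops the command from consuming the rest of the list on stdin.
    crawlberg scrape "$url" </dev/null || printf 'failed: %s\n' "$url" >&2
  done < "$list"
}

# Example, once urls.txt exists:
#   scrape_list urls.txt
```

Failures are reported on stderr and the loop continues, so one bad URL does not abort the rest of the list.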

Audit a third-party site politely

crawlberg map https://unknown.example --respect-robots-txt --limit 200

When to reach for crawl instead

If the user needs the page content (Markdown, metadata, tables) rather than just the URL list, use crawlberg crawl — see the crawling-a-site skill. Reach for map first when the goal is enumeration, planning, or seeding.
