xcrawl-crawl
Use this skill for XCrawl crawl tasks, including bulk site crawling, crawler rule design, async status polling, and delivery of crawl output for downstream scrape and search workflows.
| Field | Value |
|---|---|
| name | xcrawl-crawl |
| description | Use this skill for XCrawl crawl tasks, including bulk site crawling, crawler rule design, async status polling, and delivery of crawl output for downstream scrape and search workflows. |
| allowed-tools | Bash(curl:*) Bash(node:*) Read Write Edit Grep |
| metadata | {"version":"1.0.2","openclaw":{"skillKey":"xcrawl-crawl","homepage":"https://www.xcrawl.com/","requires":{"localFiles":["~/.xcrawl/config.json"],"anyBins":["curl","node"]},"apiKeySource":"local_config"}} |
This skill orchestrates full-site or scoped crawling with XCrawl Crawl APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.
Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.
Path: ~/.xcrawl/config.json

    {
      "XCRAWL_API_KEY": "<your_api_key>"
    }

Read API key from local config file only. Do not require global environment variables.
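
The key lookup described above can be sketched in Node as a small function that fails loudly when the file or the key is missing. This is a minimal sketch, not part of any XCrawl SDK: `loadApiKey` is a hypothetical helper name, and the error messages are illustrative.

```javascript
// Sketch: load XCRAWL_API_KEY from the local config file only (no environment variables).
const fs = require("fs");
const os = require("os");
const path = require("path");

// loadApiKey is a hypothetical helper name, not an XCrawl API.
function loadApiKey(configPath = path.join(os.homedir(), ".xcrawl", "config.json")) {
  let raw;
  try {
    raw = fs.readFileSync(configPath, "utf8");
  } catch (err) {
    throw new Error(`Cannot read ${configPath}: ${err.message}`);
  }
  // JSON.parse throws a SyntaxError if the file is not valid JSON.
  const key = JSON.parse(raw).XCRAWL_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(`XCRAWL_API_KEY is missing or empty in ${configPath}`);
  }
  return key.trim();
}
```

Calling `loadApiKey()` with no argument reads `~/.xcrawl/config.json`; passing a path is only meant to make the helper easy to try against a scratch file.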
Using XCrawl APIs consumes credits.
If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/.
After registration, they can activate the free 1,000-credit plan before running requests.
Request runtime permissions for curl and node only.
Do not request Python, shell helper scripts, or other runtime permissions.
API surface:

- Create: POST /v1/crawl
- Result: GET /v1/crawl/{crawl_id}
- Base URL: https://run.xcrawl.com
- Header: Authorization: Bearer <XCRAWL_API_KEY>

curl example (create a crawl, then read its result):

    # Read the API key from the local config file.
    API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"
    # Create the crawl task and print the raw response body.
    CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/crawl" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${API_KEY}" \
      -d '{"url":"https://example.com","crawler":{"limit":100,"max_depth":2},"output":{"formats":["markdown","links"]}}')"
    echo "$CREATE_RESP"
    # Extract crawl_id from the create response (empty string if absent).
    CRAWL_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.crawl_id||"")' "$CREATE_RESP")"
    # Read the current task status and result.
    curl -sS -X GET "https://run.xcrawl.com/v1/crawl/${CRAWL_ID}" \
      -H "Authorization: Bearer ${API_KEY}"

Node example (scoped crawl with include and exclude rules):

    node -e '
    const fs=require("fs");
    // Read the API key from the local config file.
    const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
    // Up to 300 pages, depth 3, only /docs/ paths, skipping /blog/, with a ja-JP locale.
    const body={url:"https://example.com",crawler:{limit:300,max_depth:3,include:["/docs/.*"],exclude:["/blog/.*"]},request:{locale:"ja-JP"},output:{formats:["markdown","links","json"]}};
    fetch("https://run.xcrawl.com/v1/crawl",{
      method:"POST",
      headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
      body:JSON.stringify(body)
    }).then(async r=>{console.log(await r.text());})
      .catch(e=>{console.error(e);process.exit(1);});
    '

Request endpoint and headers:

- POST https://run.xcrawl.com/v1/crawl
- Content-Type: application/json
- Authorization: Bearer <api_key>

Request body, top-level fields:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Site entry URL |
| crawler | object | No | - | Crawler config |
| proxy | object | No | - | Proxy config |
| request | object | No | - | Request config |
| js_render | object | No | - | JS rendering config |
| output | object | No | - | Output config |
| webhook | object | No | - | Async callback config |

crawler:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| limit | integer | No | 100 | Max pages |
| include | string[] | No | - | Include only matching URLs (regex supported) |
| exclude | string[] | No | - | Exclude matching URLs (regex supported) |
| max_depth | integer | No | 3 | Max depth from entry URL |
| include_entire_domain | boolean | No | false | Crawl full site instead of only subpaths |
| include_subdomains | boolean | No | false | Include subdomains |
| include_external_links | boolean | No | false | Include external links |
| sitemaps | boolean | No | true | Use site sitemap |

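
When designing include and exclude rules, it can help to dry-run them locally against a list of known URLs before spending credits. The sketch below is only a local approximation: it assumes patterns are tested unanchored against the URL path, which may differ from how the service actually matches, and `previewCrawlScope` is a hypothetical helper name.

```javascript
// Sketch: preview which URLs a crawler.include / crawler.exclude pair would keep.
// ASSUMPTION: patterns are tested unanchored against the URL path; the service may match differently.
function previewCrawlScope(urls, { include = [], exclude = [] } = {}) {
  const includeRules = include.map((pattern) => new RegExp(pattern));
  const excludeRules = exclude.map((pattern) => new RegExp(pattern));
  return urls.filter((url) => {
    const pathname = new URL(url).pathname;
    // An empty include list is treated as "include everything".
    const included = includeRules.length === 0 || includeRules.some((re) => re.test(pathname));
    const excluded = excludeRules.some((re) => re.test(pathname));
    return included && !excluded;
  });
}

const kept = previewCrawlScope(
  [
    "https://example.com/docs/intro",
    "https://example.com/blog/news",
    "https://example.com/docs/api/v1",
  ],
  { include: ["/docs/.*"], exclude: ["/blog/.*"] }
);
console.log(kept); // the two /docs/ URLs
```

The URL list could come from any source, for example a prior map call; the helper itself never contacts the network.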
proxy:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| location | string | No | US | ISO-3166-1 alpha-2 country code, e.g. US / JP / SG |
| sticky_session | string | No | Auto-generated | Sticky session ID; requests with the same ID attempt to reuse the same proxy exit |

request:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| locale | string | No | en-US,en;q=0.9 | Affects Accept-Language |
| device | string | No | desktop | desktop / mobile; affects UA and viewport |
| cookies | object map | No | - | Cookie key/value pairs |
| headers | object map | No | - | Header key/value pairs |
| only_main_content | boolean | No | true | Return main content only |
| block_ads | boolean | No | true | Attempt to block ad resources |
| skip_tls_verification | boolean | No | true | Skip TLS verification |

js_render:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | No | true | Enable browser rendering |
| wait_until | string | No | load | load / domcontentloaded / networkidle |
| viewport.width | integer | No | - | Viewport width (desktop 1920, mobile 402) |
| viewport.height | integer | No | - | Viewport height (desktop 1080, mobile 874) |

output:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| formats | string[] | No | ["markdown"] | Output formats |
| screenshot | string | No | viewport | full_page / viewport (only if formats includes screenshot) |
| json.prompt | string | No | - | Extraction prompt |
| json.json_schema | object | No | - | JSON Schema |

output.formats enum:

- html
- raw_html
- markdown
- links
- summary
- screenshot
- json

webhook:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | No | - | Callback URL |
| headers | object map | No | - | Custom callback headers |
| events | string[] | No | ["started","completed","failed"] | Events: started / completed / failed |

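
The request tables above can be turned into a cheap client-side check that runs before a request consumes credits. The following is a minimal sketch covering only the enumerated values and the required `url`; `validateCrawlRequest` and `checkEnum` are hypothetical helper names, and the service remains the authority on what it accepts.

```javascript
// Sketch: client-side checks derived from the request tables above.
// validateCrawlRequest is a hypothetical helper, not part of any XCrawl SDK.
const OUTPUT_FORMATS = ["html", "raw_html", "markdown", "links", "summary", "screenshot", "json"];
const WAIT_UNTIL = ["load", "domcontentloaded", "networkidle"];
const DEVICES = ["desktop", "mobile"];
const SCREENSHOT_MODES = ["full_page", "viewport"];
const WEBHOOK_EVENTS = ["started", "completed", "failed"];

// Record an error when a value is present but not in the allowed list.
function checkEnum(errors, name, value, allowed) {
  if (value !== undefined && !allowed.includes(value)) {
    errors.push(`${name} must be one of: ${allowed.join(", ")}`);
  }
}

function validateCrawlRequest(body) {
  const errors = [];
  const { crawler = {}, request = {}, js_render = {}, output = {}, webhook = {} } = body;
  if (typeof body.url !== "string" || body.url === "") errors.push("url is required");
  for (const field of ["limit", "max_depth"]) {
    if (crawler[field] !== undefined && !Number.isInteger(crawler[field])) {
      errors.push(`crawler.${field} must be an integer`);
    }
  }
  checkEnum(errors, "request.device", request.device, DEVICES);
  checkEnum(errors, "js_render.wait_until", js_render.wait_until, WAIT_UNTIL);
  checkEnum(errors, "output.screenshot", output.screenshot, SCREENSHOT_MODES);
  for (const format of output.formats || []) checkEnum(errors, "output.formats[]", format, OUTPUT_FORMATS);
  for (const event of webhook.events || []) checkEnum(errors, "webhook.events[]", event, WEBHOOK_EVENTS);
  return errors;
}

// "pdf" is not in the output.formats enum, so one error is reported.
console.log(validateCrawlRequest({ url: "https://example.com", output: { formats: ["markdown", "pdf"] } }));
```

An empty array means none of these checks failed; it does not guarantee that the API will accept the request.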
Create response (POST /v1/crawl):

| Field | Type | Description |
|---|---|---|
| crawl_id | string | Task ID |
| endpoint | string | Always crawl |
| version | string | Version |
| status | string | Always pending |

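
Because every later call depends on `crawl_id`, it may be worth checking the create response before polling. The sketch below assumes the create response is a JSON object with the fields listed above; `parseCreateResponse` is a hypothetical helper name, and the sample values in the usage line are made up for illustration.

```javascript
// Sketch: pull crawl_id and status out of a raw create response body.
// parseCreateResponse is a hypothetical helper, not an XCrawl API.
function parseCreateResponse(rawBody) {
  let parsed;
  try {
    parsed = JSON.parse(rawBody);
  } catch (err) {
    throw new Error(`Create response is not valid JSON: ${rawBody}`);
  }
  if (parsed === null || typeof parsed !== "object") {
    throw new Error(`Create response is not a JSON object: ${rawBody}`);
  }
  if (typeof parsed.crawl_id !== "string" || parsed.crawl_id === "") {
    // Keep the raw body in the message so the upstream error stays visible.
    throw new Error(`Create response has no crawl_id: ${rawBody}`);
  }
  return { crawlId: parsed.crawl_id, status: parsed.status };
}

// Illustrative body only; real IDs and versions come from the API.
const created = parseCreateResponse('{"crawl_id":"example-id","endpoint":"crawl","status":"pending"}');
console.log(created.crawlId, created.status);
```

The raw body should still be returned to the user unchanged; this helper only extracts what the polling step needs.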
Result response (GET /v1/crawl/{crawl_id}):

| Field | Type | Description |
|---|---|---|
| crawl_id | string | Task ID |
| endpoint | string | Always crawl |
| version | string | Version |
| status | string | pending / crawling / completed / failed |
| url | string | Entry URL |
| data | object[] | Per-page result array |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |
| total_credits_used | integer | Total credits used |

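
The status values above can drive a simple polling loop that stops on `completed` or `failed` and returns the raw body unchanged. This is a minimal sketch under stated assumptions: the 5-second interval and 60-attempt cap are arbitrary illustrative defaults, `pollCrawl` is a hypothetical helper name, the global `fetch` requires Node 18 or later, and the `fetchImpl` option exists only so the loop can be tried without network access.

```javascript
// Sketch: poll GET /v1/crawl/{crawl_id} until the task reaches a terminal status.
const TERMINAL_STATUSES = new Set(["completed", "failed"]);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// intervalMs and maxAttempts are illustrative defaults, not documented XCrawl limits.
async function pollCrawl(crawlId, apiKey, { intervalMs = 5000, maxAttempts = 60, fetchImpl = fetch } = {}) {
  const url = `https://run.xcrawl.com/v1/crawl/${encodeURIComponent(crawlId)}`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const rawBody = await res.text();
    const { status } = JSON.parse(rawBody);
    // Raw passthrough: hand back the upstream body as-is once the task is done.
    if (TERMINAL_STATUSES.has(status)) return rawBody;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Crawl ${crawlId} did not reach a terminal status after ${maxAttempts} polls`);
}
```

A call such as `pollCrawl(crawlId, apiKey)` resolves with the final response text; a `failed` status is returned rather than thrown so the caller can pass the upstream body through.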
data[] fields follow output.formats:

- html, raw_html, markdown, links, summary, screenshot, json
- metadata (page metadata)
- traffic_bytes
- credits_used
- credits_detail

Workflow:

- Record crawl_id, initial status, and request payload.
- Poll GET /v1/crawl/{crawl_id} until the task reaches a terminal state.
- Track the status: pending, crawling, completed, or failed.

Return:

- Endpoint flow (POST /v1/crawl + GET /v1/crawl/{crawl_id})
- request_payload used for the create request

Do not generate summaries unless the user explicitly requests a summary.