with one click
nimble-extract-reference
Reference for nimble extract command. Load when fetching URLs or scraping pages. Contains: render tiers 1-3, all flags, browser actions, network capture, parser schemas, geo targeting, async, parallelization.
Menu
Reference for nimble extract command. Load when fetching URLs or scraping pages. Contains: render tiers 1-3, all flags, browser actions, network capture, parser schemas, geo targeting, async, parallelization.
Based on SOC occupation classification
| name | nimble-extract-reference |
| description | Reference for nimble extract command. Load when fetching URLs or scraping pages. Contains: render tiers 1-3, all flags, browser actions, network capture, parser schemas, geo targeting, async, parallelization. |
Fetches a URL and returns its content. The workhorse command — use for any URL where no agent exists.
| Parameter | CLI flag | Type | Default | Description |
|---|---|---|---|---|
url | --url | string | — | Target URL (required) |
render | --render | bool | false | Enable headless browser (JS rendering) |
driver | --driver | string | vx6 | Engine: vx6 · vx8 · vx8-pro · vx10 · vx10-pro — see Drivers table |
formats | --format | array | ["html"] | Output format(s): "html", "markdown". CLI: string (--format markdown). SDK: array (formats=["markdown"]). |
parse | --parse | bool | false | Enable parser (use with parser) |
parser | --parser | JSON | — | Extraction schema — see parsing-schema.md |
browser_actions | --browser-action | JSON | — | Browser actions sequence — see browser-actions.md |
network_capture | --network-capture | JSON | — | XHR intercept rules — see network-capture.md |
is_xhr | --is-xhr | bool | false | Direct API call — no browser, no render |
country | --country | string | — | ISO Alpha-2 geo proxy (e.g. US, GB) |
state | --state | string | — | State-level geo targeting |
city | --city | string | — | City-level geo targeting |
locale | --locale | string | — | LCID locale (e.g. en-US, fr-FR) — pair with country |
method | --method | string | GET | HTTP method: GET, POST, PUT, PATCH, DELETE |
tag | --tag | string | — | Request tag for analytics |
| Driver | Description | Render | Best for |
|---|---|---|---|
vx6 | Fast HTTP (no JS) | No | Static HTML, APIs, high volume |
vx8 | Headless browser | Yes | Dynamic sites, SPAs |
vx8-pro | Headful browser | Yes | Complex interactions |
vx10 | Stealth headless | Yes | Bot-protected sites |
vx10-pro | Stealth headful | Yes | Most protected sites |
# Markdown output (default for most tasks)
nimble --transform "data.markdown" extract \
--url "https://example.com/page" --format markdown
# Save to file
nimble --transform "data.markdown" extract \
--url "https://example.com/page" --format markdown > .nimble/page.md
from nimble_python import Nimble
nimble = Nimble()
resp = nimble.extract(url="https://example.com/page", formats=["markdown"])
print(resp["data"]["markdown"])
Failure signals: status 500 · empty data.html / data.markdown · "captcha" / "verify you are human" in content · login wall instead of target page
| Tier | CLI | When |
|---|---|---|
| 1 | extract --url "..." | Static pages, docs, news, GitHub, Wikipedia, HN |
| 2 | extract --url "..." --render | SPAs, dynamic content, JS-rendered pages |
| 2b | --render --render-options '{"render_type":"idle2","timeout":60000}' | Slow SPAs, wait for network idle |
| 3 | --render --driver vx10-pro | E-commerce, social, job boards — max stealth |
# Tier 1 — no render
nimble --transform "data.markdown" extract --url "https://example.com" --format markdown
# Tier 2 — render
nimble --transform "data.markdown" extract --url "https://example.com" --render --format markdown
# Tier 3 — stealth
nimble --transform "data.markdown" extract --url "https://example.com" --render --driver vx10-pro --format markdown
# Tier 2 — render
resp = nimble.extract(url="https://example.com", render=True, formats=["markdown"])
# Tier 3 — stealth
resp = nimble.extract(url="https://example.com", render=True, driver="vx10-pro", formats=["markdown"])
For interacting with a page before extracting — clicks, scrolls, form fills, infinite scroll. Requires render=True.
See browser-actions.md for all action types and params.
nimble --transform "data.markdown" extract \
--url "https://example.com/product" --render \
--browser-action '[
{"type": "click", "selector": ".tab-reviews", "required": false},
{"type": "wait_for_element", "selector": ".review-list"}
]' --format markdown
resp = nimble.extract(
url="https://example.com/product",
render=True,
browser_actions=[
{"type": "click", "selector": ".tab-reviews", "required": False},
{"type": "wait_for_element", "selector": ".review-list"},
],
formats=["markdown"],
)
When page data comes from XHR/AJAX calls, or to call a known API endpoint directly with --is-xhr.
See network-capture.md for filter syntax and --is-xhr mode.
# Intercept an API call triggered by the page
nimble extract \
--url "https://example.com/products" --render \
--network-capture '[{"url": {"type": "contains", "value": "/api/products"}, "resource_type": ["xhr", "fetch"]}]' \
> .nimble/products-api.json
# Known public API endpoint — use --is-xhr (no browser, fastest)
nimble --transform "data.markdown" extract \
--url "https://api.example.com/v1/markets?q=elections&limit=50" \
--is-xhr --format markdown
# Intercept via render
resp = nimble.extract(
url="https://example.com/products",
render=True,
network_capture=[{"url": {"type": "contains", "value": "/api/products"}, "resource_type": ["xhr", "fetch"]}],
)
captures = resp["data"]["network_capture"]
# Direct API call — no browser
resp = nimble.extract(
url="https://api.example.com/v1/markets?q=elections&limit=50",
is_xhr=True,
)
Note:
is_xhrandrenderare mutually exclusive.
When markdown doesn't contain fields cleanly. Results land in data.parsing.
See parsing-schema.md for selector types, extractors, and post-processors.
nimble extract --url "https://example.com/product" --render --parse \
--parser '{
"type": "schema",
"fields": {
"title": {"type": "terminal", "selector": {"type": "css", "css_selector": "h1"}, "extractor": {"type": "text"}},
"price": {"type": "terminal", "selector": {"type": "css", "css_selector": "[data-price]"}, "extractor": {"type": "text"}}
}
}'
resp = nimble.extract(
url="https://example.com/product",
render=True,
parse=True,
parser={
"type": "schema",
"fields": {
"title": {"type": "terminal", "selector": {"type": "css", "css_selector": "h1"}, "extractor": {"type": "text"}},
"price": {"type": "terminal", "selector": {"type": "css", "css_selector": "[data-price]"}, "extractor": {"type": "text"}},
},
},
)
print(resp["data"]["parsing"])
# Country
nimble --transform "data.markdown" extract --url "https://example.com" --country US --format markdown
# City-level
nimble --transform "data.markdown" extract --url "https://example.com" --country US --state CA --city los_angeles --format markdown
# Localized (pair --locale with --country)
nimble --transform "data.markdown" extract --url "https://example.com/fr" --country FR --locale fr-FR --format markdown
resp = nimble.extract(url="https://example.com", country="US", formats=["markdown"])
resp = nimble.extract(url="https://example.com", country="US", state="CA", city="los_angeles", formats=["markdown"])
For batch processing or long-running extractions. Returns immediately with a task ID; poll for results.
Additional async-only params:
| Parameter | CLI flag | Type | Description |
|---|---|---|---|
storage_type | --storage-type | string | Cloud provider: s3 or gs |
storage_url | --storage-url | string | Destination (e.g. s3://my-bucket/path/) |
compress | --compress | bool | GZIP compress results before storing |
custom_name | --custom-name | string | Custom filename (default: task ID) |
callback_url | --callback-url | string | Webhook URL — called on completion |
Task states: pending → running → success / failed
# Submit async
nimble extract-async --url "https://example.com/page" --render --format markdown
# Poll status
nimble tasks get --task-id <task_id>
# Fetch results
nimble tasks results --task-id <task_id>
import asyncio
from nimble_python import AsyncNimble
async def extract():
nimble = AsyncNimble()
task = await nimble.extract_async(url="https://example.com/page", render=True, formats=["markdown"])
task_id = task["task"]["id"]
while True:
status = await nimble.tasks.get(task_id=task_id)
state = status["task"]["state"]
if state in ("success", "failed"):
break
await asyncio.sleep(5)
result = await nimble.tasks.results(task_id=task_id)
print(result["data"]["markdown"])
asyncio.run(extract())
Submit up to 1,000 URLs in a single request. Uses an inputs + shared_inputs pattern
— shared config applies to all items, per-item values override.
Parameters:
| Parameter | CLI flag | Type | Default | Description |
|---|---|---|---|---|
inputs | --input | array | required | Array of per-URL requests, each with at least url |
shared_inputs | --shared-inputs | JSON | — | Defaults applied to all items (render, format, driver, delivery) |
shared_inputs fields:
render, driver, formats, country, locale, parse, parserstorage_type, storage_url, storage_compress, storage_object_name, callback_urlCLI:
nimble extract-batch \
--shared-inputs 'render: true' --shared-inputs 'format: markdown' \
--input '{"url": "https://example.com/page-1"}' \
--input '{"url": "https://example.com/page-2"}' \
--input '{"url": "https://example.com/page-3"}'
Python SDK:
resp = nimble.extract_batch(
inputs=[
{"url": "https://example.com/page-1"},
{"url": "https://example.com/page-2"},
{"url": "https://example.com/page-3"},
],
shared_inputs={"render": True, "formats": ["markdown"]},
)
batch_id = resp["batch_id"]
Node SDK:
const resp = await nimble.extractBatch({
inputs: [
{ url: "https://example.com/page-1" },
{ url: "https://example.com/page-2" },
{ url: "https://example.com/page-3" },
],
sharedInputs: { render: true, formats: ["markdown"] },
});
const batchId = resp.batch_id;
Response:
{
"batch_id": "b7e1a2f3-...",
"batch_size": 3,
"tasks": [
{ "id": "task-001-uuid", "state": "pending", "batch_id": "b7e1a2f3-..." }
]
}
Polling: Use nimble batches progress --batch-id <batch_id> to check completion,
then nimble batches get --batch-id <batch_id> to get all task IDs, then
nimble tasks results --task-id <id> for each successful task.
See nimble-tasks reference for the full polling flow.
Delivery options:
callback_url in shared_inputs; Nimble POSTs on completionstorage_type + storage_url in shared_inputsmkdir -p .nimble
nimble --transform "data.markdown" extract --url "https://example.com/1" --format markdown > .nimble/1.md &
nimble --transform "data.markdown" extract --url "https://example.com/2" --format markdown > .nimble/2.md &
nimble --transform "data.markdown" extract --url "https://example.com/3" --format markdown > .nimble/3.md &
wait
| Field | Type | Description |
|---|---|---|
data.html | string | Extracted HTML content |
data.markdown | string | Content as markdown (if format=markdown) |
data.parsing | object | Structured data (if parse=True) |
data.network_capture | array | Captured network requests (if network_capture set) |
status_code | number | HTTP status code from target |
task_id | string | Unique request identifier |
metadata.driver | string | Driver used (e.g. vx6, vx10-pro) |
metadata.query_duration | number | Extraction time in ms |
Builds Databricks data products from live web data, end to end: discovers the right Nimble web-data agents, scrapes into Delta tables, and produces an AI/BI dashboard and/or a deployed Databricks App — a table → dashboard → app workflow, for production data products or quick demos. Use whenever a request pairs live or scraped web data WITH a Databricks destination — e.g. "scrape Amazon/Walmart prices into a Delta table and build a dashboard", "load Zillow/Instagram/Maps/search results into Databricks and build a dashboard or app", "showcase Nimble + Databricks to a prospect". Prefer it over nimble-web-expert or competitor-intel when the data lands in Databricks. Do NOT use for one-off web fetches or CSV exports with no Databricks destination — use nimble-web-expert instead. Do NOT use for competitor or company research briefings — use competitor-intel or company-deep-dive instead. Do NOT use for generic Databricks work with no Nimble/web-data angle — use the official databricks-* skills instead.
Use this skill ANY TIME the user asks about a specific company. Triggers: "tell me about [company]", "research [company]", "what does [company] do", "who is [company]", "look up [company]", "company deep dive", "due diligence on [company]", "background on [company]", "dig into [company]", "analyze [company]", or evaluating a company for investment, partnership, or sales. MUST be used instead of answering from memory — fetches real-time web data (funding, leadership changes, product launches, news) your training data lacks. Use even for well-known companies. Produces a sourced 360° report covering funding, leadership, product/tech, market position, news, and strategic outlook with dates and URLs. Do NOT use for multi-company competitor monitoring (use competitor-intel) or meeting prep with attendees (use meeting-prep).
Searches the live web via Nimble APIs to monitor competitors and produce a structured intelligence briefing. Runs parallel searches for news, product launches, hiring signals, and funding — then compares against previous findings to highlight only what's new. Use this skill when the user asks about competitors, competitive intelligence, or what rival companies are doing. Common triggers: "what are my competitors doing", "competitor update", "competitor news", "competitive landscape", "market intel", "what's new with [company]", "track [company]", "competitor briefing", "who's making moves", "competitive analysis", "losing deals to [company]", "battlecard". Also use before board meetings or strategy sessions when the user wants competitive context. Requires the Nimble CLI (nimble search, nimble extract) for live web data. Do NOT use for single-company deep dives (use company-deep-dive), meeting prep with attendees (use meeting-prep), or non-business queries.
Discovers all businesses of a given type in any geography using Nimble WSAs. Two modes: Discovery finds businesses from scratch; Audit compares a user's existing list (Google Sheet, CSV, inline) against fresh discovery, categorizing entries as matched, discovered-only, or reference-only. Vertical presets (Healthcare, SaaS, Restaurants, Legal, Auto/Home) auto-select WSA routing. Triggers: "find all X in Y", "build a list of", "market sizing", "account universe", "how many X in Y", "TAM for", "discover all", "audit my list", "compare against", "what am I missing", "gap analysis", "verify my business list", "prospect list". Do NOT use for competitor monitoring — use competitor-intel instead. Do NOT use for company deep dives — use company-deep-dive instead. Do NOT use for neighborhood-level exploration with social enrichment — use local-places instead.
Fills gaps in existing healthcare practitioner lists — adds missing phone numbers, credentials, specialties, contact info, education, reviews, and regulatory data. Triggers: "enrich my provider list", "fill in missing data", "add phone numbers to these doctors", "complete this practitioner database", "enrich CRM export", "fill gaps in my provider data", "supplement this healthcare list". Accepts CSV, Google Sheet URL, or pasted data. Searches for each provider's practice website, extracts missing fields, and enriches with reviews, clinical trials, and accreditation via WSAs. Do NOT use for extracting providers from practice URLs — use healthcare-providers-extract instead. Do NOT use for validating credentials — use healthcare-providers-verify instead. Do NOT use for discovering practices — use market-finder or local-places instead. Do NOT use for general extraction — use nimble-web-expert instead.
Extracts structured practitioner data from healthcare practice websites. Returns names, credentials, specialties, contact info, and education for every provider on a practice's site. Use when user asks to extract, pull, or list doctors, providers, or staff from practice websites. Triggers: "extract doctors from", "pull providers from", "who are the providers at", "build a provider database", "list all doctors at", "scrape the team page", "get practitioner data from". Accepts practice URLs (pasted, CSV, Google Sheet) or discovers practices via Google Maps when given specialty + location. Single sites or 100+ URLs. Do NOT use for filling data gaps — use healthcare-providers-enrich instead. Do NOT use for credential validation — use healthcare-providers-verify instead. Do NOT use for discovering practices — use market-finder or local-places instead. Do NOT use for general extraction — use nimble-web-expert instead.