---
name: firecrawl
description: Scrape web pages to clean markdown using Firecrawl v2 — handles JS-heavy pages, site crawls, URL mapping, document parsing (PDF/DOCX/XLSX), LLM-powered extraction, autonomous agent scraping, and post-scrape browser interaction (Interact API). Prefer over WebFetch for quality and completeness. Triggers on scrape URL, fetch page, crawl site, extract content, parse document, web to markdown, DeepWiki, Firecrawl.
---
Firecrawl & Jina Web Scraping
Firecrawl vs WebFetch
Prefer firecrawl scrape URL --only-main-content over the WebFetch tool: it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (over 80% content coverage in benchmarks). WebFetch is acceptable as a fallback when Firecrawl is unavailable.
firecrawl scrape https://docs.example.com/api --only-main-content
Token-Efficient Scraping
Inspired by Anthropic's dynamic-filtering approach: always filter before reasoning. In Anthropic's benchmarks, this pattern reduced input tokens by ~24% and improved accuracy by ~11%.
The Principle: Search → Filter → Scrape → Filter → Reason
DO:
Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason
DON'T:
Search → Scrape everything → Reason over all of it
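The filter-before-reason loop can be sketched in plain Python. This is an illustrative stub, not a real Firecrawl call: search() and scrape() below return canned data, and filter_sections() is a toy heading filter. Only the shape of the pipeline matters.

```python
def search(query):
    # Stand-in for a web search: returns titles/URLs only, no page bodies.
    return [
        {"title": "Firecrawl API reference", "url": "https://example.com/api"},
        {"title": "Unrelated blog post", "url": "https://example.com/blog"},
    ]

def scrape(url):
    # Stand-in for a scrape call: returns the page as markdown.
    return f"# Page at {url}\n\n## API\nendpoint details...\n\n## Footer\nboilerplate"

def relevant(hit, keywords):
    # Evaluate relevance from the title alone, before spending a scrape.
    return any(k.lower() in hit["title"].lower() for k in keywords)

def filter_sections(markdown, wanted):
    # Keep only H2 sections whose heading matches a wanted name.
    keep, out = False, []
    for line in markdown.splitlines():
        if line.startswith("## "):
            keep = any(w.lower() in line.lower() for w in wanted)
        if keep or line.startswith("# "):
            out.append(line)
    return "\n".join(out)

keywords = ["API"]
hits = [h for h in search("firecrawl docs") if relevant(h, keywords)]   # filter BEFORE scraping
pages = [filter_sections(scrape(h["url"]), keywords) for h in hits]     # filter AFTER scraping
# Only the filtered text reaches the model for reasoning.
```

The two filter stages are the whole point: the first avoids scrapes you don't need, the second trims the scrapes you do.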
Step-by-Step Efficient Workflow
firecrawl search "query" --limit 20
firecrawl scrape URL1 --only-main-content | \
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
--sections "API,Authentication" --max-chars 5000
Post-Processing with filter_web_results.py
Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:
firecrawl scrape URL --only-main-content | \
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"
firecrawl search "query" --scrape --pretty | \
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000
python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json | \
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000
firecrawl scrape URL --only-main-content | \
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats
Full path: python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py
Flags: --sections, --keywords, --max-chars, --max-lines, --fields (JSON), --strip-links, --strip-images, --compact, --stats
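As a rough mental model of what --keywords and --max-chars do, here is an illustrative approximation (not the bundled script's actual logic): keep paragraphs mentioning any keyword, then cap total length.

```python
def filter_by_keywords(text, keywords, max_chars):
    # Keep paragraphs that mention any keyword, then cap total length.
    paras = [p for p in text.split("\n\n")
             if any(k.lower() in p.lower() for k in keywords)]
    return "\n\n".join(paras)[:max_chars]

doc = ("Intro paragraph.\n\nPro plan costs $49/mo.\n\n"
       "About our team.\n\nEnterprise pricing is custom.")
result = filter_by_keywords(doc, ["pricing", "cost"], max_chars=200)
print(result)  # only the two pricing-related paragraphs survive
```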
Other Token-Saving Patterns
- Use --only-main-content to strip navigation and footer boilerplate, reducing token consumption. Omit only when nav/footer content is specifically needed.
- Use --only-clean-content (Python API script) for aggressive cleaning: strips nav, ads, and cookie banners. Stronger than --only-main-content; use when the page is still noisy after main-content filtering.
- Use firecrawl map URL --search "topic" first to find relevant subpages before scraping.
- Use --format links first to get a URL list, evaluate it, then scrape selectively.
- Use --max-chars with exa_contents.py to cap extraction length.
- Use --formats summary (Python API script) over full text when you need the gist, not raw content.
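The map-then-scrape-selectively pattern above, sketched with stand-in data (the URL list would come from a real map call):

```python
# URLs as a map call might return them (stand-in data).
mapped = [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/blog/2024-recap",
    "https://docs.example.com/api/errors",
    "https://docs.example.com/careers",
]
# Evaluate the URL list first; scrape only what matches.
wanted = [u for u in mapped if "/api/" in u]
print(wanted)  # 2 of 4 URLs make it to the scrape step
```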
Claude API Native Tools (for API Agent Builders)
Anthropic's API now offers built-in dynamic filtering tools:
web_search_20260209 / web_fetch_20260209
Header: anthropic-beta: code-execution-web-tools-2026-02-09
These have built-in dynamic filtering via code execution. Use them when building Claude API agents directly. Use Firecrawl/Exa when you need: autonomous agents, batch scraping, structured extraction, domain-specific crawling, or when not on the Claude API.
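A payload-only sketch of wiring these tools into a Messages API request. The tool type strings and beta header come from this section; the model name and the tools' name fields are assumptions to verify against Anthropic's current docs. Nothing is sent here.

```python
# Headers for the beta web tools (header value taken from the section above).
headers = {
    "x-api-key": "YOUR_KEY",
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "code-execution-web-tools-2026-02-09",
    "content-type": "application/json",
}

# Request body; model name is an assumption, swap in any tool-capable model.
payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [
        {"type": "web_search_20260209", "name": "web_search"},
        {"type": "web_fetch_20260209", "name": "web_fetch"},
    ],
    "messages": [{"role": "user", "content": "Summarize the Firecrawl pricing page."}],
}
# POST this to https://api.anthropic.com/v1/messages with the headers above.
```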
Available Tools
1. Official Firecrawl CLI (firecrawl) — Primary
Setup: npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY
| Command | Purpose | Quick Example |
|---|---|---|
| scrape | Single page → markdown | firecrawl scrape URL --only-main-content |
| crawl | Entire site with progress | firecrawl crawl URL --wait --progress --limit 50 |
| map | Discover all URLs on a site | firecrawl map URL --search "API" |
| search | Web search (+ optional scrape) | firecrawl search "query" --limit 10 |
Full CLI reference: references/cli-reference.md
2. Auto-Save Alias (fc-save) — Shell Alias
Requires shell alias setup (not bundled with this skill).
fc-save URL
3. Python API Script (firecrawl_api.py) — Advanced Features
Command: python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command>
Requires: FIRECRAWL_API_KEY env var, pip install firecrawl-py requests
| Command | Purpose | Quick Example |
|---|---|---|
| search | Web search with scraping | firecrawl_api.py search "query" -n 10 |
| scrape | Single URL with page actions | firecrawl_api.py scrape URL --formats markdown summary |
| batch-scrape | Multiple URLs concurrently | firecrawl_api.py batch-scrape URL1 URL2 URL3 |
| crawl | Website crawling | firecrawl_api.py crawl URL --limit 20 |
| map | URL discovery | firecrawl_api.py map URL --search "query" |
| parse | Parse local documents (PDF, DOCX, XLSX) | firecrawl_api.py parse report.pdf |
| extract | LLM-powered structured extraction | firecrawl_api.py extract URL --prompt "Find pricing" |
| agent | Autonomous extraction (no URLs needed) | firecrawl_api.py agent "Find YC W24 AI startups" |
| parallel-agent | Bulk agent queries (v2.8.0+) | firecrawl_api.py parallel-agent "Q1" "Q2" "Q3" |
| interact | Post-scrape browser interaction | firecrawl_api.py interact SCRAPE_ID --prompt "Click pricing" |
| interact-stop | Stop an interact session | firecrawl_api.py interact-stop SCRAPE_ID |
Agent models: spark-1-fast (10 credits, simple), spark-1-mini (default), spark-1-pro (thorough)
Full Python API reference: references/python-api-reference.md
4. DeepWiki — GitHub Repo Documentation
~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]
AI-generated wiki for any public GitHub repo. No API key required.
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat
~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation
~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save
5. Jina Reader (jina) — Fallback
Use when Firecrawl fails, or for Twitter/X URLs (Firecrawl cannot scrape Twitter/X; Jina can).
jina https://x.com/username/status/123456
Firecrawl vs Exa vs Native Claude Tools
| Need | Best Tool | Why |
|---|---|---|
| Single page → markdown | firecrawl scrape --only-main-content | Cleanest output |
| Search + scrape in one shot | firecrawl search --scrape | Combined operation |
| Crawl entire site | firecrawl crawl --wait --progress | Link following + progress |
| Local file → markdown | firecrawl_api.py parse FILE | Direct upload, no URL needed |
| Autonomous data finding | firecrawl_api.py agent | No URLs needed |
| Semantic/neural search | Exa exa_search.py | AI-powered relevance |
| Find research papers | Exa --category "research paper" | Academic index |
| Quick research answer | Exa exa_research.py | Citations + synthesis |
| Find similar pages | Exa exa_similar.py | Competitive analysis |
| Claude API agent building | Native web_search_20260209 | Built-in dynamic filtering |
| Twitter/X content | jina URL | Only tool that works |
| GitHub repo docs | deepwiki.sh owner/repo | AI-generated wiki |
| Anti-bot / Cloudflare bypass | scrapling stealth fetch | Local Turnstile solver |
| Element-level extraction | scrapling + CSS selectors | Precision targeting, adaptive tracking |
| No API key scraping | scrapling HTTP fetch | 100% local, no credentials |
| Site redesign resilience | scrapling adaptive mode | SQLite similarity matching |
| Budget JS-rendered scrape | cf_browser.py markdown URL | CF free tier: 10 min/day, $0.09/hr paid |
| Free static page fetch | cf_browser.py markdown URL --no-render | FREE during beta (no JS) |
| Budget multi-page crawl | cf_browser.py crawl URL | 5 free crawls/day, 100 pages each |
| Incremental re-crawl | cf_browser.py crawl --modified-since | Built-in, Firecrawl lacks this |
| Page screenshot/PDF | cf_browser.py screenshot/pdf URL | Built-in CF endpoints, cheaper |
| AI structured extraction | cf_browser.py json URL --prompt "..." | Workers AI included free |
Common Workflows
Single Page Scraping
firecrawl scrape https://example.com/page --only-main-content
Documentation Crawling
firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress
Research Workflow
firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown
Document Parsing (Local Files)
Parse local documents into clean Markdown. Use parse for local or non-public files; use scrape for public URLs pointing to documents—both use the same Rust-based parser.
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py parse report.pdf
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py parse data.xlsx --only-main-content
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py parse contract.docx --zero-data-retention -o contract.md
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py parse invoice.pdf --json
Supported formats: PDF, DOCX, DOC, XLSX, XLS, HTML, HTM, ODT, RTF (up to 50 MB).
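A hypothetical pre-flight check using these limits. The extension list and the 50 MB cap come from this section; the helper itself is not part of the skill.

```python
import os

# Limits stated in the section above.
SUPPORTED = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".html", ".htm", ".odt", ".rtf"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB

def parseable(path, size_bytes):
    # Reject unsupported extensions and oversized files before spending an API call.
    ext = os.path.splitext(path)[1].lower()
    return ext in SUPPORTED and size_bytes <= MAX_BYTES
```

In practice you would pass os.path.getsize(path) as size_bytes; it is a parameter here so the check is easy to exercise without a real file.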
PDF Parsing (Fire-PDF v2.9)
Fire-PDF is now the default parsing pipeline for all PDF scrapes. Three modes:
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/report.pdf"
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape URL --pdf-mode fast
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape URL --pdf-mode ocr
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py parse report.pdf --pdf-mode auto --pdf-max-pages 20
| Flag | Values | Notes |
|---|---|---|
| --pdf-mode | fast, auto, ocr | Default: auto. fast = text layer only; ocr = force OCR |
| --pdf-max-pages | integer | Caps pages parsed; useful for budget control on large PDFs |
Both flags work on scrape (for PDF URLs) and parse (for local files).
Agent-Powered Research (No URLs Needed)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
"Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"
Interact Workflows (Post-Scrape Browser Interaction)
Scrape a page, then take actions on it—click buttons, fill forms, extract dynamic content. Two modes: AI prompts (natural language) and code execution (Node.js/Python/Bash).
When to Use Interact vs. Actions
| Need | Use | Why |
|---|---|---|
| Click/wait before a single scrape | --actions on scrape | Fire-and-forget, no session overhead |
| Multiple interactions with same page | interact | Persistent session, back-and-forth |
| Fill forms, log in, navigate | interact | Stateful, multi-step |
| Simple "wait for JS to load" | --actions with wait | Cheaper, no session |
Basic Interact (AI Prompt Mode)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://example.com/pricing"
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
--prompt "Click the Enterprise pricing tab"
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
--prompt "What is the monthly price for the Enterprise plan?"
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact-stop SCRAPE_ID
Code Execution Mode (Cheaper)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
--code "const text = await page.locator('.pricing-table').textContent(); console.log(text);"
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
--code "text = await page.locator('.content').text_content(); print(text)" \
--language python
Persistent Profile (Login Sessions)
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/login" \
--profile my-app --json
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py interact SCRAPE_ID \
--code "await page.fill('#email', 'user@example.com'); await page.fill('#password', 'pass'); await page.click('button[type=submit]');"
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py scrape "https://app.example.com/dashboard" \
--profile my-app
Important: Interact does NOT return page markdown. To get updated content after interaction, use code mode to extract specific elements, or issue a follow-up scrape.
Full interact reference: references/interact-reference.md
Billing Notes
- Credit/token unification (v2.9): Credits and tokens are now unified—15 tokens = 1 credit. All pricing is expressed in credits.
- Default cache TTL: Results are cached for 2 days. Use --max-age 0 (or maxAge: 0 in API) to force a fresh scrape regardless of cache.
- query format: Pass formats=["query"] (Python API) to get a direct answer (data.answer) instead of full markdown. Use for factual lookups where you don't need the full page content.
- audio format: formats=["audio"] returns an MP3 of the page read aloud. Useful for accessibility pipelines or voice interfaces.
- wikimedia engine: Pass engine="wikimedia" in search options to route queries through Wikimedia. Useful for encyclopedic lookups.
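A quick helper for the 15-tokens-per-credit unification above. Whether partial credits round up is an assumption, not something this doc or the API states.

```python
import math

TOKENS_PER_CREDIT = 15  # v2.9 unification: 15 tokens = 1 credit

def tokens_to_credits(tokens):
    # Assumption: partial credits bill as whole credits (round up).
    return math.ceil(tokens / TOKENS_PER_CREDIT)

print(tokens_to_credits(45))  # 45 tokens → 3 credits
```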
Troubleshooting
firecrawl --status && firecrawl credit-usage
firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY
echo $FIRECRAWL_API_KEY
- Scrape fails: Try jina URL, or add --wait-for 3000 for JS-heavy sites.
- Async job stuck: Check with crawl-status/batch-status; cancel with crawl-cancel/batch-cancel.
- Disable telemetry: export FIRECRAWL_NO_TELEMETRY=1
Reference Documentation
| File | Contents |
|---|---|
| references/cli-reference.md | Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki) |
| references/python-api-reference.md | Full Python API script reference (all commands, SDK examples) |
| references/firecrawl-api.md | Firecrawl Search API reference |
| references/firecrawl-agent-api.md | Agent API (spark models, parallel agents, webhooks) |
| references/actions-reference.md | Page actions for dynamic content (click, write, wait, scroll) |
| references/interact-reference.md | Interact API: post-scrape browser interaction (prompt, code, profiles) |
| references/branding-format.md | Brand identity extraction (colors, fonts, UI) |
Test Suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape