| name | web-research |
| version | 1.0.0 |
| description | Web research using Tavily and Firecrawl. Searches the web, reads pages, crawls sites, and synthesizes findings. Use when: (1) user says "web research", "research this", "look into", "find out about", "what's the latest on", "/web-research", (2) user needs information from the web to inform a decision, strategy, or content, (3) user provides a URL and wants it read/analyzed, (4) user wants competitive intel, market research, or topic deep-dives. NOT for: X/Twitter research, Reddit research, scraping for data pipelines, or browser automation unrelated to research.
|
| allowed-tools | ["mcp__tavily__tavily_search","mcp__tavily__tavily_extract","mcp__tavily__tavily_crawl","mcp__tavily__tavily_map","mcp__tavily__tavily_research","Bash(firecrawl *)","Bash(npx firecrawl *)"] |
Web Research
Search the web, read pages, and synthesize findings using Tavily (search/research) and Firecrawl (scrape/crawl).
Tool Routing
Pick the right tool for the job:
| Need | Tool | Why |
|---|
| Quick search / discovery | tavily_search | Fast, returns snippets inline ā good for scanning |
| Deep multi-source research | tavily_research | Autonomous research agent, handles broad questions |
| Read a specific URL | firecrawl scrape | Clean markdown output, handles JS rendering |
| Read 1-3 URLs quickly | tavily_extract | Inline results, no file management |
| Read protected / JS-heavy page | firecrawl browser | Cloud Chromium for interactive pages |
| Bulk extract a site section | firecrawl crawl or download | Handles scale, file-based output |
| Find pages within a site | firecrawl map | URL discovery with search filtering |
| Date-filtered search | tavily_search with time_range or start_date/end_date | Built-in temporal filtering |
| Domain-scoped search | tavily_search with include_domains | Restricts to specific sites |
When Tavily vs Firecrawl
Default to Tavily for searching and quick reads. It's faster and returns results in-context.
Escalate to Firecrawl when:
- You need full page content as clean markdown (not just snippets)
- The page is large and results should go to a file
- Content requires browser interaction (clicks, scrolls, form fills)
- You need to crawl multiple pages from one site
- Tavily extraction fails on a protected or complex page
Use both together for research workflows:
- Tavily search to discover relevant URLs
- Firecrawl scrape to deep-read the best sources
Workflow
Standard Research Flow
1. SEARCH ā Tavily search to discover sources and get initial snippets
2. READ ā Firecrawl scrape (or Tavily extract) the most relevant pages
3. FOLLOW-UP ā Search again with refined queries based on what you learned
4. SYNTHESIZE ā Summarize findings with source attribution
Quick Question (simple factual lookup)
tavily_search("query", search_depth="basic", max_results=5)
ā Answer from snippets, cite sources
Topic Deep-Dive (broad or complex question)
tavily_research(input="detailed research question", model="auto")
ā Returns comprehensive synthesis from multiple sources
Use model="mini" for narrow questions with few subtopics, model="pro" for broad questions.
Competitive / Site Analysis
firecrawl map "https://competitor.com" --search "pricing" ā find relevant pages
firecrawl scrape "https://competitor.com/pricing" ā extract content
tavily_search("competitor name vs alternatives") ā market context
Reading a Specific URL
If the user gives you a URL:
- Short page / quick read:
tavily_extract with the URL
- Long page / need full content:
firecrawl scrape "<url>" -o .firecrawl/page.md
- Page behind interactions:
firecrawl browser "open <url>" ā snapshot ā scrape
Tavily Search Tips
# Basic search
tavily_search(query="your question", max_results=5)
# Recent results only
tavily_search(query="topic", time_range="week")
# Specific date range
tavily_search(query="topic", start_date="2025-01-01", end_date="2025-02-01")
# Search specific sites
tavily_search(query="topic", include_domains=["example.com", "docs.example.com"])
# Exclude sites
tavily_search(query="topic", exclude_domains=["reddit.com", "pinterest.com"])
# Thorough search (slower but better)
tavily_search(query="topic", search_depth="advanced", max_results=10)
# Get full page content with results (alternative to scraping)
tavily_search(query="topic", include_raw_content=true)
Firecrawl Output
Write large results to .firecrawl/ with descriptive filenames:
firecrawl scrape "<url>" -o .firecrawl/competitor-pricing.md
firecrawl search "query" --scrape -o .firecrawl/search-topic.json --json
Never read entire output files at once. Use incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
Synthesis
After gathering sources, synthesize findings for the user:
- Lead with the answer ā what did you find?
- Cite sources ā include URLs for key claims
- Flag conflicts ā note when sources disagree
- Note gaps ā what couldn't you find or verify?
- Suggest follow-ups ā what else could be explored?
Keep synthesis concise. The user wants insights, not a dump of everything you read.
Parallel Research
When a research question has multiple independent angles, run searches in parallel:
# These can run simultaneously:
tavily_search("competitor A pricing") # angle 1
tavily_search("competitor B pricing") # angle 2
tavily_search("industry pricing trends") # angle 3
For Firecrawl, run parallel scrapes:
firecrawl scrape "<url-1>" -o .firecrawl/1.md &
firecrawl scrape "<url-2>" -o .firecrawl/2.md &
wait