| name | expert-research |
| description | Expert-level research (L3) — runs /deep-research (L2) then adds critic agent, fact-checking pass, multi-perspective search, and human-in-the-loop plan approval. ~20 min, 40-60 sources, 3000+ word report with executive summary. Use for strategic decisions, technology migrations, important investigations. |
| user_invocable | true |
Expert Research — L3
Expert research tier. Composes on top of L2 (/deep-research) by adding:
- Human-in-the-loop plan approval before expensive work
- Multi-perspective search (3 angles: proponents, critics, neutral)
- Fact-check pass — critical claims verified against 2+ independent sources
- Critic agent — adversarial review of the entire report
- 2 reflection loops (vs L2's 1)
- Structured decision matrix when the query implies a choice
Position in ladder: L3. Calls L2 as its foundation (which itself calls L1). Called by L4, L5.
When to Use
- Strategic decisions: migration, architecture choice, big-ticket tooling
- "стоит ли мигрировать X → Y"
- "какой стек выбрать для нового проекта"
- Investigations where being wrong costs real money/time
- When user says "критически", "стратегически", "серьёзно и надёжно"
- Default when the decision stakes are high
Escalate to:
/academic-research (L4) — when you need scientific papers and academic rigor
/ultra-research (L5) — when you want a full knowledge base and playbooks
Budget
- Time: ~20-25 min total (L1: ~5 + L2: ~7 + L3: ~8-13)
- Tavily credits: ~150
- Subagent tokens: ~50-80K (critic agent via Agent tool) — added in v0.2.2 budget doc
- Codex credits (if available): ~200-300K tokens across 2 parallel calls (neutral + critic)
- Sub-questions: 8 (L1: 3 + L2: 2 + L3: 3)
- Sources: 40–60
- Output: 2000-3000 word report + executive summary + PDF export (target relaxed in v0.2.2 based on real-session testing; prior 3000-4500 was unrealistic for single-session execution without heavy subagent delegation)
Pipeline
┌────────────────────────────────────────┐
│ STAGE 0: Plan Approval (optional) │
│ — show user the expected L3 plan │
│ — wait for y/n or modifications │
└──────────────────┬─────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ STAGE 1: Execute /deep-research (L2) │
│ — produces L1/ + L2/ artifacts │
└──────────────────┬─────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ STAGE 2: L3 Expert Layer │
│ 1. MULTI-PERSPECTIVE PLAN │
│ 2. 3-ANGLE SEARCH │
│ 3. READ + SUMMARIZE │
│ 4. FACT-CHECK CRITICAL CLAIMS │
│ 5. CRITIC AGENT (adversarial review) │
│ 6. SECOND REFLECTION LOOP │
│ 7. DECISION MATRIX (if applicable) │
│ 8. EXPERT SYNTHESIS │
│ 9. PDF EXPORT │
└────────────────────────────────────────┘
Artifacts Directory
.firecrawl/research/<slug>/
├── L1/ # from /research
├── L2/ # from /deep-research
└── L3/ # this skill
├── plan-approved.md # user-approved plan
├── perspective-plan.md # 3 angles
├── sources/
│ ├── 31-<slug>.md # continuing numbering
│ └── ...
├── fact-check.md # critical claims + verification
├── critic-report.md # adversarial review findings
├── decision-matrix.md # if query implies choice
├── report.md # final L3 report (supersedes L2/report.md)
├── executive-summary.md # 500-word TL;DR for stakeholders
├── bibliography.md
└── report.pdf # exported via pandoc
Stage 0: Plan Approval
Before running L3 (which costs ~150 credits and 20 min), show the user a plan preview:
# L3 Expert Research Plan
**Query:** <user's query>
**Estimated cost:** ~150 Tavily credits, ~20 min
**Will produce:** 3000+ word report, executive summary, fact-check report, PDF
## Approach
1. Run L2 baseline (planner + reflection + contradiction detection)
2. Add multi-perspective search: proponents, critics, neutral
3. Fact-check top 5 critical claims against 2+ independent sources
4. Run critic agent adversarial review
5. Build decision matrix (if query implies a choice)
## Initial sub-questions (will be refined after L1/L2)
- <preliminary subq1>
- <preliminary subq2>
- ...
**Proceed?** (y / n / modify)
If user says y — proceed. If n — abort gracefully. If modify — adjust and re-show.
Skip approval if:
- User explicitly invoked with
/expert-research --go or said "сразу запускай"
- Called programmatically from L4 or L5
Stage 1: Execute L2
Invoke the deep-research skill:
Skill: deep-research
Args: <query>
Wait for completion. Verify artifacts:
.firecrawl/research/<slug>/L2/report.md
.firecrawl/research/<slug>/L2/contradictions.md
.firecrawl/research/<slug>/L2/confidence.md
Stage 2: L3 Expert Layer
Step 2.1: MULTI-PERSPECTIVE PLAN
Read L2 report. Generate 3 perspectives on the topic:
- Proponent angle — what do advocates/vendors say?
- Critic angle — what do detractors say? known problems?
- Neutral/academic angle — what do benchmarks, studies, independent reviewers say?
Write L3/perspective-plan.md:
# 3-Angle Research Plan
## Angle 1: Proponents
**Goal:** understand the strongest case for X
**Queries:**
- "why X is better than alternatives"
- "X official documentation advantages"
- "X success stories 2026"
## Angle 2: Critics
**Goal:** surface the strongest case against X
**Queries:**
- "problems with X in production"
- "why we moved away from X"
- "X limitations 2026"
## Angle 3: Neutral
**Goal:** independent data
**Queries:**
- "X vs Y benchmark 2026"
- "X real-world performance data"
- "X adoption metrics"
Step 2.2: 3-ANGLE SEARCH
Run 9 searches in parallel (3 queries × 3 angles) using Firecrawl + Tavily.
v0.5.0: If Exa MCP is installed (check for mcp__exa__* tools availability), run additional Exa searches in parallel. Exa's neural ranking excels at the critic and neutral angles — it finds conceptually-related dissenting views that keyword search misses.
# Neutral angle via Exa with category filter
mcp__exa__web_search_advanced_exa with:
query: <original query, framed neutrally>
category: "research paper" # or omit for general news/web
type: "auto"
num_results: 10
contents: { text: { max_characters: 20000 } }
Save to .firecrawl/research/$SLUG/L3/exa-neutral.json. For research paper category, this is the single best source of independent academic content we have. Fall back to Tavily if Exa unavailable.
v0.2.2: persist Tavily results. After each mcp__tavily__tavily_search call, write the response JSON to .firecrawl/research/$SLUG/L3/tavily-<angle>-<n>.json (angle = proponent/critic/neutral, n = query index) using the Write tool. MCP responses live in conversation context only; disk persistence makes them auditable and survivable across compaction.
Critical: use different search tactics per angle:
- Proponent angle: official docs, vendor blogs
- Critic angle: HN discussions, "we moved from X" blog posts, issue trackers
- Neutral angle: benchmark sites, independent reviewers, academic indexes
Step 2.3: READ + SUMMARIZE
Pick top 10–15 NEW sources across 3 angles. Scrape + summarize using same .sum.md format as L1/L2.
Add metadata to summaries:
**Angle:** [proponent | critic | neutral]
**Bias estimate:** [low | medium | high]
Save to L3/sources/ starting from the next available number.
Step 2.4: FACT-CHECK CRITICAL CLAIMS
v0.6.0+: use Perplexity for fact-checks. Perplexity is purpose-built for citation-grounded Q&A — cheaper and more focused than invoking Codex for verification of 5 short claims.
# For each of the top 5 critical claims:
mcp__perplexity-ask__perplexity_ask with:
messages: [{role: "user", content: "Verify this claim and find independent sources that support or dispute it: <claim>. Return: verdict (CONFIRMED/DISPUTED/UNVERIFIED), 2-3 supporting/disputing sources with URLs, one-sentence rationale."}]
Save each response to .firecrawl/research/$SLUG/L3/perplexity-factcheck-<n>.json. Combine Perplexity's verdicts with the manual fact-check process below:
- If Perplexity says CONFIRMED + our manual check confirms → High confidence
- If Perplexity says DISPUTED + our manual check flagged → include caveat in report
- Disagreements between Perplexity and manual → re-run with Codex as tiebreaker
If Perplexity unavailable, skill proceeds with manual fact-check only (existing pre-v0.6 behavior).
Read L2/report.md + L3 new summaries. Identify the top 5 most critical claims — ones that would change the recommendation if wrong.
For each, verify against 2+ independent sources (not from the same vendor/author):
Write L3/fact-check.md:
# Fact-check report
## Claim 1: "X is 3x faster than Y"
**Source in report:** [7]
**Verification attempts:**
- ✓ Confirmed: [23] independent benchmark shows 2.8x
- ✓ Confirmed: [25] community benchmark shows 3.1x
- **Verdict:** CONFIRMED (High confidence)
## Claim 2: "X has poor ecosystem"
**Source in report:** [12] (critic blog)
**Verification attempts:**
- ✗ Contradicted: [28] shows 500+ packages
- ? Inconclusive: [30] mentions "some gaps" without specifics
- **Verdict:** DISPUTED — downgrade to "community opinion, not fact"
## Claim 3: ...
Verdict categories:
- CONFIRMED — 2+ independent sources agree
- DISPUTED — sources disagree, needs careful framing
- UNVERIFIED — couldn't find independent confirmation (downgrade in report)
Step 2.4a: CODEX CROSS-MODEL CHANNEL (optional, added v0.2)
Fault-tolerant — if Codex is unavailable, skill continues single-model.
Spawn two parallel Codex calls for independent second-model perspective: a neutral-angle research pass and a cross-model critic pass. These run in parallel with Step 2.5 (Claude critic agent).
CODEX_HELPER="$HOME/.claude/scripts/codex-research.sh"
[ -x "$CODEX_HELPER" ] || CODEX_HELPER="scripts/codex-research.sh"
if [ -x "$CODEX_HELPER" ]; then
bash "$CODEX_HELPER" 360 \
".firecrawl/research/$SLUG/L3/codex-neutral.md" \
"You are an independent research assistant. Research the query '<ORIGINAL QUERY>' from a skeptical, neutral angle — ignore vendor marketing and hype. Look for benchmarks, independent reviewers, community experiences, and failures. Return 8-12 key findings with source URLs. Include dates. Be concise (≤1000 words)." &
CODEX_NEUTRAL_PID=$!
L2_REPORT=$(cat ".firecrawl/research/$SLUG/L2/report.md" 2>/dev/null | head -300)
bash "$CODEX_HELPER" 360 \
".firecrawl/research/$SLUG/L3/codex-critic.md" \
"You are a skeptical research critic. Review this research report adversarially. Challenge main conclusions. Find what's missing, wrong, or oversimplified. Use your own web search to verify or refute key claims. Return a critic report (≤1000 words).
REPORT TO REVIEW:
$L2_REPORT" &
CODEX_CRITIC_PID=$!
else
CODEX_NEUTRAL_PID=""
CODEX_CRITIC_PID=""
fi
Proceed to Step 2.5 (Claude critic) while these run. After Step 2.5 finishes:
[ -n "$CODEX_NEUTRAL_PID" ] && wait "$CODEX_NEUTRAL_PID" 2>/dev/null
[ -n "$CODEX_CRITIC_PID" ] && wait "$CODEX_CRITIC_PID" 2>/dev/null
for f in codex-neutral codex-critic; do
STATUS_FILE=".firecrawl/research/$SLUG/L3/${f}.md.status"
[ -f "$STATUS_FILE" ] && echo "$f: $(cat "$STATUS_FILE")"
done
When writing the final L3 synthesis (Step 2.8), merge Codex findings with Claude findings:
- If
L3/codex-critic.md exists — include its objections in the report's "Weaknesses / counterpoints" section
- If
L3/codex-neutral.md exists — cross-check its facts against Claude's conclusions, flag any disagreements in contradictions.md
- If Codex outputs absent (check
.status files) — note in Confidence section: Cross-model verification: unavailable (<reason>). Report is single-model.
Step 2.5: CRITIC AGENT (adversarial review)
Invoke a critic sub-agent to attack the L2 report. Use the Agent tool with a dedicated prompt:
Agent(subagent_type="general-purpose",
description="Critic review of L2 research",
prompt="""
You are a skeptical research critic. Your job is to find problems with this research report.
Read: .firecrawl/research/<slug>/L2/report.md
Also read: .firecrawl/research/<slug>/L2/confidence.md
For each major claim, ask:
1. Is the evidence strong enough?
2. Are there alternative explanations?
3. What would change my mind?
4. What's missing?
5. What's the weakest part of the argument?
Also check:
- Is the recommendation justified by the evidence?
- Are there hidden assumptions?
- Is there selection bias in the sources?
- Are there more recent sources that contradict the conclusion?
Write findings to: .firecrawl/research/<slug>/L3/critic-report.md
Format:
# Critic Report
## Weakest claims
- Claim X [citation]: problem with it
- ...
## Hidden assumptions
- ...
## Missing considerations
- ...
## Recommendation validity
- Justified? partial? unjustified?
## Questions the report doesn't answer but should
- ...
Be harsh but fair. Cite specific parts of the report you're attacking.
""")
Wait for critic to complete. Read critic-report.md.
Step 2.6: SECOND REFLECTION LOOP
Based on critic findings, identify new gaps:
- Claims flagged as weak → need more sources
- Missing considerations → need targeted searches
- Hidden assumptions → need to verify or acknowledge
Run 3–5 more targeted searches + scrapes to address critic concerns. Add to L3/sources/.
Step 2.7: DECISION MATRIX (if applicable)
Skip this step if query is pure information (no choice implied).
Apply this step if query implies choosing between alternatives ("X vs Y", "should we migrate", "best tool for Z").
Write L3/decision-matrix.md:
# Decision Matrix
## Criteria (weighted)
| Criterion | Weight | Why it matters for user context |
|---|---|---|
| Performance | 20% | ... |
| DX | 15% | ... |
| Maturity | 15% | ... |
| Cost | 10% | ... |
| Community | 15% | ... |
| Migration effort | 25% | ... |
## Options scored
| Option | Perf | DX | Maturity | Cost | Community | Migration | **Total** |
|---|---|---|---|---|---|---|---|
| A | 8 | 7 | 9 | 6 | 8 | 5 | **7.15** |
| B | 9 | 6 | 6 | 8 | 6 | 9 | **7.25** |
| C | 6 | 9 | 8 | 7 | 9 | 4 | **7.05** |
## Winner
**B** — by a small margin (7.25 vs 7.15)
## Why B edges out A
[Detailed reasoning with citations]
## When to pick A instead
[Scenarios where the decision flips]
## When to pick C instead
[Scenarios]
Step 2.8: EXPERT SYNTHESIS
Produce L3/report.md — supersedes L2/report.md.
Structure:
# <Topic Title>
**Query:** <original>
**Level:** L3 (includes L1, L2)
**Sources:** <total count>
**Generated:** <date>
## Executive Summary
[400-500 words: the complete answer for someone who will read only this section]
## Context
[Why this question matters, what's at stake, user's context if known]
## Findings by Sub-question
[All L1/L2 subqs + L2 followups, enriched]
## Multi-perspective analysis
### Proponent view
[Strongest case for, with citations and confidence]
### Critic view
[Strongest case against, with citations]
### Neutral/independent data
[Benchmarks, studies, adoption data]
### Synthesis
[How to reconcile the three perspectives]
## Fact-check results
[From fact-check.md: confirmed / disputed / unverified claims]
## Addressing critic concerns
[From critic-report.md: weaknesses identified and how resolved or acknowledged]
## Decision Matrix
[If applicable — full matrix from decision-matrix.md]
## Recommendation
[Concrete, opinionated, with explicit reasoning chain]
## Risks and caveats
[What could go wrong with the recommendation]
## Next steps
[Actionable — what the user should do next]
## Bibliography
[See L3/bibliography.md for full list]
Target length: 2000–3000 words (v0.2.2 relaxed from 3000-4500 based on real-session testing — old target was unachievable without aggressive subagent delegation and caused truncation artifacts).
Step 2.9: Executive Summary
Write separate L3/executive-summary.md — 500 words max. For stakeholders who won't read the full report. Plain language, no jargon.
🛑 L3 FINAL CHECKPOINT (added v0.2.2, shared-lib'd v0.4.0)
Before delivering the report, run this verification. Mirrors the L2 CHECKPOINT discipline — L3 was previously only prose-level verified, which was a regression from L2.
SLUG="<slug>"
VERIFY_LIB="$HOME/.claude/scripts/lib/verify-research.sh"
[ -f "$VERIFY_LIB" ] || VERIFY_LIB="scripts/lib/verify-research.sh"
[ -f "$VERIFY_LIB" ] || { echo "❌ verify-research.sh not found — run scripts/install.sh"; exit 1; }
source "$VERIFY_LIB"
verify_l3 "$SLUG" || exit 1
The function checks: report ≥1700 words (target 2000-3000); executive summary, critic report (subagent output), fact-check, bibliography, perspective plan all present; ≥8 L3 source summaries; every [N] citation (including multi-cite formats) maps to a bibliography entry.
Only proceed to PDF export (Step 2.10) after this prints ✅ L3 FINAL CHECKPOINT PASSED.
Step 2.10: PDF Export
cd .firecrawl/research/<slug>/L3/
pandoc report.md -o report.pdf --pdf-engine=xelatex -V geometry:margin=1in 2>/dev/null || \
pandoc report.md -o report.html && \
echo "PDF export requires pandoc+xelatex. HTML fallback at report.html"
If pandoc isn't installed, fall back to HTML export.
Final Output
- Display
L3/executive-summary.md in chat (short version)
- Offer to show full
report.md if user wants
- Show stats:
📊 L3 stats: sources, perspectives, fact-checked claims, critic report: concerns addressed
- Artifact locations:
📁 .firecrawl/research/<slug>/L3/
📄 report.md, executive-summary.md, report.pdf, fact-check.md, critic-report.md, decision-matrix.md
- Escalation:
Нужны научные источники? /academic-research (L4) добавит arXiv, Scholar и полный мультиагент.
Нужна полная база знаний? /ultra-research (L5) построит vault с playbook'ами.
Rules
- Plan approval is the norm — skip only if user explicitly said "сразу запускай" or called from L4/L5
- Always call L2 first — never inline L2 logic
- Multi-perspective is mandatory — don't rely on one angle
- Critic is a separate agent — use Agent tool, not internal reasoning
- Fact-check top 5 only — don't try to verify every claim
- Decision matrix only when applicable — don't force it on info queries
- Executive summary is separate — ~500 words, standalone, no jargon
- Preserve L1/L2 artifacts — never modify or delete
Called from higher levels
L4/L5 call this skill to get expert-grade foundation. When called:
- Run full L3 pipeline (including plan approval if user is present)
- Higher levels read
L3/report.md, L3/critic-report.md, L3/decision-matrix.md