| name | literature-search |
| description | Searches and discovers academic papers across multiple sources (Semantic Scholar, arXiv, Tavily, Exa, Gemini deep research, AMiner, Google Scholar) with adaptive engine selection based on query type. Returns ranked, deduplicated results with metadata (authors, venue, year, citations, abstract, PDF link). Use when the user asks to find papers / literature / publications / preprints / references on a topic, search for related work, look up a specific paper by title or DOI or arXiv ID, find papers by an author, find recent SOTA / state-of-the-art work, survey a research area, or run a deep / comprehensive literature search with synthesis.
|
Literature Search
Systematic, multi-engine academic paper search.
When to Use
- User asks "find papers about X"
- User needs related work for a new project
- User wants to know the state of the art on a topic
- User asks for papers from a specific venue/author/year
Engine Selection
Choose engines based on the search goal:
| Goal | Primary Engine | Supplementary |
|---|---|---|
| Broad topic survey | Semantic Scholar | arXiv, Tavily |
| Latest preprints | arXiv (sort by submittedDate) | Semantic Scholar |
| Deep research / complex questions | Gemini deep research | Tavily + Exa |
| Specific paper by title | Semantic Scholar | Google Scholar (via Tavily) |
| Papers by author | Semantic Scholar (author search) | AMiner |
| Chinese research community | AMiner | Semantic Scholar |
| Industry/applied papers | Tavily (deep) | Exa semantic search |
| Social buzz / trending papers | Twitter/X (xreach) | Reddit |
| Code implementations | GitHub (gh search) | Exa (get_code_context) |
| Finding similar papers | Exa (semantic) | Semantic Scholar (citations) |
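The table above can be encoded as a small lookup, sketched here. The goal keys (`"broad-survey"`, `"latest-preprints"`, and so on) and the function name `selectEngines` are hypothetical labels invented for this example, and falling back to the broad-survey row for unknown goals is an assumption, not something the skill specifies.

```javascript
// Engine-selection table transcribed from the section above.
// Keys are hypothetical goal labels chosen for this sketch.
const ENGINE_TABLE = new Map([
  ["broad-survey", { primary: "Semantic Scholar", supplementary: ["arXiv", "Tavily"] }],
  ["latest-preprints", { primary: "arXiv", supplementary: ["Semantic Scholar"] }],
  ["deep-research", { primary: "Gemini deep research", supplementary: ["Tavily", "Exa"] }],
  ["paper-by-title", { primary: "Semantic Scholar", supplementary: ["Google Scholar (via Tavily)"] }],
  ["papers-by-author", { primary: "Semantic Scholar", supplementary: ["AMiner"] }],
  ["chinese-community", { primary: "AMiner", supplementary: ["Semantic Scholar"] }],
  ["industry-applied", { primary: "Tavily", supplementary: ["Exa"] }],
  ["social-buzz", { primary: "Twitter/X", supplementary: ["Reddit"] }],
  ["code-implementations", { primary: "GitHub", supplementary: ["Exa"] }],
  ["similar-papers", { primary: "Exa", supplementary: ["Semantic Scholar"] }],
]);

// Returns the engines to query, primary first.
// Assumption: unknown goals fall back to the broad-survey row.
function selectEngines(goal) {
  const entry = ENGINE_TABLE.get(goal) ?? ENGINE_TABLE.get("broad-survey");
  return [entry.primary, ...entry.supplementary];
}

console.log(selectEngines("latest-preprints"));
```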
Workflow
Step 1: Understand the Query
Before searching, clarify:
- Scope: Broad survey vs. specific subtopic
- Recency: All time vs. last N years vs. latest only
- Venue preference: Top-tier only? Specific conference?
- Quantity: Top 5 vs. comprehensive survey
- Depth: Quick list vs. deep research with synthesis
Step 2: Select Search Strategy
Quick search (single engine):
For simple, well-defined queries. Use Semantic Scholar or arXiv directly.
Multi-engine search (2-3 engines in parallel):
For broader topics. Run the engines in parallel, then deduplicate the merged results.
Deep research (Gemini):
For complex, multi-faceted research questions. Gemini deep research mode synthesizes across many sources and provides a structured analysis with citations. Use this when:
- The question spans multiple subfields
- You need synthesis, not just a list of papers
- The user explicitly asks for "deep research" or "comprehensive survey"
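As a rough sketch, the choice between the three strategies can be written as a small decision function over the answers gathered in Step 1. The field names (`scope`, `wantsSynthesis`, `multiSubfield`) and the returned labels are assumptions made for this illustration; the skill itself leaves the decision to judgment.

```javascript
// Hypothetical input shape:
//   scope: "broad" | "specific"
//   wantsSynthesis: true when the user asks for deep research or synthesis
//   multiSubfield: true when the question spans multiple subfields
function chooseStrategy({ scope = "specific", wantsSynthesis = false, multiSubfield = false } = {}) {
  // Deep research takes priority: any of its trigger conditions is enough.
  if (wantsSynthesis || multiSubfield) return "deep";
  // Broad topics get 2-3 engines in parallel.
  if (scope === "broad") return "multi-engine";
  // Simple, well-defined queries use a single engine.
  return "quick";
}

console.log(chooseStrategy({ scope: "broad" }));
```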
Step 3: Execute Search
node scripts/search/semantic-scholar.mjs "query" -n 20
node scripts/search/arxiv.mjs "query" -n 15 --sort submittedDate --cat cs.CL
node scripts/search/search.mjs "query site:arxiv.org OR site:aclanthology.org" -n 10
node scripts/search/search.mjs "query" --deep
mcporter call 'exa.web_search_exa(query: "query", numResults: 10)'
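When these commands are assembled programmatically, building an argument array avoids shell-quoting problems with multi-word queries. The sketch below covers only the arXiv command and uses only the flags shown above (`-n`, `--sort`, `--cat`); the helper name `buildArxivCommand` and its defaults are hypothetical.

```javascript
// Builds the argv for the arXiv search script shown above.
// Only flags that appear in this document are emitted.
function buildArxivCommand(query, { n = 15, sort, cat } = {}) {
  const args = ["scripts/search/arxiv.mjs", query, "-n", String(n)];
  if (sort) args.push("--sort", sort); // e.g. "submittedDate"
  if (cat) args.push("--cat", cat);    // e.g. "cs.CL"
  return { file: "node", args };
}

const cmd = buildArxivCommand("retrieval augmented generation", {
  sort: "submittedDate",
  cat: "cs.CL",
});
console.log(JSON.stringify(cmd));
```

The resulting `file` and `args` can be passed to `execFile` from `node:child_process`, which takes the arguments as an array and does not invoke a shell.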
Step 4: Deduplicate & Rank
Merge results across engines:
- Deduplicate by title similarity (fuzzy match, >90% = same paper)
- Rank by: citation count × recency × venue tier × relevance
- Flag if a paper appears in multiple engines (higher confidence)
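One way to implement the merge step is sketched below. It assumes normalized Levenshtein similarity as the fuzzy-match metric (the skill only says "fuzzy match, >90%", so the metric is a stand-in) and a hypothetical result shape of `{ title, engine, citations }`.

```javascript
// Lowercase and strip punctuation; the Unicode classes keep non-Latin titles intact.
function normalizeTitle(title) {
  return title.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, " ").trim();
}

// Classic two-row Levenshtein edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical after normalization.
function titleSimilarity(a, b) {
  const x = normalizeTitle(a);
  const y = normalizeTitle(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Merges results whose titles are more than `threshold` similar and
// records every engine that returned the paper.
function dedupe(results, threshold = 0.9) {
  const merged = [];
  for (const r of results) {
    const hit = merged.find((m) => titleSimilarity(m.title, r.title) > threshold);
    if (hit) {
      if (!hit.engines.includes(r.engine)) hit.engines.push(r.engine);
      hit.citations = Math.max(hit.citations ?? 0, r.citations ?? 0);
    } else {
      merged.push({ ...r, engines: [r.engine] });
    }
  }
  return merged;
}
```

A paper whose `engines` array has more than one entry was found by multiple engines and can be flagged as higher confidence.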
Step 5: Present Results
Format as a ranked list with key metadata:
1. **[Title]** (Venue Year, Citations: N)
Authors: [First author] et al.
TL;DR: [1 sentence]
Why relevant: [connection to user's query]
2. ...
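The template above could be rendered with a helper like the following sketch. The field names on each paper object (`tldr`, `whyRelevant`, and so on) are hypothetical, and the three-space indent on continuation lines is an assumption so that they stay inside the numbered list item when rendered as Markdown.

```javascript
// Renders ranked papers in the list format shown above.
// Expects papers already sorted best-first.
function formatResults(papers) {
  return papers
    .map((p, i) => {
      const first = p.authors[0] ?? "Unknown";
      const authors = p.authors.length > 1 ? `${first} et al.` : first;
      return [
        `${i + 1}. **${p.title}** (${p.venue} ${p.year}, Citations: ${p.citations})`,
        `   Authors: ${authors}`,
        `   TL;DR: ${p.tldr}`,
        `   Why relevant: ${p.whyRelevant}`,
      ].join("\n");
    })
    .join("\n");
}

// Placeholder data for illustration only.
console.log(
  formatResults([
    {
      title: "Example Paper Title",
      venue: "ACL",
      year: 2024,
      citations: 3,
      authors: ["First Author", "Second Author"],
      tldr: "One-sentence summary goes here.",
      whyRelevant: "Connection to the user's query goes here.",
    },
  ])
);
```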
Step 6: Deep Dive (Optional)
If the user wants to go deeper on any paper:
- Switch to paper-reading skill
- Or add to Zotero reading queue (use zotero-management skill)
Search Tips
- Use specific terminology: "multi-agent reinforcement learning" > "MARL" > "agents working together"
- Combine with venue filter: Adding `venue:ACL` or category `cs.CL` dramatically improves precision
- Check citation chains: A highly-cited paper's references and citers are often gold
- Cross-lingual: For Chinese papers, try both English and Chinese queries
- Date filter for SOTA: Use `--sort submittedDate` on arXiv to find the latest approaches
- Gemini for synthesis: When you need to understand a field (not just list papers), use Gemini deep research to get a narrative overview first, then drill into specific papers
Quality Signals
When ranking, weight these signals:
- Citation count: A strong signal for established work, but less meaningful for papers under 6 months old
- Venue tier: ACL/EMNLP/NeurIPS/ICML > workshops > arXiv-only
- Author reputation: Check if senior authors are established in the field
- Code availability: Papers with code are more verifiable
- Reproducibility: Clear methodology sections and experimental details
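The citation, recency, and venue signals can be combined into the multiplicative score from Step 4 (citation count × recency × venue tier × relevance). The sketch below is illustrative only: the tier weights, the log damping of citations, the recency decay, and the field names (`published`, `venueTier`, `relevance`) are all assumptions, and author reputation, code availability, and reproducibility are left out because they need manual judgment.

```javascript
// Hypothetical tier weights: top venues > workshops > arXiv-only.
const VENUE_WEIGHT = new Map([
  ["top", 1.0],
  ["workshop", 0.6],
  ["preprint", 0.4],
]);

const MS_PER_MONTH = 1000 * 60 * 60 * 24 * 30.44;

// Multiplicative score: citations x recency x venue tier x relevance.
function scorePaper(paper, now = new Date()) {
  const ageMonths = (now - new Date(paper.published)) / MS_PER_MONTH;
  // Citations are not meaningful for papers under 6 months old, so use a
  // neutral factor of 1; otherwise damp with a log so outliers do not dominate.
  const citationFactor = ageMonths < 6 ? 1 : Math.log10(10 + paper.citations);
  // Assumed decay: the factor halves once a paper is one year old.
  const recencyFactor = 1 / (1 + Math.max(ageMonths, 0) / 12);
  const venueFactor = VENUE_WEIGHT.get(paper.venueTier) ?? VENUE_WEIGHT.get("preprint");
  return citationFactor * recencyFactor * venueFactor * paper.relevance;
}

// Returns a new array sorted best-first; the input is not mutated.
function rankPapers(papers, now = new Date()) {
  return [...papers].sort((a, b) => scorePaper(b, now) - scorePaper(a, now));
}
```

Passing `now` explicitly keeps the ranking reproducible, which helps when comparing result lists across runs.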