| name | searching |
| description | Use when building search queries for systematic literature review using scimesh.
TRIGGERS: slr search, build query, calibrate, search papers, executar busca, construir query, calibrar busca
|
Searching
Interactive query construction and search execution for systematic literature review.
Overview
Build search queries collaboratively with the user. Never run a search before showing the user the query, and always run the final SLR search with scimesh workspace search.
Core principle: Query construction is COLLABORATIVE. You build the query WITH the user, not FOR them.
Prerequisite: Protocol must be defined first (scimesh:protocoling).
Iron Rules
- NO autonomous query building - Query construction is INTERACTIVE; user approves each component
- Use workspace search - Always use scimesh workspace search for SLR
- ALWAYS show query in Scala code block - User must see the FULL query before any search
- READ THE PROTOCOL - Before building the query, read index.yaml and extract ALL relevant information for the query (year range, citation thresholds, framework fields, inclusion/exclusion criteria that can be translated to query filters)
Query Visibility
Problem: Bash tool output truncates long queries.
Solution:
- NEVER search before showing the query to user
- Generate 3-4 query VARIATIONS first, then ask user to choose
- Always display final query in a Scala code block:
TITLE-ABS("imputation" AND "tabular")
AND TITLE-ABS("deep learning" OR "neural network" OR "autoencoder")
AND PUBYEAR > 2020
AND CITEDBY >= 10
Scopus-style Syntax (used by scimesh)
| Field | Syntax | Example |
|---|---|---|
| Title | TITLE(x) | TITLE(transformer) |
| Abstract | ABS(x) | ABS("machine learning") |
| Title+Abstract | TITLE-ABS(x) | TITLE-ABS(RLHF) |
| Title+Abstract+Keywords | TITLE-ABS-KEY(x) | TITLE-ABS-KEY("deep learning") |
| Author | AUTHOR(x) | AUTHOR(Vaswani) |
| Year | PUBYEAR > 2020 | PUBYEAR > 2020 AND PUBYEAR < 2025 |
| Citations | CITEDBY >= 100 | CITEDBY >= 50 |
Operators: AND, OR, AND NOT, ()
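The syntax above can be assembled mechanically once the user has approved each concept group. The following is a minimal sketch, not part of scimesh: `or_group`, `title_abs`, and `build_query` are hypothetical helper names, and the sketch assumes every term should be quoted as a phrase.

```python
def or_group(terms):
    """Quote each term and join the terms with OR."""
    return " OR ".join(f'"{term}"' for term in terms)


def title_abs(terms):
    """Wrap one concept group (a list of synonyms) in a TITLE-ABS clause."""
    return f"TITLE-ABS({or_group(terms)})"


def build_query(concept_groups, year_after=None, min_citations=None):
    """Join concept groups and optional filters with AND, one clause per line.

    year_after and min_citations are hypothetical parameter names; they map
    to the PUBYEAR > N and CITEDBY >= N filters from the table above.
    """
    clauses = [title_abs(group) for group in concept_groups]
    if year_after is not None:
        clauses.append(f"PUBYEAR > {year_after}")
    if min_citations is not None:
        clauses.append(f"CITEDBY >= {min_citations}")
    return "\nAND ".join(clauses)


query = build_query(
    [["imputation"], ["deep learning", "neural network", "autoencoder"]],
    year_after=2020,
    min_citations=10,
)
print(query)
```

Printing one clause per line keeps the result readable when it is pasted into the Scala code block shown to the user.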
Step 0: Read Protocol
BEFORE asking any questions, read {workspace_path}/index.yaml using the Read tool.
Extract ALL information relevant to the query: year range, citation thresholds, framework fields (framework.type and framework.fields), and any criteria from inclusion/exclusion that can be translated to query filters.
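As an illustration of translating protocol values into query filters, here is a minimal sketch that works on an already-parsed protocol. The key names `year_range`, `start`, `end`, and `min_citations` are assumptions made for this example, as is the idea that `framework.fields` maps field names to free text; check the real index.yaml for the actual layout.

```python
def protocol_filters(protocol):
    """Translate protocol limits into Scopus-style filter clauses.

    Assumes hypothetical keys: year_range.start, year_range.end, min_citations.
    """
    filters = []
    years = protocol.get("year_range") or {}
    if years.get("start") is not None:
        # PUBYEAR > N is exclusive, so subtract one to keep the start year.
        filters.append(f"PUBYEAR > {years['start'] - 1}")
    if years.get("end") is not None:
        # PUBYEAR < N is exclusive, so add one to keep the end year.
        filters.append(f"PUBYEAR < {years['end'] + 1}")
    if protocol.get("min_citations") is not None:
        filters.append(f"CITEDBY >= {protocol['min_citations']}")
    return filters


def framework_concepts(protocol):
    """Return the non-empty framework fields as candidate concepts."""
    framework = protocol.get("framework") or {}
    fields = framework.get("fields") or {}
    return {name: text for name, text in fields.items() if text}


# Hypothetical protocol content, for illustration only.
protocol = {
    "year_range": {"start": 2021, "end": 2024},
    "min_citations": 10,
    "framework": {
        "type": "pico",
        "fields": {
            "population": "tabular data",
            "intervention": "deep learning imputation",
            "comparison": "",
        },
    },
}
filters = protocol_filters(protocol)
concepts = framework_concepts(protocol)
```

The extracted concepts are only suggestions to put in front of the user in Step 1; they do not replace the user's approval.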
Step 1: Identify Key Concepts
Ask user to identify main concepts:
{
"question": "What are the MAIN concepts in your research question? (select all)",
"header": "Concepts",
"options": [
{"label": "Suggest concept 1", "description": "Based on research question"},
{"label": "Suggest concept 2", "description": "Based on research question"},
{"label": "Suggest concept 3", "description": "Based on research question"}
],
"multiSelect": True
}
Step 2: Expand Synonyms
For EACH concept, ask about synonyms:
{
"question": f"For concept '{concept}', which synonyms should we include?",
"header": "Synonyms",
"options": [
{"label": f"'{synonym1}' (Recommended)", "description": "Common alternative"},
{"label": f"'{synonym2}'", "description": "Related term"},
{"label": f"'{synonym3}'", "description": "Technical variant"}
],
"multiSelect": True
}
Step 3: Generate Query Variations
Present 3-4 query strategies:
{
"question": "Choose query strategy:",
"header": "Query",
"options": [
{"label": "Focused (Rec)", "description": "TITLE-ABS(X AND Y) - both terms required"},
{"label": "Broad", "description": "TITLE-ABS(X) AND TITLE-ABS(Y) - separate clauses"},
{"label": "Title-only", "description": "TITLE(X AND Y) - highest precision"}
],
"multiSelect": False
}
Then show the FULL query for the chosen strategy in a Scala code block.
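The three strategies can be generated from the approved synonym lists so the user compares concrete strings. This is a sketch under assumptions: `query_variations` is a hypothetical helper, and it assumes the query parser accepts parenthesized OR groups inside a field clause.

```python
def quoted_or(terms):
    """Quote each synonym and join the synonyms with OR."""
    return " OR ".join(f'"{term}"' for term in terms)


def query_variations(x_terms, y_terms):
    """Build one query string per strategy label from two concept groups."""
    x = quoted_or(x_terms)
    y = quoted_or(y_terms)
    return {
        # Both concepts inside a single TITLE-ABS clause.
        "Focused": f"TITLE-ABS(({x}) AND ({y}))",
        # One TITLE-ABS clause per concept.
        "Broad": f"TITLE-ABS({x}) AND TITLE-ABS({y})",
        # Both concepts must appear in the title.
        "Title-only": f"TITLE(({x}) AND ({y}))",
    }


options = query_variations(["imputation"], ["deep learning", "autoencoder"])
for label, text in options.items():
    print(f"{label}: {text}")
```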
Step 4: Calibrate
Use scimesh search freely for calibration - it has NO side effects:
uvx scimesh search "QUERY" -p openalex -n 10 -f json | jq '.papers | length'
scimesh search is READ-ONLY (as long as you don't provide a custom -o):
- Does NOT create any files or directories
- Does NOT modify the workspace
- Can be run multiple times to refine the query
- Use it to test different query variations before committing
Then ask:
{
"question": f"Query returns ~{count} papers. Proceed or adjust?",
"header": "Calibrate",
"options": [
{"label": "Good, proceed", "description": f"{count} is within target"},
{"label": "Too many, add filters", "description": "Increase citations or narrow terms"},
{"label": "Too few, broaden", "description": "Remove constraints or add synonyms"}
],
"multiSelect": False
}
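The calibration decision can be sketched as follows. The only thing taken from the command above is that the JSON output has a top-level `papers` list (that is what the jq filter reads); `count_papers`, `calibration_hint`, and the `low`/`high` target bounds are hypothetical and should come from the user or the protocol.

```python
import json


def count_papers(raw_json):
    """Count entries in the top-level "papers" list of the JSON output."""
    return len(json.loads(raw_json).get("papers", []))


def calibration_hint(count, low, high):
    """Map a result count to the option label that best fits it.

    low and high are a hypothetical target range agreed with the user.
    """
    if count > high:
        return "Too many, add filters"
    if count < low:
        return "Too few, broaden"
    return "Good, proceed"


# Stand-in for captured command output, for illustration only.
sample_output = '{"papers": [{"title": "A"}, {"title": "B"}]}'
count = count_papers(sample_output)
hint = calibration_hint(count, low=50, high=300)
```

The hint only preselects a recommendation; the user still makes the choice through the question above.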
Step 5: Connectivity Check
BEFORE executing search, ask about connectivity and Sci-Hub:
{
"questions": [
{
"question": "Are you connected to institutional VPN?",
"header": "VPN",
"options": [
{"label": "Yes, VPN active", "description": "Better access + Scopus enabled"},
{"label": "No VPN", "description": "Will use Open Access only"},
{"label": "Wait, connecting...", "description": "Pause for VPN connection"}
],
"multiSelect": False
},
{
"question": "Enable Sci-Hub for PDF downloads?",
"header": "Sci-Hub",
"options": [
{"label": "No (Rec)", "description": "Only legal Open Access sources"},
{"label": "Yes", "description": "Use --scihub flag (at your own risk)"}
],
"multiSelect": False
}
]
}
- If user chooses "Wait", pause and ask again when ready
- If user enables Sci-Hub, add --scihub flag to workspace search command
Step 6: Execute Search
CRITICAL: scimesh workspace search creates workspace structure and downloads papers.
Before executing, you MUST:
- Show the COMPLETE query to the user (in Scala code block)
- Get EXPLICIT confirmation to proceed
- Never execute workspace search without user seeing the full query first
After final confirmation, use the workspace search command:
uvx scimesh workspace search {review_path}/ "FINAL QUERY" \
-p arxiv,openalex,semantic_scholar \
-n 200
Note: {review_path} is the workspace directory created by workspace init --type slr (e.g., ./reviews/my-review/)
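If the command is assembled programmatically, building an argument list avoids shell-quoting problems with the quotes inside the query. A minimal sketch, using only the flags that appear in this document (`-p`, `-n`, `--scihub`); the function name `workspace_search_argv` is hypothetical.

```python
import shlex


def workspace_search_argv(review_path, query, providers, limit, scihub=False):
    """Build the workspace search command as an argument list."""
    argv = [
        "uvx", "scimesh", "workspace", "search",
        review_path,
        query,
        "-p", ",".join(providers),
        "-n", str(limit),
    ]
    if scihub:
        # Added only when the user explicitly enabled Sci-Hub in Step 5.
        argv.append("--scihub")
    return argv


argv = workspace_search_argv(
    "./reviews/my-review/",
    'TITLE-ABS("imputation") AND PUBYEAR > 2020',
    ["arxiv", "openalex", "semantic_scholar"],
    200,
)
# shlex.join renders the list as a copy-pasteable shell command for review.
print(shlex.join(argv))
```

Printing the command before running it gives the user one more chance to see the complete query.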
Workspace search behavior:
- Requires an existing workspace with a protocol (run workspace init --type slr first)
- Uses the workspace databases by default if -p is not specified
- Deduplicates against existing papers in papers.yaml
- Records the search in log.yaml (query, providers, results count)
- Papers track which searches found them via search_ids
- Downloads PDFs when available (Open Access)
- Auto-updates workspace stats
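To make the deduplication and search_ids bookkeeping concrete, here is a conceptual sketch. It is not scimesh's implementation: matching on a lowercased `doi` field and the function name `merge_results` are assumptions chosen for illustration; only the `search_ids` name comes from the list above.

```python
def merge_results(existing, found, search_id):
    """Merge newly found papers into existing records and return how many were new.

    Papers are matched on a lowercased DOI (an assumption for this sketch).
    A paper that is already known only gains the new search id.
    """
    by_doi = {paper["doi"].lower(): paper for paper in existing}
    added = 0
    for paper in found:
        key = paper["doi"].lower()
        if key in by_doi:
            ids = by_doi[key].setdefault("search_ids", [])
            if search_id not in ids:
                ids.append(search_id)
        else:
            record = dict(paper, search_ids=[search_id])
            by_doi[key] = record
            existing.append(record)
            added += 1
    return added


# Hypothetical records, for illustration only.
existing = [{"doi": "10.1/a", "search_ids": ["s1"]}]
found = [{"doi": "10.1/A"}, {"doi": "10.1/b"}]
added = merge_results(existing, found, "s2")
```

Under this model, an incremental search leaves one record per paper while keeping track of every query that found it.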
Incremental Search
Add more papers to an existing workspace with additional queries:
uvx scimesh workspace search {review_path}/ "NEW QUERY" \
-p openalex \
-n 50
Providers Reference
| Provider | Strengths |
|---|---|
| arxiv | Preprints, CS/Physics/Math, free full-text |
| openalex | 200M+ works, open metadata, citations |
| semantic_scholar | AI/ML focus, citation graph, abstracts |
| scopus | Comprehensive but requires API key |
Next Step
After search is complete, use scimesh:screening to start the assisted screening loop.