| name | literature-review |
| description | Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.). |
| allowed-tools | Read Write Edit Bash |
| license | MIT license |
| metadata | {"skill-author":"K-Dense Inc."} |
Literature Review
Overview
Conduct systematic, comprehensive literature reviews following rigorous academic methodology. Search multiple literature databases, synthesize findings thematically, verify all citations for accuracy, and generate professional output documents in markdown and PDF formats.
This skill uses the parallel-web skill (parallel-cli search) as the primary web search tool for broad academic literature discovery, supplemented by specialized database access skills (gget, bioservices, datacommons-client). It provides specialized tools for citation verification, result aggregation, and document generation.
When to Use This Skill
Use this skill when:
- Conducting a systematic literature review for research or publication
- Synthesizing current knowledge on a specific topic across multiple sources
- Performing meta-analysis or scoping reviews
- Writing the literature review section of a research paper or thesis
- Investigating the state of the art in a research domain
- Identifying research gaps and future directions
- Requiring verified citations and professional formatting
Visual Enhancement with Scientific Schematics
⚠️ MANDATORY: Every literature review MUST include at least one AI-generated figure created with the scientific-schematics skill.
This is not optional. Literature reviews without visual elements are incomplete. Before finalizing any document:
- Generate at minimum ONE schematic or diagram (e.g., PRISMA flow diagram for systematic reviews)
- Prefer 2-3 figures for comprehensive reviews (search strategy flowchart, thematic synthesis diagram, conceptual framework)
How to generate figures:
- Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
- Simply describe your desired diagram in natural language
- Nano Banana Pro will automatically generate, review, and refine the schematic
How to generate schematics:
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
The AI will automatically:
- Create publication-quality images with proper formatting
- Review and refine through multiple iterations
- Ensure accessibility (colorblind-friendly, high contrast)
- Save outputs in the figures/ directory
When to add schematics:
- PRISMA flow diagrams for systematic reviews
- Literature search strategy flowcharts
- Thematic synthesis diagrams
- Research gap visualization maps
- Citation network diagrams
- Conceptual framework illustrations
- Any complex concept that benefits from visualization
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
Core Workflow
Literature reviews follow a structured, multi-phase workflow:
Phase 1: Planning and Scoping
- Define Research Question: Use the PICO framework (Population, Intervention, Comparison, Outcome) for clinical/biomedical reviews
- Example: "What is the efficacy of CRISPR-Cas9 (I) for treating sickle cell disease (P) compared to standard care (C)?"
- Establish Scope and Objectives:
- Define clear, specific research questions
- Determine review type (narrative, systematic, scoping, meta-analysis)
- Set boundaries (time period, geographic scope, study types)
- Develop Search Strategy:
- Identify 2-4 main concepts from research question
- List synonyms, abbreviations, and related terms for each concept
- Plan Boolean operators (AND, OR, NOT) to combine terms
- Select minimum 3 complementary databases
- Use the parallel-web skill (parallel-cli search) for initial scoping to quickly gauge the landscape before formal database searches
- Set Inclusion/Exclusion Criteria:
- Date range (e.g., last 10 years: 2015-2024)
- Language (typically English, or specify multilingual)
- Publication types (peer-reviewed, preprints, reviews)
- Study designs (RCTs, observational, in vitro, etc.)
- Document all criteria clearly
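The search-strategy step above (synonyms joined within a concept, concepts combined with Boolean operators) can be sketched in a few lines of Python. This is an illustrative helper, not part of the bundled scripts; the function name `build_boolean_query` and the example concept lists are assumptions made for this sketch.

```python
def build_boolean_query(concepts):
    """Combine concept synonym lists into one Boolean search string.

    Synonyms within a concept are joined with OR; concepts are joined with AND.
    Multi-word terms are wrapped in double quotes so they match as phrases.
    """
    groups = []
    for synonyms in concepts:
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical concepts for the CRISPR / sickle cell example question.
concepts = [
    ["CRISPR", "Cas9", "gene editing"],
    ["sickle cell disease", "SCD"],
]
print(build_boolean_query(concepts))
# (CRISPR OR Cas9 OR "gene editing") AND ("sickle cell disease" OR SCD)
```

Database-specific field tags (for example PubMed's [Title] or [MeSH]) would still need to be added per database; the sketch only covers the generic AND/OR structure.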
Phase 2: Systematic Literature Search
- Multi-Database Search:
Select databases appropriate for the domain. Always start with parallel-web for broad academic coverage, then supplement with domain-specific databases.
Web-Based Academic Search (parallel-web skill — START HERE):
- Use parallel-cli search with academic domain filtering for broad scholarly coverage
- Run two searches: academic-focused + general to catch all relevant sources
parallel-cli search "your research topic" -q "keyword1" -q "keyword2" \
--json --max-results 10 --excerpt-max-chars-total 27000 \
--include-domains "scholar.google.com,arxiv.org,pubmed.ncbi.nlm.nih.gov,semanticscholar.org,biorxiv.org,medrxiv.org,ncbi.nlm.nih.gov,nature.com,science.org,ieee.org,acm.org,springer.com,wiley.com,cell.com,pnas.org,nih.gov" \
-o sources/litreview_<topic>-academic.json
parallel-cli search "your research topic" -q "keyword1" -q "keyword2" \
--json --max-results 10 --excerpt-max-chars-total 27000 \
-o sources/litreview_<topic>-general.json
- Use parallel-cli extract to fetch full content from specific paper URLs or PDFs found in search results
parallel-cli extract "https://arxiv.org/abs/XXXX.XXXXX" --json
Biomedical & Life Sciences:
- Use gget skill: gget search pubmed "search terms" for PubMed/PMC
- Use gget skill: gget search biorxiv "search terms" for preprints
- Use bioservices skill for ChEMBL, KEGG, UniProt, etc.
General Scientific Literature:
- Search arXiv via direct API (preprints in physics, math, CS, q-bio)
- Search Semantic Scholar via API (200M+ papers, cross-disciplinary)
- Use Google Scholar for comprehensive coverage (manual or careful scraping)
Specialized Databases:
- Use gget alphafold for protein structures
- Use gget cosmic for cancer genomics
- Use datacommons-client for demographic/statistical data
- Use specialized databases as appropriate for the domain
- Document Search Parameters:
## Search Strategy
### Database: PubMed
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**:
("CRISPR"[Title] OR "Cas9"[Title])
AND ("sickle cell"[MeSH] OR "SCD"[Title/Abstract])
AND 2015:2024[Publication Date]
- **Results**: 247 articles
Repeat for each database searched.
- Export and Aggregate Results:
Phase 3: Screening and Selection
- Deduplication:
python scripts/search_databases.py results.json --deduplicate --output unique_results.json
- Removes duplicates by DOI (primary) or title (fallback)
- Document number of duplicates removed
- Title Screening:
- Review all titles against inclusion/exclusion criteria
- Exclude obviously irrelevant studies
- Document number excluded at this stage
- Abstract Screening:
- Read abstracts of remaining studies
- Apply inclusion/exclusion criteria rigorously
- Document reasons for exclusion
- Full-Text Screening:
- Obtain full texts of remaining studies
- Conduct detailed review against all criteria
- Document specific reasons for exclusion
- Record final number of included studies
- Create PRISMA Flow Diagram:
Initial search: n = X
├─ After deduplication: n = Y
├─ After title screening: n = Z
├─ After abstract screening: n = A
└─ Included in review: n = B
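The deduplication rule (DOI first, title as fallback) and the PRISMA-style count summary from this phase can be sketched as follows. This is a minimal illustration, not the logic of the bundled search_databases.py script; the record keys `doi` and `title`, the function names, and the sample records are assumptions for the sketch.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record per DOI; fall back to the normalized title if no DOI."""
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        key = ("doi", doi) if doi else ("title", normalize_title(rec.get("title", "")))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def prisma_summary(counts):
    """Render (label, n) stage counts as a text flow like the one above."""
    (first_label, first_n), rest = counts[0], counts[1:]
    lines = [f"{first_label}: n = {first_n}"]
    for i, (label, n) in enumerate(rest):
        branch = "└─" if i == len(rest) - 1 else "├─"
        lines.append(f"{branch} {label}: n = {n}")
    return "\n".join(lines)


# Hypothetical records; the DOIs are placeholders, not real papers.
records = [
    {"doi": "10.1000/abc", "title": "Gene editing for SCD"},
    {"doi": "10.1000/ABC", "title": "Gene Editing for SCD."},
    {"doi": None, "title": "A preprint without DOI"},
    {"doi": "", "title": "A Preprint Without DOI"},
]
unique = deduplicate(records)
print(prisma_summary([("Initial search", len(records)),
                      ("After deduplication", len(unique))]))
```

One limitation of this sketch: a record that carries a DOI and a copy of the same paper without one are keyed differently, so they are not merged. Catching those would need a second title-based pass.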
Phase 4: Data Extraction and Quality Assessment
- Extract Key Data from each included study:
- Study metadata (authors, year, journal, DOI)
- Study design and methods
- Sample size and population characteristics
- Key findings and results
- Limitations noted by authors
- Funding sources and conflicts of interest
- Assess Study Quality:
- For RCTs: Use Cochrane Risk of Bias tool
- For observational studies: Use Newcastle-Ottawa Scale
- For systematic reviews: Use AMSTAR 2
- Rate each study: High, Moderate, Low, or Very Low quality
- Consider excluding very low-quality studies
- Organize by Themes:
- Identify 3-5 major themes across studies
- Group studies by theme (studies may appear in multiple themes)
- Note patterns, consensus, and controversies
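One way to keep extraction, quality rating, and theme grouping consistent is a small record type. The sketch below is illustrative only: the class `ExtractedStudy`, its field names, the design keys, and the sample DOIs are assumptions, while the tool names and rating levels come from the steps above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Quality assessment tool per study design, as listed above.
QUALITY_TOOLS = {
    "rct": "Cochrane Risk of Bias tool",
    "observational": "Newcastle-Ottawa Scale",
    "systematic_review": "AMSTAR 2",
}
RATINGS = ("High", "Moderate", "Low", "Very Low")


@dataclass
class ExtractedStudy:
    doi: str
    design: str                      # expected to be a key of QUALITY_TOOLS
    rating: str                      # one of RATINGS
    themes: list = field(default_factory=list)

    def __post_init__(self):
        if self.rating not in RATINGS:
            raise ValueError(f"unknown rating: {self.rating!r}")

    @property
    def quality_tool(self):
        """Suggested assessment tool, or None for designs not listed above."""
        return QUALITY_TOOLS.get(self.design)


def group_by_theme(studies):
    """Map each theme to the DOIs of its studies; a study may appear under several."""
    grouped = defaultdict(list)
    for study in studies:
        for theme in study.themes:
            grouped[theme].append(study.doi)
    return dict(grouped)


# Hypothetical studies with placeholder DOIs.
studies = [
    ExtractedStudy("10.1000/a", "rct", "High", ["delivery", "safety"]),
    ExtractedStudy("10.1000/b", "observational", "Moderate", ["safety"]),
]
print(group_by_theme(studies))
```

Rejecting unknown ratings at construction time keeps typos such as "Medium" out of the extraction table before synthesis begins.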
Phase 5: Synthesis and Analysis
- Create Review Document from template:
cp assets/review_template.md my_literature_review.md
- Write Thematic Synthesis (NOT study-by-study summaries):
- Organize Results section by themes or research questions
- Synthesize findings across multiple studies within each theme
- Compare and contrast different approaches and results
- Identify consensus areas and points of controversy
- Highlight the strongest evidence
Example structure:
#### 3.3.1 Theme: CRISPR Delivery Methods
Multiple delivery approaches have been investigated for therapeutic
gene editing. Viral vectors (AAV) were used in 15 studies^1-15^ and
showed high transduction efficiency (65-85%) but raised immunogenicity
concerns^3,7,12^. In contrast, lipid nanoparticles demonstrated lower
efficiency (40-60%) but improved safety profiles^16-23^.
- Critical Analysis:
- Evaluate methodological strengths and limitations across studies
- Assess quality and consistency of evidence
- Identify knowledge gaps and methodological gaps
- Note areas requiring future research
- Write Discussion:
- Interpret findings in broader context
- Discuss clinical, practical, or research implications
- Acknowledge limitations of the review itself
- Compare with previous reviews if applicable
- Propose specific future research directions
Phase 6: Citation Verification
CRITICAL: All citations must be verified for accuracy before final submission.
- Verify All DOIs:
python scripts/verify_citations.py my_literature_review.md
This script:
- Extracts all DOIs from the document
- Verifies each DOI resolves correctly
- Retrieves metadata from CrossRef
- Generates verification report
- Outputs properly formatted citations
- Review Verification Report:
- Check for any failed DOIs
- Verify author names, titles, and publication details match
- Correct any errors in the original document
- Re-run verification until all citations pass
- Format Citations Consistently:
- Choose one citation style and use throughout (see references/citation_styles.md)
- Common styles: APA, Nature, Vancouver, Chicago, IEEE
- Use verification script output to format citations correctly
- Ensure in-text citations match reference list format
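The first two verification steps (extract DOIs, then look each one up) can be illustrated with a short sketch. It is not the implementation of verify_citations.py, whose internals are not shown here. The regular expression is a commonly used approximation for modern DOIs and will miss some legacy forms. The function names are assumptions, and the sample DOIs are placeholders.

```python
import re
from urllib.parse import quote

# Approximate pattern for modern DOIs (assumption: no whitespace or quotes inside).
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text):
    """Return unique DOIs in order of first appearance, trailing punctuation stripped."""
    found = []
    for match in DOI_PATTERN.findall(text):
        doi = match.rstrip(".,;:)]")
        if doi not in found:
            found.append(doi)
    return found


def crossref_url(doi):
    """URL of the CrossRef REST API record for a DOI (no request is sent here)."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


sample = (
    "See https://doi.org/10.1000/xyz123. "
    "Also (doi:10.1000/abc.456), and 10.1000/xyz123 again."
)
for doi in extract_dois(sample):
    print(doi, "->", crossref_url(doi))
```

Fetching each URL and comparing the returned title, authors, and year against the reference list is the part that needs network access, so it is left out of the sketch.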
Phase 7: Document Generation
- Generate PDF:
python scripts/generate_pdf.py my_literature_review.md \
--citation-style apa \
--output my_review.pdf
Options:
--citation-style: apa, nature, chicago, vancouver, ieee
--no-toc: Disable table of contents
--no-numbers: Disable section numbering
--check-deps: Check if pandoc/xelatex are installed
- Review Final Output:
- Check PDF formatting and layout
- Verify all sections are present
- Ensure citations render correctly
- Check that figures/tables appear properly
- Verify table of contents is accurate
- Quality Checklist:
Database-Specific Search Guidance
PubMed / PubMed Central
Access via gget skill:
gget search pubmed "CRISPR gene editing" -l 100
Search tips:
- Use MeSH terms: "sickle cell disease"[MeSH]
- Field tags: [Title], [Title/Abstract], [Author]
- Date filters: 2020:2024[Publication Date]
- Boolean operators: AND, OR, NOT
- See MeSH browser: https://meshb.nlm.nih.gov/search
bioRxiv / medRxiv
Access via gget skill:
gget search biorxiv "CRISPR sickle cell" -l 50
Important considerations:
- Preprints are not peer-reviewed
- Verify findings with caution
- Check if preprint has been published (CrossRef)
- Note preprint version and date
arXiv
Access via direct API or WebFetch:
search_query = "cat:q-bio.QM AND ti:\"single cell sequencing\""
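A minimal sketch of turning that search_query string into a request URL for the public arXiv API endpoint (export.arxiv.org/api/query, which returns an Atom feed). The helper name `arxiv_query_url` and the default page size are assumptions for illustration; the sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(search_query, start=0, max_results=25):
    """Build an arXiv API URL; urlencode handles quoting of spaces, colons, quotes."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


search_query = 'cat:q-bio.QM AND ti:"single cell sequencing"'
print(arxiv_query_url(search_query, max_results=10))
```

Paging through larger result sets would mean repeating the call with an increasing `start` value.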
Semantic Scholar
Access via direct API (requires API key, or use free tier):
- 200M+ papers across all fields
- Excellent for cross-disciplinary searches
- Provides citation graphs and paper recommendations
- Use for finding highly influential papers
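As a rough sketch, a Semantic Scholar paper search can be prepared with the standard library alone. The endpoint and the `x-api-key` header follow the public Academic Graph API. The helper names, the chosen `fields` list, and the sample paper dicts are assumptions for illustration. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def s2_search_request(query, limit=20, api_key=None):
    """Build (but do not send) a paper-search request."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,year,citationCount,externalIds",
    }
    headers = {"x-api-key": api_key} if api_key else {}
    return Request(f"{S2_SEARCH}?{urlencode(params)}", headers=headers)


def sort_by_citations(papers):
    """Order paper dicts by citationCount, highest first; missing counts sort last."""
    return sorted(papers, key=lambda p: p.get("citationCount") or 0, reverse=True)


request = s2_search_request("CRISPR sickle cell", limit=5)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the search; sorting the returned records with `sort_by_citations` is one simple way to surface highly cited papers first.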
Specialized Biomedical Databases
Use appropriate skills:
- ChEMBL: bioservices skill for chemical bioactivity
- UniProt: gget or bioservices skill for protein information
- KEGG: bioservices skill for pathways and genes
- COSMIC: gget skill for cancer mutations
- AlphaFold: gget alphafold for protein structures
- PDB: gget or direct API for experimental structures
Citation Chaining
Expand search via citation networks:
- Forward citations (papers citing key papers):
- Use parallel-cli search to find papers citing a specific work:
parallel-cli search "papers citing [Author et al. Year] [paper title]" \
-q "citing" -q "[key author]" \
--json --max-results 10 --excerpt-max-chars-total 27000 \
--include-domains "scholar.google.com,semanticscholar.org,arxiv.org,pubmed.ncbi.nlm.nih.gov" \
-o sources/litreview_forward_citations.json
- Use Google Scholar "Cited by"
- Use Semantic Scholar or OpenAlex APIs
- Identifies newer research building on seminal work
- Backward citations (references from key papers):
Citation Style Guide
Detailed formatting guidelines are in references/citation_styles.md. Quick reference:
APA (7th Edition)
- In-text: (Smith et al., 2023)
- Reference: Smith, J. D., Johnson, M. L., & Williams, K. R. (2023). Title. Journal, 22(4), 301-318. https://doi.org/10.xxx/yyy
Nature
- In-text: Superscript numbers^1,2^
- Reference: Smith, J. D., Johnson, M. L. & Williams, K. R. Title. Nat. Rev. Drug Discov. 22, 301-318 (2023).
Vancouver
- In-text: Superscript numbers^1,2^
- Reference: Smith JD, Johnson ML, Williams KR. Title. Nat Rev Drug Discov. 2023;22(4):301-18.
Always verify citations with verify_citations.py before finalizing.
Prioritizing High-Impact Papers (CRITICAL)
Always prioritize influential, highly-cited papers from reputable authors and top venues. Quality matters more than quantity in literature reviews.
Citation Count Thresholds
Use citation counts to identify the most impactful papers:
| Paper Age | Citation Threshold | Classification |
|---|---|---|
| 0-3 years | 20+ citations | Noteworthy |
| 0-3 years | 100+ citations | Highly Influential |
| 3-7 years | 100+ citations | Significant |
| 3-7 years | 500+ citations | Landmark Paper |
| 7+ years | 500+ citations | Seminal Work |
| 7+ years | 1000+ citations | Foundational |
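The thresholds in the table translate directly into a small lookup. In the sketch below the function name is an assumption, and so is the handling of the overlapping age boundaries: a paper of exactly 3 years is treated as part of the 3-7 band, and one of exactly 7 years as part of the 7+ band.

```python
def classify_paper(age_years, citations):
    """Return the table's classification, or None if no threshold is met.

    Assumption: age bands are [0, 3), [3, 7), and 7+; the table leaves the
    boundaries ambiguous.
    """
    if age_years < 3:
        tiers = [(100, "Highly Influential"), (20, "Noteworthy")]
    elif age_years < 7:
        tiers = [(500, "Landmark Paper"), (100, "Significant")]
    else:
        tiers = [(1000, "Foundational"), (500, "Seminal Work")]
    # Tiers are ordered from highest threshold down, so the first hit wins.
    for threshold, label in tiers:
        if citations >= threshold:
            return label
    return None


print(classify_paper(2, 150))   # Highly Influential
print(classify_paper(5, 120))   # Significant
print(classify_paper(10, 700))  # Seminal Work
print(classify_paper(1, 5))     # None
```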
Journal and Venue Tiers
Prioritize papers from higher-tier venues:
- Tier 1 (Always Prefer): Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS, Nature Medicine, Nature Biotechnology
- Tier 2 (Strong Preference): High-impact specialized journals (IF>10), top conferences (NeurIPS, ICML for ML/AI)
- Tier 3 (Include When Relevant): Respected specialized journals (IF 5-10)
- Tier 4 (Use Sparingly): Lower-impact peer-reviewed venues
Author Reputation Assessment
Prefer papers from:
- Senior researchers with high h-index (>40 in established fields)
- Leading research groups at recognized institutions (Harvard, Stanford, MIT, Oxford, etc.)
- Authors with multiple Tier-1 publications in the relevant field
- Researchers with recognized expertise (awards, editorial positions, society fellows)
Identifying Seminal Papers
For any topic, identify foundational work by:
- High citation count (typically 500+ for papers 5+ years old)
- Frequently cited by other included studies (appears in many reference lists)
- Published in Tier-1 venues (Nature, Science, Cell family)
- Written by field pioneers (often cited as establishing concepts)
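The second signal above (a paper that appears in many reference lists of the included studies) can be counted mechanically once reference lists have been extracted. A minimal sketch, assuming a hypothetical mapping from each included study's identifier to the identifiers it cites:

```python
from collections import Counter


def in_corpus_citation_counts(reference_lists):
    """Count, for each cited work, how many included studies cite it.

    reference_lists maps an included study's ID to the IDs in its reference
    list; duplicates within one list are counted once.
    """
    counts = Counter()
    for cited in reference_lists.values():
        counts.update(set(cited))
    return counts


# Hypothetical identifiers standing in for DOIs.
reference_lists = {
    "study_a": ["paper_x", "paper_y", "paper_x"],
    "study_b": ["paper_x"],
    "study_c": ["paper_y", "paper_z"],
}
counts = in_corpus_citation_counts(reference_lists)
print(counts["paper_x"], counts["paper_y"], counts["paper_z"])  # 2 2 1
```

Works cited by a large share of the included studies are candidates for backward citation chaining even if they did not surface in the database searches.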
Best Practices
Search Strategy
- Start with parallel-web: Use parallel-cli search with academic domains for initial broad coverage before querying specialized databases
- Use multiple databases (minimum 3): Ensures comprehensive coverage; parallel-web counts as one source
- Include preprint servers: Captures latest unpublished findings
- Document everything: Search strings, dates, result counts for reproducibility; save all parallel-cli output to sources/
- Test and refine: Run pilot searches, review results, adjust search terms
- Sort by citations: When available, sort search results by citation count to surface influential work first
- Use parallel-cli extract: Fetch full content from promising URLs found during search to verify relevance before full-text screening
Screening and Selection
- Use clear criteria: Document inclusion/exclusion criteria before screening
- Screen systematically: Title → Abstract → Full text
- Document exclusions: Record reasons for excluding studies
- Consider dual screening: For systematic reviews, have two reviewers screen independently
Synthesis
- Organize thematically: Group by themes, NOT by individual studies
- Synthesize across studies: Compare, contrast, identify patterns
- Be critical: Evaluate quality and consistency of evidence
- Identify gaps: Note what's missing or understudied
Quality and Reproducibility
- Assess study quality: Use appropriate quality assessment tools
- Verify all citations: Run verify_citations.py script
- Document methodology: Provide enough detail for others to reproduce
- Follow guidelines: Use PRISMA for systematic reviews
Writing
- Be objective: Present evidence fairly, acknowledge limitations
- Be systematic: Follow structured template
- Be specific: Include numbers, statistics, effect sizes where available
- Be clear: Use clear headings, logical flow, thematic organization
Common Pitfalls to Avoid
- Single database search: Misses relevant papers; always search multiple databases
- No search documentation: Makes review irreproducible; document all searches
- Study-by-study summary: Lacks synthesis; organize thematically instead
- Unverified citations: Leads to errors; always run verify_citations.py
- Too broad search: Yields thousands of irrelevant results; refine with specific terms
- Too narrow search: Misses relevant papers; include synonyms and related terms
- Ignoring preprints: Misses latest findings; include bioRxiv, medRxiv, arXiv
- No quality assessment: Treats all evidence equally; assess and report quality
- Publication bias: Only positive results published; note potential bias
- Outdated search: Field evolves rapidly; clearly state search date
Example Workflow
Complete workflow for a biomedical literature review:
cp assets/review_template.md crispr_sickle_cell_review.md
parallel-cli search "CRISPR Cas9 sickle cell disease gene therapy efficacy" \
-q "CRISPR" -q "sickle cell" -q "gene therapy" \
--json --max-results 10 --excerpt-max-chars-total 27000 \
--include-domains "scholar.google.com,arxiv.org,pubmed.ncbi.nlm.nih.gov,semanticscholar.org,biorxiv.org,nature.com,science.org,cell.com,pnas.org,nih.gov" \
-o sources/litreview_crispr_scd-academic.json
parallel-cli search "CRISPR sickle cell disease clinical trials treatment" \
-q "CRISPR" -q "sickle cell" \
--json --max-results 10 --excerpt-max-chars-total 27000 \
-o sources/litreview_crispr_scd-general.json
python scripts/search_databases.py combined_results.json \
--deduplicate \
--rank citations \
--year-start 2015 \
--year-end 2024 \
--format markdown \
--output search_results.md \
--summary
python scripts/verify_citations.py crispr_sickle_cell_review.md
cat crispr_sickle_cell_review_citation_report.json
# After correcting any failed citations in the document, re-run verification
python scripts/verify_citations.py crispr_sickle_cell_review.md
python scripts/generate_pdf.py crispr_sickle_cell_review.md \
--citation-style nature \
--output crispr_sickle_cell_review.pdf
Integration with Other Skills
This skill works seamlessly with other scientific skills:
Web Search & Extraction (parallel-web skill — PRIMARY)
- parallel-cli search: Broad academic and general web search with domain filtering — use for initial scoping, finding papers, citation chaining, and supplementary searches
- parallel-cli extract: Fetch full content from paper URLs, journal websites, and preprint servers — use for reading abstracts, extracting reference lists, and verifying paper details
- parallel-cli search --include-domains: Academic-focused search across scholarly domains (arxiv.org, pubmed, nature.com, etc.)
Database Access Skills
- gget: PubMed, bioRxiv, COSMIC, AlphaFold, Ensembl, UniProt
- bioservices: ChEMBL, KEGG, Reactome, UniProt, PubChem
- datacommons-client: Demographics, economics, health statistics
Analysis Skills
- pydeseq2: RNA-seq differential expression (for methods sections)
- scanpy: Single-cell analysis (for methods sections)
- anndata: Single-cell data (for methods sections)
- biopython: Sequence analysis (for background sections)
Visualization Skills
- matplotlib: Generate figures and plots for review
- seaborn: Statistical visualizations
Writing Skills
- brand-guidelines: Apply institutional branding to PDF
- internal-comms: Adapt review for different audiences
Resources
Bundled Resources
Scripts:
scripts/verify_citations.py: Verify DOIs and generate formatted citations
scripts/generate_pdf.py: Convert markdown to professional PDF
scripts/search_databases.py: Process, deduplicate, and format search results
References:
references/citation_styles.md: Detailed citation formatting guide (APA, Nature, Vancouver, Chicago, IEEE)
references/database_strategies.md: Comprehensive database search strategies
Assets:
assets/review_template.md: Complete literature review template with all sections
External Resources
Guidelines:
Tools:
Citation Styles:
Dependencies
Required CLI Tools
curl -fsSL https://parallel.ai/install.sh | bash
Required Python Packages
pip install requests
Required System Tools
brew install pandoc  # macOS
apt-get install pandoc  # Debian/Ubuntu
brew install --cask mactex  # macOS
apt-get install texlive-xetex  # Debian/Ubuntu
Check dependencies:
python scripts/generate_pdf.py --check-deps
Summary
This literature-review skill provides:
- Systematic methodology following academic best practices
- Parallel-web powered search using parallel-cli search for fast, broad academic literature discovery with scholarly domain filtering
- Multi-database integration via existing scientific skills (gget, bioservices, datacommons-client)
- Citation verification ensuring accuracy and credibility
- Professional output in markdown and PDF formats
- Comprehensive guidance covering the entire review process
- Quality assurance with verification and validation tools
- Reproducibility through detailed documentation requirements
Conduct thorough, rigorous literature reviews that meet academic standards and provide comprehensive synthesis of current knowledge in any domain.