
literature-search-arxiv

Updated: June 1, 2026 at 19:49

Search for scientific papers, preprints, and publications on arXiv. Extract metadata, abstracts, and download full-text PDFs or HTML versions of papers. Use when the user asks to find research papers, literature, or specific arXiv IDs.

Source

mkurman

mkurman/zorai

Related occupations (based on the SOC occupation classification): Biological Scientists, All Other (Life, Physical, and Social Science Occupations, SOC 19-1029)

SKILL.md

name: literature-search-arxiv
description: Search for scientific papers, preprints, and publications on arXiv. Extract metadata, abstracts, and download full-text PDFs or HTML versions of papers. Use when the user asks to find research papers, literature, or specific arXiv IDs.

arXiv Search and Retrieval

Prerequisites

uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://info.arxiv.org/help/api/index.html and to always check the license of each paper retrieved by the skill for any restrictions, and (2) create the file, recording the notification text and a timestamp.
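A minimal sketch of the notification check above, assuming Python 3.9+ and that `skill_dir` points at this skill's directory. The notice wording and both function names are hypothetical; only the file name LICENSE_NOTIFICATION.txt and what it must record (the notification text and a timestamp) come from this skill. The two steps are kept separate so the file is written only after the user has actually been shown the notice.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical wording; the skill only specifies what the notice must cover.
NOTICE = (
    "Please check the arXiv API terms at https://info.arxiv.org/help/api/index.html "
    "and always check the license of each paper retrieved by this skill for any restrictions."
)


def needs_license_notification(skill_dir: Path) -> bool:
    """True if LICENSE_NOTIFICATION.txt is missing, i.e. the user still has to be notified."""
    return not (Path(skill_dir) / "LICENSE_NOTIFICATION.txt").exists()


def record_license_notification(skill_dir: Path, text: str) -> Path:
    """Step (2): after notifying the user, record the notice text and a UTC timestamp."""
    marker = Path(skill_dir) / "LICENSE_NOTIFICATION.txt"
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{text}\nNotified at: {timestamp}\n", encoding="utf-8")
    return marker
```

In use: if `needs_license_notification(d)` is true, show `NOTICE` to the user first, then call `record_license_notification(d, NOTICE)`.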

Core Rules

Terms of Use: You MUST respect arXiv's Terms of Use.
- Maximum 1 request every 3 seconds.
- The provided utility scripts handle rate limiting automatically. Always use these scripts rather than writing your own curl/python requests.
If this skill is used, state this in the output AND list the URLs of all papers used to produce it.
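One way to assemble the required list of paper URLs is sketched below, assuming each record passed in is a search_arxiv.py result for a paper you actually used. The skill does not document the exact form of the `id` field, so this hypothetical helper accepts either a bare arXiv ID or a full URL, and builds an https://arxiv.org/abs/ link for the former.

```python
def attribution_urls(papers_used):
    """Return a de-duplicated, order-preserving list of arXiv URLs for citation.

    papers_used: result records (dicts) for the papers that informed the output.
    """
    urls = []
    for paper in papers_used:
        paper_id = str(paper["id"]).strip()
        # Assumption: "id" is either a full URL or a bare ID such as 1706.03762v5.
        url = paper_id if paper_id.startswith("http") else f"https://arxiv.org/abs/{paper_id}"
        if url not in urls:  # keep first-seen order, drop repeats
            urls.append(url)
    return urls
```

The returned list can be appended to the answer beneath a line stating that arXiv was searched with this skill.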

Utility Scripts

1. Search and Extract Metadata

Search arXiv and return a clean JSON array of matching papers.

uv run scripts/search_arxiv.py --query "au:einstein AND ti:relativity" \
  --max_results 5 2>/dev/null > /tmp/arxiv_search_results.json

Important: The script writes its JSON results to stdout. Requesting 100+ results produces a very large JSON document that may exceed your context length, so limit --max_results (e.g., 5-10) or paginate carefully with --start. Always redirect the output to a file and parse it separately; otherwise, terminal output will be truncated.
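The "parse it separately" step might look like the following sketch, which reads the redirected file and reduces each paper to a short entry so only a compact view enters your context. It assumes the file holds the JSON array of records with the id, title, summary, and published fields listed under Returned Metadata; the function name and the 200-character cutoff are arbitrary choices.

```python
import json
from pathlib import Path


def summarize_results(path, max_summary_chars=200):
    """Return one compact entry per paper from a search_arxiv.py results file."""
    papers = json.loads(Path(path).read_text(encoding="utf-8"))
    entries = []
    for paper in papers:
        # Titles and abstracts may contain hard line breaks; collapse all whitespace.
        title = " ".join(str(paper.get("title", "")).split())
        summary = " ".join(str(paper.get("summary", "")).split())
        if len(summary) > max_summary_chars:
            summary = summary[:max_summary_chars].rstrip() + "..."
        entries.append(f"{paper.get('id')} | {paper.get('published')} | {title}\n    {summary}")
    return entries
```

For example, loop over `summarize_results("/tmp/arxiv_search_results.json")` and print each entry.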

Returned Metadata: JSON results include id, title, summary, published, authors, pdf_url, primary_category, doi, journal_ref, and comment. Note: the doi field is populated only when the paper has an external DOI; if only an arXiv-issued DOI exists, it is not returned.
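Because doi holds only external DOIs, it can be combined with journal_ref to separate results that report a published version from preprint-only entries. This is a heuristic sketch over the returned records (the function name is hypothetical): missing metadata does not prove a paper is unpublished, since authors do not always update their arXiv entries.

```python
def split_by_publication_status(papers):
    """Split result records into (reports_publication, preprint_only) lists.

    A record counts as reporting a publication if it has a non-empty "doi"
    (external DOIs only) or a non-empty "journal_ref".
    """
    reports_publication, preprint_only = [], []
    for paper in papers:
        if paper.get("doi") or paper.get("journal_ref"):
            reports_publication.append(paper)
        else:
            preprint_only.append(paper)
    return reports_publication, preprint_only
```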

Options:

--query: Search string. See references/query_syntax.md for advanced syntax.
--id_list: Comma-separated list of arXiv IDs to fetch directly (e.g., 1706.03762v5).
--start: Pagination offset (default 0).
--max_results: Number of results to return (default 10).
--sort_by: relevance, lastUpdatedDate, or submittedDate. (Use --sort_by submittedDate --sort_order descending for the most recent papers).
--sort_order: ascending or descending.
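For larger result sets, --start and --max_results can be combined into one command per page, each redirected to its own file so no single output grows too large. The sketch below only builds the command strings (the helper name and the /tmp/arxiv_page_*.json file names are hypothetical) and uses an explicit submittedDate descending sort so successive pages follow one ordering. The skill does not say whether its rate limiting carries over between separate invocations, so running these commands one after another, never in parallel, is the conservative assumption.

```python
import shlex


def paginated_search_commands(query, total, page_size=10, out_prefix="/tmp/arxiv_page"):
    """Build one search_arxiv.py shell command per page of results.

    --start advances by page_size; the last page requests only what remains.
    Each page is redirected to its own JSON file.
    """
    commands = []
    for start in range(0, total, page_size):
        argv = [
            "uv", "run", "scripts/search_arxiv.py",
            "--query", query,
            "--start", str(start),
            "--max_results", str(min(page_size, total - start)),
            "--sort_by", "submittedDate",
            "--sort_order", "descending",
        ]
        out_file = f"{out_prefix}_{start}.json"
        # shlex.join quotes the query so a boolean expression stays one argument.
        commands.append(f"{shlex.join(argv)} 2>/dev/null > {shlex.quote(out_file)}")
    return commands
```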

2. Download Paper (PDF or HTML)

Download the full text of a paper to your local workspace for reading.

uv run scripts/download_paper.py --id 1706.03762 --format pdf --output attention.pdf

Options:

--id: The arXiv ID (e.g., 1706.03762 or 1706.03762v5).
--format: pdf or html. Note: HTML is only available for newer papers.
--output: Filepath to save the downloaded document.
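Before calling the download scripts, a malformed --id can be caught locally. The regular expressions below encode the two public arXiv identifier schemes (new-style YYMM.NNNN or YYMM.NNNNN, and old-style archive/YYMMNNN used before April 2007), each with an optional version suffix. Whether download_paper.py itself accepts old-style IDs is not stated here, so treat this as a hypothetical pre-check, not the script's own validation.

```python
import re

# New-style IDs: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 onward), optional vN.
NEW_STYLE = re.compile(r"(?P<base>\d{4}\.\d{4,5})(?P<version>v\d+)?")
# Old-style IDs: archive[.SUBJECT-CLASS]/YYMMNNN, optional vN.
OLD_STYLE = re.compile(r"(?P<base>[a-z][a-z-]*(?:\.[A-Z]{2})?/\d{7})(?P<version>v\d+)?")


def parse_arxiv_id(raw):
    """Return (base_id, version_or_None) for a recognizable arXiv ID, else raise ValueError."""
    candidate = raw.strip()
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.fullmatch(candidate)
        if match:
            return match.group("base"), match.group("version")
    raise ValueError(f"not a recognizable arXiv ID: {raw!r}")
```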

Important: save downloaded papers to a location where they will not overwrite other files or clutter the existing directory structure.
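A sketch of choosing a non-clobbering --output path, assuming a dedicated download directory (the name arxiv_downloads and the helper itself are hypothetical). Old-style IDs contain a slash, so it is replaced to keep each paper a single file name, and a numeric suffix is added rather than overwriting an existing file.

```python
from pathlib import Path


def safe_output_path(arxiv_id, fmt, root="arxiv_downloads"):
    """Return a fresh file path for one download inside a dedicated directory."""
    if fmt not in ("pdf", "html"):
        raise ValueError("fmt must be 'pdf' or 'html'")
    directory = Path(root)
    directory.mkdir(parents=True, exist_ok=True)
    # "hep-th/9901001" -> "hep-th_9901001" so the ID maps to a single file name.
    stem = arxiv_id.strip().replace("/", "_")
    candidate = directory / f"{stem}.{fmt}"
    counter = 1
    while candidate.exists():  # never reuse the path of an earlier download
        candidate = directory / f"{stem}_{counter}.{fmt}"
        counter += 1
    return candidate
```

Pass `str(path)` as the --output value of download_paper.py.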

3. Download Paper Source (tar.gz)

Download the LaTeX source files of a paper to your local workspace. Note that not all papers have source available.

uv run scripts/download_paper_source.py --id 2010.11645 --output source.tar.gz

Options:

--id: The arXiv ID (e.g., 2010.11645).
--output: Filepath to save the downloaded tar.gz file.

Caution: be careful when extracting (untarring) the downloaded file, both for security and to avoid cluttering your filesystem, since archives may contain many files or unexpected directory structures.

Safe Extraction Requirements: NEVER extract directly into your working directory! Always extract into a dedicated new directory:

mkdir paper_source && tar -xzf source.tar.gz -C paper_source
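If you extract from Python rather than the shell, the same rules can be enforced with the standard tarfile module. This sketch (the function name is hypothetical) refuses an existing destination, rejects links, special files, absolute paths, and members that would land outside the destination via "..", and also applies tarfile's "data" extraction filter when the running Python provides it. Since the download may not be a tar archive at all, the sketch stops with an error rather than guessing the format.

```python
import tarfile
from pathlib import Path


def safe_extract(archive, dest):
    """Extract a source archive into a new directory, rejecting unsafe members."""
    archive = Path(archive)
    dest = Path(dest).resolve()
    if not tarfile.is_tarfile(str(archive)):
        raise ValueError(f"{archive} is not a tar archive; inspect it before extracting")
    with tarfile.open(archive, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Only plain files and directories: no symlinks, hard links, or devices.
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"refusing non-regular member: {member.name}")
            # Absolute names and ".." components must not escape dest.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"refusing path outside {dest}: {member.name}")
        dest.mkdir(parents=True, exist_ok=False)  # fails if dest already exists
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases ship extraction filters; "data" adds a second safety layer.
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return dest
```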

Reference

Advanced Query Syntax: See references/query_syntax.md for prefixes (au, ti, abs), booleans, and date filtering.
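Field-prefixed clauses like the one in the search example can also be assembled programmatically. The sketch below (a hypothetical helper) joins au, ti, and abs clauses with AND and wraps multi-word values in double quotes as phrases; that quoting convention follows the public arXiv API documentation, so confirm it against references/query_syntax.md before relying on it.

```python
def build_query(authors=(), title_terms=(), abstract_terms=()):
    """Combine field-prefixed clauses with AND, e.g. au:einstein AND ti:relativity."""
    clauses = []
    for prefix, values in (("au", authors), ("ti", title_terms), ("abs", abstract_terms)):
        for value in values:
            term = value.strip()
            if " " in term:
                term = f'"{term}"'  # assumed phrase syntax for multi-word values
            clauses.append(f"{prefix}:{term}")
    if not clauses:
        raise ValueError("at least one search term is required")
    return " AND ".join(clauses)
```

When running the script from a shell, wrap the whole string in single quotes so the inner double quotes reach --query intact.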

Workflow

1. Search for papers using search_arxiv.py and review the JSON summaries.
2. If the full text is needed, use download_paper.py to fetch the PDF or HTML.
3. If you downloaded a PDF, verify that it is not empty or corrupted.
4. Read the downloaded file using standard file reading tools.
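The check for empty or corrupted PDFs in the workflow above can be approximated with a cheap signature test: a real PDF starts with the bytes %PDF- and normally ends with an %%EOF marker, while a failed download saved under a .pdf name is often empty or an HTML page. This is a heuristic sketch, not full validation; the function name, the 1 KiB minimum size, and the 2 KiB tail window are arbitrary assumptions.

```python
from pathlib import Path


def looks_like_valid_pdf(path, min_bytes=1024, tail_bytes=2048):
    """Cheap sanity check for a downloaded PDF: size, header signature, and EOF marker."""
    p = Path(path)
    if not p.is_file():
        return False
    size = p.stat().st_size
    if size < min_bytes:  # empty or suspiciously tiny download
        return False
    with p.open("rb") as fh:
        header = fh.read(5)
        fh.seek(max(size - tail_bytes, 0))
        tail = fh.read()
    return header == b"%PDF-" and b"%%EOF" in tail
```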
