| name | academic-research |
| description | Nested swiss-knife reference for academic literature work — find papers, fetch full-text PDFs, trace citations, write LaTeX manuscripts. **First action for any "get me this paper" request:** `python3 <skill-path>/scripts/fetch_paper.py <DOI|arXiv-ID|PMID>` — walks arXiv → Unpaywall → Europe PMC → CORE → in-house publisher-page extraction (Nature/APS/AIP/IOP/Cambridge) → authorized institutional publisher → LibGen and saves the paper, metadata, and a resumable manifest under `papers/{slug}/`. Read the body when you need to escape the script: custom query shapes, citation networks, scholar analysis, LaTeX writing, or a tier-specific API call. Indexes 12 deep-dive API references and 6 pipeline workflows under `reference/`.
|
| version | 3.0.0 |
| allowed-tools | Bash(python3 *) Bash(curl *) Bash(pip *) Bash(pip3 *) |
| tags | ["academic","research","arxiv","crossref","openalex","semantic-scholar","core","pubmed","unpaywall","doi","pdf","citation","pipeline","europe-pmc","nasa-ads","inspire-hep","nested-skill"] |
Academic Research
Nested swiss-knife reference. A modular skill: try the bundled script
first, then load specific reference files only when you need to escape it.
Try this first
For 80% of "get me this paper" requests, the bundled script is the right answer.
It walks the open-access ladder, falls back automatically, and writes a manifest
the next session can resume from.
python3 <skill-path>/scripts/fetch_paper.py 10.1103/PhysRevLett.125.015001
python3 <skill-path>/scripts/fetch_paper.py arXiv:2301.00001
python3 <skill-path>/scripts/fetch_paper.py PMID:12345678
python3 <skill-path>/scripts/fetch_paper.py --batch dois.txt --out papers/
python3 <skill-path>/scripts/fetch_paper.py 10.1038/nature12373 --dry-run
python3 <skill-path>/scripts/fetch_paper.py <id> --no-libgen
Output layout (idempotent — re-runs skip entries with status: ok):
papers/{first-author-year-firstword}/
├── paper.pdf | paper.md # full-text artifact
├── metadata.json # CrossRef-normalized
└── manifest.json # {status, tier, source, ts, doi}
Tier ladder (script stops at first hit):
| Tier | Source | Best for |
|---|
| 1 | arXiv direct | Preprints (physics, CS, math, q-bio, econ) |
| 2 | Unpaywall | Publisher-blessed gold/green OA |
| 3 | Europe PMC | Biomedical full-text + PMC mirror |
| 4 | CORE | Institutional repositories (needs $CORE_API_KEY) |
| 5 | Publisher-page extract | Nature/APS/AIP/IOP/Cambridge → in-house extractor (stdlib + requests, no third-party deps). Fetches the already-accessible official article / DOI landing page and parses citation_* metadata + the article body into structured Markdown. No paywall/CAPTCHA bypass, no cookies/credentials — official pages only. A login/paywall page is a clean miss; the ladder then falls through. Opt out with --no-publisher-extract. |
| 5b | Authorized publisher | Licensed/institutional access only — official DOI landing page → same-host publisher PDF → %PDF- validation, full provenance. Recovers paywalled-but-subscribed papers without shadow libraries. Never bypasses paywalls or handles credentials. Opt out with --no-institutional. See authorized-publisher-access.md. |
| 6 | LibGen | Last resort; opt out with --no-libgen |
Set $LINGTAI_RESEARCH_EMAIL to a real address before first use — Unpaywall
rejects placeholder emails with HTTP 422. The default falls back to
lingtai-agent@example.org with a warning.
Read on only if: the script fails on your paper, you need a custom query shape,
or you're composing a multi-step workflow (search → fetch → cite → write).
Before drafting citation-bearing academic writing
Verify sources before you write prose. If you are about to draft a paper,
related-work section, literature review, References list, or any author–year /
citation-bearing manuscript, pass the evidence-verification gate first:
reference/evidence-verification-gate.md.
No verified evidence → no confident prose. Peer-reviewed and preprint sources
must not share the same evidence layer; search results are leads, not citations.
If the user asks to "go fast," reduce scope, not verification. Produce a
verified literature matrix before any submission-like draft.
Escape hatch — quick paths
I'm about to write a paper / references / related-work section → reference/evidence-verification-gate.md (verify FIRST)
The user wants a "fast" / "readable" paper draft → reference/evidence-verification-gate.md (reduce scope, not verification)
I have a DOI → reference/api-doi-resolver.md → api-crossref.md
I have an arXiv ID → reference/api-arxiv.md (direct PDF link)
I have a PMID → reference/api-europe-pmc.md
I have a bibcode → reference/api-nasa-ads.md (requires free key)
I only have keywords → reference/decision-tree.md → pick API by discipline
I need a citation network → reference/api-semantic-scholar.md or api-openalex.md
I need to override the PDF ladder → reference/pipeline-obtain-pdf.md
Tier-5 publisher-extract failed and I want to retry it manually → reference/publisher-page-extraction.md
OA + authorized-publisher failed and I have a batch + the user's Zotero → reference/zotero-institutional-fulltext-handoff.md
All OA chains failed → reference/libgen-fallback.md (last resort)
I need astrophysics → reference/api-nasa-ads.md
I need high-energy physics → reference/api-inspire-hep.md
I need biomedical → reference/api-europe-pmc.md or api-pubmed.md
I need to write/compile a paper → reference/pipeline-latex-writing.md
My empirical draft keeps getting reframed / reviewers "agree" → reference/anti-pattern-text-consistency-vs-data-correspondence.md
I hit an API error → reference/error-handling.md
Reference index
API references (12)
Each card includes endpoint parameters, runnable code, response shape, rate limits, and fallbacks.
Pipeline workflows (6)
Standalone references
- evidence-verification-gate.md — verify-before-drafting gate for citation-bearing academic writing: detection triggers, the verified-literature-matrix schema, A/B/C evidence tiers, "go fast = reduce scope not verification," matrix-before-draft artifact convention, pre-draft lint
- publisher-page-extraction.md — Tier-5 manual escape hatch (Nature/APS/AIP/IOP/Cambridge → structured Markdown)
- authorized-publisher-access.md — Tier-5b automated probe: official DOI landing page → same-host publisher PDF, licensed access only, no paywall/credential handling
- zotero-institutional-fulltext-handoff.md — Tier-6a human-in-the-loop: stage a failed batch into Zotero with tags, the human runs Find Full Text (institutional access), agent harvests PDFs back with provenance. No UI automation / TCC bypass / credential handling.
- libgen-fallback.md — Last-resort PDF source with legal/safety notes
- error-handling.md — 429 backoff, 403 publisher blocks, timeout patterns
- anti-pattern-text-consistency-vs-data-correspondence.md — empirical-writing failure mode: prose drifts from the data while reviewer rounds make it more polished. Trigger pattern, re-anchoring steps, detection checklist.
Relationship to web-browsing
- web-browsing is the routing layer ("which tier to use for this URL?")
- academic-research is the deep-dive layer ("how do I write OpenAlex filter parameters? what email does Unpaywall want?")
- The two are complementary. If you're just scraping one publisher page and don't need the OA ladder, web-browsing's
extract_page.py is lighter.
Known caveats
- Unpaywall's
email parameter is required and must be real — placeholder addresses get HTTP 422. Set $LINGTAI_RESEARCH_EMAIL once.
- CORE without an API key is harshly rate-limited (~100/day vs 10,000/day with a free key from https://core.ac.uk/services/api).
- Semantic Scholar free tier is very tight (~100 reqs / 5 min). Request a key for any serious citation-network work.
- Google Scholar requires a stealth browser (camoufox or playwright-stealth v2); legacy
playwright_stealth API does not work.
- arXiv enforces HTTPS — HTTP requests are 301-redirected automatically.
- Library Genesis legality varies by jurisdiction — use is the user's responsibility. Pass
--no-libgen to opt out.
- Publisher-page extraction (Tier 5) is an in-house, self-contained extractor — stdlib +
requests, no third-party dependency and nothing to install (replaces the old broken zhiping0913/Download_paper path, issue #136). It fetches the already-accessible official article / DOI landing page and parses citation_* metadata + the article body into structured Markdown with a provenance/limitations footer. It performs no paywall/CAPTCHA bypass and no cookie/credential handling — official pages only. A login/paywall interstitial, an unsupported DOI prefix, or a page with too little readable text is a clean miss, and the ladder falls through. Skip the tier with --no-publisher-extract. The extraction is heuristic (equations/figures/tables may be lost), so treat the artifact as a convenience copy, not a typeset full text. See reference/publisher-page-extraction.md.
- Authorized-publisher tier (5b) only uses access you already have — it follows the official DOI landing page and grabs a same-host publisher PDF, validating
%PDF- bytes and Content-Type before saving. It never bypasses paywalls/CAPTCHAs and never reads, stores, or replays cookies/credentials; most institutional access is IP-based, so on a licensed network a plain HTTP GET may work. Off-network it harmlessly misses. Pass --no-institutional to disable in legal-sensitive environments. Cookies/auth headers are never written to provenance. See reference/authorized-publisher-access.md.
- Zotero institutional full-text is a human-in-the-loop handoff, not an agent tool (Tier 6a) — when OA and the Tier-5b authorized-publisher probe both miss for a batch, and the user runs Zotero Desktop on an institutional network, the agent stages the failed rows into Zotero via
/connector/saveItems with a dated batch tag, the human selects the tagged items and clicks Find Full Text in the Zotero UI, then the agent harvests the resulting PDFs back with provenance (%PDF- validation, copy-not-move, resolved_by: human Find Full Text). Zotero's Local API item routes are GET-only and the Connector resolver set (oa+custom) is narrower than UI Find Full Text, so no HTTP endpoint exposes the broad path. Do not drive the UI via AppleScript/Accessibility — macOS TCC blocks it (-1719/-1743) and it is out of policy. No paywall bypass, no credential/cookie/session handling, no automatic LibGen fallthrough. See reference/zotero-institutional-fulltext-handoff.md.
- Drafting citation-bearing academic writing without verifying sources first — under "fast"/"readable draft" pressure the model flattens DOI-verified papers, preprints, and search leads into one confident evidence layer, which reads as submission-ready and invites reviewer rejection. Pass the evidence-verification gate before drafting: verify a literature matrix first, keep A/B/C reliability tiers separate in the prose, and reduce scope (not verification) when speed is requested.
- Writing an empirical paper iteratively can drift from the data — reviewer rounds make prose more polished and internally consistent without verifying it still matches the experiments on disk. Reviewer agreement is text-consistency evidence, not data-correspondence evidence. Anchor every claim to data files/runner code before writing, and re-derive (don't just rewrite) when feedback flags confusion. See reference/anti-pattern-text-consistency-vs-data-correspondence.md.
Found a bug or issue? If you encounter any problems with this skill, load the lingtai-issue-report skill and follow its instructions to report it.