
check-bib-references


Updated May 24, 2026 00:07

Catch LLM-fabricated citations in BibTeX files. Verifies arXiv/OpenReview entries against live metadata (titles, first authors), then guides manual verification of what that metadata can't confirm, such as full author lists, venues, and the prose claims attached to each citation. Use before submitting papers, after any LLM-assisted citation generation, or when a reference smells off.


Source: yulonglin/dotfiles (author: yulonglin)


Check BibTeX references against authoritative sources

When to invoke

- Before any paper submission: even one wrong citation damages credibility.
- After LLM-assisted bib generation: any citation added by Claude, Codex, or another LLM should be checked.
- When a reference looks suspicious: a generic-sounding title, a vague author list ending in `and others`, or no `eprint`/`url`.
- When the user asks to "fact-check the bib", "verify references", or "check citations".

The failure mode this catches

LLMs fabricate plausible-looking citations. Common patterns:

| Failure | Example |
|---|---|
| Wrong arXiv ID: the title looks right, but the ID points to an unrelated paper | `betley2025-wmdp-sycophancy` cited arXiv 2501.01962, which is a plant proteins paper |
| Wrong title: the paper exists, but the title is paraphrased or invented | `gupta2025-rl-obfuscation`: the bib said "Can Models Learn to Evade Monitors?"; the real title is "Can Language Models Learn to Evade Latent-Space Monitors?" |
| Wrong or fabricated authors: first author swapped, middle authors invented | `vct-2025`: the bib's first author was "Gopal, Sanskriti"; the real first author is "Götting, Jasper". `ward2025-ctrl-alt-deceit`: 21 fabricated authors (the real list has 9, mostly different people) |
| Fully hallucinated entry: the paper doesn't exist | `betley2025-wmdp-sycophancy` titled "Sycophancy Hacks: Evaluators Gaming without Scheming"; no such paper exists |
| Fabricated venue: wrong workshop or conference name | "BioSafe GenAI" instead of "Biosecurity Safeguards for Generative AI" |
| Expanded handles / pseudonyms: the source publishes under a handle, and the LLM "helpfully" replaces it with a guessed real name | LessWrong post bylined `merizian`, `alexdzm`, `jacoba`: the bib invented real names such as "Alexandra Souly" for `alexdzm`. Even if you think you know who is behind a handle, you have no evidence the author wanted that attribution |

Workflow

1. Automated check (catches ~80% of issues)

```shell
uv run ~/.claude/skills/check-bib-references/check_bib.py path/to/main.bib --only-mismatches
```

The script:

- Parses BibTeX with regexes (no `bibtexparser` dependency)
- Extracts arXiv IDs from `eprint=`, `journal={arXiv preprint arXiv:...}`, `url=`, and `howpublished=`
- Extracts OpenReview forum IDs from `url=`
- Batches arXiv requests (more than ~5 single-ID requests in a row trigger the rate limit)
- Compares the bib title (token Jaccard similarity >= 0.85) and the first author's last name (LaTeX accents normalized) against live metadata
- Exits with code 1 if there are any mismatches or errors
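The ID-extraction step might look roughly like the sketch below. It assumes each entry has already been parsed into a dict of lowercase field names to values; `extract_ids` and its patterns are illustrative, not the script's actual code, and only new-style arXiv IDs (`YYMM.NNNNN`) are recognized.

```python
from __future__ import annotations

import re

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN); a version suffix is matched but not captured.
ARXIV_ID = r"(\d{4}\.\d{4,5})(?:v\d+)?"

# Hypothetical per-field patterns mirroring the fields listed above, tried in this order.
ARXIV_PATTERNS = {
    "eprint": re.compile(r"^\s*(?:arXiv:)?" + ARXIV_ID, re.IGNORECASE),
    "journal": re.compile(r"arXiv:" + ARXIV_ID, re.IGNORECASE),
    "url": re.compile(r"arxiv\.org/(?:abs|pdf)/" + ARXIV_ID, re.IGNORECASE),
    "howpublished": re.compile(r"arxiv\.org/(?:abs|pdf)/" + ARXIV_ID, re.IGNORECASE),
}
OPENREVIEW_FORUM = re.compile(r"openreview\.net/(?:forum|pdf)\?id=([\w-]+)")


def extract_ids(fields: dict[str, str]) -> tuple[str | None, str | None]:
    """Return (arxiv_id, openreview_forum_id) for one parsed entry; None where absent."""
    arxiv_id = None
    for field, pattern in ARXIV_PATTERNS.items():
        match = pattern.search(fields.get(field, ""))
        if match:
            arxiv_id = match.group(1)
            break
    forum = OPENREVIEW_FORUM.search(fields.get("url", ""))
    return arxiv_id, (forum.group(1) if forum else None)
```

For example, `extract_ids({"journal": "arXiv preprint arXiv:2401.05566"})` returns `('2401.05566', None)`. Dropping the version suffix lets the IDs go straight into one batched `id_list`.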

Flags:

- `--only-mismatches`: quieter output; suppresses OK/skip lines
- `--key <citekey>`: checks a single entry
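Because the script signals problems through its exit code, it can act as a pre-submission gate. The wrapper below is a hypothetical sketch (`build_command` and `bib_is_clean` are not part of the skill); it uses only the two documented flags and assumes `--key` is passed after the `.bib` path, like `--only-mismatches` in step 1.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Install location from step 1; adjust if the script lives in the repo's scripts/ instead.
SCRIPT = Path.home() / ".claude" / "skills" / "check-bib-references" / "check_bib.py"


def build_command(bib_path: str, key: str | None = None, only_mismatches: bool = True) -> list[str]:
    """Assemble the uv invocation using only the flags documented above."""
    cmd = ["uv", "run", str(SCRIPT), bib_path]
    if only_mismatches:
        cmd.append("--only-mismatches")
    if key is not None:
        cmd += ["--key", key]
    return cmd


def bib_is_clean(bib_path: str, key: str | None = None) -> bool:
    """Run the checker; any non-zero exit means mismatches or errors to review."""
    result = subprocess.run(build_command(bib_path, key), capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout + result.stderr)
    return result.returncode == 0
```

`bib_is_clean("path/to/main.bib")` returns `False` whenever the script exits non-zero, so a build step can refuse to produce the submission PDF until the flagged entries are resolved.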

2. Manual verification (the script can't catch these)

After the script is clean, still verify:

| Check | How | Why |
|---|---|---|
| Full author list (the script only checks the first author) | `curl -sL https://arxiv.org/abs/<id> \| grep citation_author` | LLMs often fabricate middle authors. `ward2025` had 21 mostly fake authors behind a real first author |
| Venue / workshop name in `booktitle=` | WebSearch the paper title + "accepted to" | LLMs invent plausible-sounding venue names |
| Skipped entries (no arXiv/OpenReview ID) | WebFetch the `url=` directly | Tech reports and blog posts can only be verified directly |
| Prose claims about the paper in `main.tex` | Read the paper's abstract/intro via WebFetch or any2md | The citation can be real while the claim attached to it is wrong (e.g. "X showed Y" when X showed Z) |
| Citation context in the `.tex` | `grep -nE '\\cite[pt]?\{[^}]*<key>' main.tex` (matches `\cite`, `\citep`, `\citet`, including multi-key cites) | A correct bib entry attached to a wrong claim is still a fact-check failure |
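The full-author-list check can be partly automated. This sketch assumes arXiv abstract pages carry `<meta name="citation_author" content="Last, First">` tags in that attribute order (the tags the `grep citation_author` command finds); all helper names are hypothetical. It reports bib authors whose last names don't appear on the paper and whether the bib list was cut short with `and others`.

```python
from __future__ import annotations

import html
import re
import unicodedata
import urllib.request

# Assumption: abs pages expose <meta name="citation_author" content="Last, First"> tags.
META_AUTHOR = re.compile(r'<meta\s+name="citation_author"\s+content="([^"]*)"', re.IGNORECASE)
LATEX_ACCENT = re.compile(r"\\[\"'^~=.](?:\{(\w)\}|(\w))")


def fetch_abs_page(arxiv_id: str) -> str:
    with urllib.request.urlopen(f"https://arxiv.org/abs/{arxiv_id}", timeout=30) as resp:
        return resp.read().decode("utf-8")


def live_authors(page: str) -> list[str]:
    """All citation_author values on the page, with HTML entities decoded."""
    return [html.unescape(name) for name in META_AUTHOR.findall(page)]


def last_name(author: str) -> str:
    """Folded last name from 'Last, First' or 'First Last' (LaTeX and Unicode accents removed)."""
    name = author.split(",")[0] if "," in author else author.split()[-1]
    name = LATEX_ACCENT.sub(lambda m: m.group(1) or m.group(2), name)
    name = name.replace("{", "").replace("}", "")
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def compare_authors(bib_author_field: str, live: list[str]) -> dict[str, object]:
    """Flag bib authors missing from the paper and lists truncated with 'and others'."""
    bib = [a.strip() for a in re.split(r"\s+and\s+", bib_author_field) if a.strip()]
    named = [a for a in bib if a.lower() != "others"]
    live_names = {last_name(a) for a in live}
    return {
        "not_on_paper": [a for a in named if last_name(a) not in live_names],
        "truncated_with_others": len(named) < len(bib),
        "bib_count": len(named),
        "live_count": len(live),
    }
```

Typical use is `compare_authors(entry["author"], live_authors(fetch_abs_page(arxiv_id)))`. Any name in `not_on_paper` is a fabrication candidate, and a `True` for `truncated_with_others` means the hidden part of the list still needs expanding by hand.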

3. If sub-agent fact-checking is used

⚠️ Do not trust sub-agent output for specific factual claims about authors or venues. Sub-agents such as `writing:fact-checker`, when run without web access, will confidently invent author names, conference names, and paper titles. Always verify high-stakes findings (claims of fabrication, claims about who is or isn't on a paper) via direct WebFetch in the main context. See `~/.claude/rules/agents-and-delegation.md` § Factual Verification.

Source-of-truth APIs

| Source | Endpoint | Notes |
|---|---|---|
| arXiv | `http://export.arxiv.org/api/query?id_list=2401.05566,2412.14093` | Atom XML. Batch IDs in a comma-separated `id_list`; single-ID requests get IP-throttled after ~5 in a row. Rate limit: 1 request per 3 s, with a sticky penalty of ~30-60 s |
| OpenReview v2 | `https://api2.openreview.net/notes?id=<forum_id>` | Use `id=`, not `forum=`: `forum=` returns the "Paper Decision" note first |
| arXiv HTML (full author list) | `https://arxiv.org/abs/<id>`, then `grep citation_author` | Use when the full author list matters |
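A standard-library sketch of querying both APIs, assuming `export.arxiv.org` returns its usual Atom feed and OpenReview v2 wraps each content field in `{"value": ...}`. The function names are hypothetical, and the chunk size of 50 is an arbitrary assumption; the 3-second pause follows the rate limit in the table. `urllib.request` follows redirects by default (unlike `httpx`), though the `https` endpoint is used directly anyway.

```python
from __future__ import annotations

import json
import re
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "https://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(ids: list[str]) -> str:
    """One batched query; max_results is raised so the feed isn't cut at its default size."""
    params = {"id_list": ",".join(ids), "max_results": str(len(ids))}
    return ARXIV_API + "?" + urllib.parse.urlencode(params, safe=",")


def parse_atom(feed: str | bytes) -> dict[str, dict[str, str]]:
    """Map arXiv ID (version stripped) -> {"title", "first_author"}."""
    out = {}
    for entry in ET.fromstring(feed).findall("atom:entry", ATOM):
        abs_url = entry.findtext("atom:id", default="", namespaces=ATOM)
        arxiv_id = re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1])
        title = " ".join(entry.findtext("atom:title", default="", namespaces=ATOM).split())
        author = entry.findtext("atom:author/atom:name", default="", namespaces=ATOM)
        out[arxiv_id] = {"title": title, "first_author": author.strip()}
    return out


def fetch_arxiv(ids: list[str], chunk: int = 50, pause: float = 3.0) -> dict[str, dict[str, str]]:
    """Batched fetch; sleeps between chunks to stay under ~1 request per 3 s."""
    results: dict[str, dict[str, str]] = {}
    for start in range(0, len(ids), chunk):
        if start:
            time.sleep(pause)
        with urllib.request.urlopen(arxiv_query_url(ids[start:start + chunk]), timeout=30) as resp:
            results.update(parse_atom(resp.read()))
    return results


def openreview_note_url(forum_id: str) -> str:
    # id= returns the paper note; forum= would return the "Paper Decision" note first.
    return "https://api2.openreview.net/notes?" + urllib.parse.urlencode({"id": forum_id})


def parse_openreview(payload: str) -> dict[str, object]:
    """Title and authors from a v2 /notes response (content fields wrapped in {"value": ...})."""
    content = json.loads(payload)["notes"][0]["content"]
    return {"title": content["title"]["value"], "authors": content["authors"]["value"]}
```

For the two IDs in the table, `arxiv_query_url(["2401.05566", "2412.14093"])` yields a single request URL ending in `id_list=2401.05566,2412.14093&max_results=2`.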

Lessons from building/using this

- Batch arXiv requests. 17 single-ID requests will get the IP throttled for ~60 s; one batched `id_list=` request is fine.
- Follow redirects. `httpx` doesn't by default, and the arXiv API redirects http→https.
- Strip LaTeX accents before comparing authors. `B{\"u}rger` and `Bürger` are the same person. Regex: `\\[\"'^~=.](?:{(\w)}|(\w))` → `group(1)|group(2)`. The regex alone leaves `B{u}rger`, so leftover braces and Unicode diacritics (e.g. via NFKD) must also be stripped before the two compare equal.
- Token Jaccard at 0.85 is the sweet spot for title similarity: it catches paraphrases without flagging legitimate punctuation or casing differences.
- The script flags candidates; the human decides. A 0.80 score isn't necessarily wrong: a single missing word like "Detection" can drop a title below 0.85. Read the diff.
- OpenReview's `forum=` returns the wrong note. Use `id=<forum_id>` to get the actual paper.
- `and others` in author lists hides fabrication. The script can't tell `Wijk and Chan and Korbak and others` from `Wijk and Chan and Korbak and FabricatedAuthor and others`. Manually expand `and others` for important references.
- Prose can lie even when the bib is right. A correct citation can support a wrong claim. The script doesn't read `.tex` files; you must.
- Preserve handles and pseudonyms verbatim. When the source byline is `merizian`, `alexdzm`, or `jacoba` (LessWrong, Substack, blog posts, anonymous workshop submissions), cite them under those handles. Do NOT "expand" a handle to a guessed real name: you have no evidence the author wants that attribution, and you may be wrong. If the bib has a suspiciously full real name for a post you can only find under a handle, treat it as fabricated until proven otherwise.
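The accent-folding and title-similarity lessons combine into a small comparison helper. This is a sketch under stated assumptions, not the script's code: it applies the lesson's regex (with the literal braces escaped), then strips leftover braces and Unicode diacritics, and it tokenizes titles on `\w+`, which may differ from the script's tokenizer.

```python
from __future__ import annotations

import re
import unicodedata

# The lesson's regex with literal braces escaped: \", \', \^, \~, \=, \. followed by {x} or x.
LATEX_ACCENT = re.compile(r"\\[\"'^~=.](?:\{(\w)\}|(\w))")
THRESHOLD = 0.85


def fold_accents(text: str) -> str:
    """Drop LaTeX accent commands, leftover braces, and Unicode diacritics; casefold."""
    text = LATEX_ACCENT.sub(lambda m: m.group(1) or m.group(2), text)
    text = text.replace("{", "").replace("}", "")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def same_last_name(bib_name: str, live_name: str) -> bool:
    return fold_accents(bib_name).strip() == fold_accents(live_name).strip()


def title_jaccard(bib_title: str, live_title: str) -> float:
    """Token Jaccard similarity |A & B| / |A | B| over folded word tokens."""
    a = set(re.findall(r"\w+", fold_accents(bib_title)))
    b = set(re.findall(r"\w+", fold_accents(live_title)))
    return len(a & b) / len(a | b) if a | b else 1.0


def titles_match(bib_title: str, live_title: str) -> bool:
    return title_jaccard(bib_title, live_title) >= THRESHOLD
```

With this tokenizer, the `gupta2025-rl-obfuscation` titles from the failure table score 6/9 ≈ 0.67 (`Latent-Space` splits into two tokens), well below the 0.85 threshold.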

Files

- `check_bib.py`: the script (self-contained via PEP 723 inline metadata, so `uv run` can execute it directly)

Adapting to other repos

The script is generic: point it at any `.bib` file. Drop it into the repo's `scripts/` directory for per-project use, or always invoke it from `~/.claude/skills/check-bib-references/check_bib.py`.

For repos with a different bibliography structure (e.g. `\bibitem` in raw `.tex`, BibLaTeX-only fields), the regex parser may need extending. The current parser handles `@type{key, field={value}, field="value", ...}` with optional whitespace.
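As a starting point for such extensions, here is a minimal parser sketch for the `@type{key, field={value}, field="value", ...}` shape. Python's `re` can't match arbitrarily nested braces such as `{Can Models {Learn} Things?}`, so the sketch pairs regexes for entry heads and field names with a brace-depth counter for values. It is a hypothetical illustration, not the script's parser; `@string`, `@preamble`, and `#` concatenation are not handled.

```python
from __future__ import annotations

import re

ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"\s*([A-Za-z_][\w-]*)\s*=\s*")
SEPARATOR = re.compile(r"\s*,?")
BARE_VALUE = re.compile(r"[^,}\s]+")


def read_value(text: str, i: int) -> tuple[str, int]:
    """Read a {braced} (nesting allowed), "quoted", or bare value at text[i]; return (value, end)."""
    if i < len(text) and text[i] in '{"':
        closer = "}" if text[i] == "{" else '"'
        depth = 0
        for j in range(i + 1, len(text)):
            ch = text[j]
            if ch == "{":
                depth += 1
            elif ch == "}" and depth > 0:
                depth -= 1
            elif ch == closer and depth == 0:
                return text[i + 1 : j], j + 1
        raise ValueError(f"unterminated value starting at offset {i}")
    match = BARE_VALUE.match(text, i)
    return (match.group(0), match.end()) if match else ("", i)


def parse_bib(text: str) -> dict[str, dict[str, str]]:
    """Map citekey -> {field: value}; field names lowercased, entry type under "ENTRYTYPE"."""
    entries: dict[str, dict[str, str]] = {}
    for head in ENTRY_HEAD.finditer(text):
        fields = {"ENTRYTYPE": head.group(1).lower()}
        pos = head.end()
        while (name := FIELD_NAME.match(text, pos)) is not None:
            value, pos = read_value(text, name.end())
            fields[name.group(1).lower()] = " ".join(value.split())
            pos = SEPARATOR.match(text, pos).end()
        entries[head.group(2)] = fields
    return entries
```

Keeping inner braces in the value (e.g. `{Learn}`) preserves BibTeX's case protection; strip them only at comparison time.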
