تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

bibref-verify

Name: Bibref Verify
Author: yzhao062

// Use when auditing a paper's existing .bib for hallucinated citations or stale metadata before submission, especially when bib entries came from co-authors or automated tools and may include fabricated authors, future arXiv IDs, or preprints already published in peer-reviewed venues. Companion to bibref-filler (which adds new citations); this skill audits existing ones.

تشغيل في Manus

$ git log --oneline --stat

stars:٢

forks:١

updated:١٨ مايو ٢٠٢٦ في ٠٦:٠٥

مستكشف الملفات

3 ملفات

SKILL.md

readonly

package.json

"author": "yzhao062"

"repository": "yzhao062/agent-pack"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

أساتذة علوم الحاسوب في التعليم العاليالتعليم والمكتبات25-1021L4

تشغيل أي مهارة بنقرة واحدة

name	bibref-verify
description	Use when auditing a paper's existing .bib for hallucinated citations or stale metadata before submission, especially when bib entries came from co-authors or automated tools and may include fabricated authors, future arXiv IDs, or preprints already published in peer-reviewed venues. Companion to bibref-filler (which adds new citations); this skill audits existing ones.

bibref-verify

Overview

Audits an existing .bib for hallucinated citations and stale metadata. Output: a single REFERENCE-CHECK.md with audit findings and ready-to-paste BibTeX fixes. Read-only on the .bib file; the user applies fixes manually.

Distinct scope from bibref-filler (a separate shared skill, typically bootstrapped to .agent-config/repo/skills/ rather than tracked in the project repo): bibref-filler adds new citations to prose during drafting; bibref-verify audits existing entries before submission. The two skills do not overlap.

When to Use

Paper or proposal is approaching submission for a venue that desk-rejects on hallucinated citations (NeurIPS, ICML, ICLR, KDD, NSF, NIH, etc.)
The .bib contains entries the user did not personally add (co-author contributions, automated bib generators, AI assistants)
Suspected failure modes: arXiv IDs with future YYMM, BibTeX keys that look like raw arXiv IDs, papers cited as arxiv preprint that may have been published since
The user wants a third-party verification trail for reviewer scrutiny

Skip when: the paper is in early draft and most citations are placeholder; the bib has fewer than 10 entries (manual review is faster); the user only needs to ADD new citations (use bibref-filler if available).

Pipeline (one-shot, 4 passes, ~15-20 min)

Pass	Method	Output
A. Distributed web-search	`ceil(N/24)` parallel agents (max 8); each verifies a contiguous chunk of the `.bib` via WebSearch + WebFetch	per-agent verdict tables, returned as Agent tool results
B. S2 cross-verification	Resolve the helper relative to this `SKILL.md`: set `<skill-dir>` to the directory containing this file (for example, `skills/bibref-verify` in a source checkout or `.claude/skills/bibref-verify` after pack deployment), then run `<python> <skill-dir>/scripts/verify-bib-s2.py <bib-path>`, where `<python>` is the project interpreter (on Windows, prefer the Miniforge / conda interpreter such as `C:/Users/<u>/miniforge3/envs/py312/python.exe`; do not rely on the Windows Store `python` alias). Reads `S2_API_KEY` from env or `.env`; arXiv batch + title-search fallback at sim ≥ 0.85	`<bib-dir>/S2-VERIFY-REPORT.md`
C. Codex review	Write the consolidated draft to `<bib-dir>/REFERENCE-CHECK.md`, then invoke the `implement-review` skill, Round 1, with the report path as the artifact under review	`Review-Codex.md` at repo root
D. Targeted web-fetch (conditional)	If Codex Round 1 raises pushbacks, dispatch one parallel verification agent per disputed claim, fetching the canonical source (OpenReview, journal page, arXiv abstract)	per-claim verdict (CONFIRMED / REFUTED / INCONCLUSIVE)

After all passes finish, consolidate into the final <bib-dir>/REFERENCE-CHECK.md, overwriting the draft.

Order note: The pipeline order above is the normalized recommended workflow. The validation history described under "Real-world impact" ran passes in a different order because Pass B (S2) was added mid-session; this canonical order is cleaner because Pass C (Codex) sees both web-search and S2 evidence consolidated in a single artifact.

Pass A details

Per-agent prompt must include:

The exact .bib file path and a contiguous line range to verify
Red flags: future-YYMM arXiv IDs (e.g., 2611.xxxxx after May 2026); duplicate arXiv IDs across multiple keys with different author lists; year ≥ current with no clear venue; 25+ author lists on non-consortium papers; (in preparation) placeholders
Output format: markdown table with one row per entry, columns Key | Verdict | Evidence | Notes where Verdict is one of VERIFIED / DRIFT / REFUTED / INCONCLUSIVE
Hard constraint: do NOT modify any file (read-only audit)

Dispatch all agents in a single message with multiple Agent tool calls so they run concurrently.

Pass B known false positives

S2 has weak indexing for several categories. Treat S2 NOT-FOUND on these as INCONCLUSIVE, not HALLUCINATION:

Books and pre-arXiv-era papers (Hebb 1949, Shannon 1948, Cover 2006, Newman 2010, Lovasz 1993, etc.). Title-search picks the wrong paper, producing AUTHOR-MISMATCH or YEAR-MISMATCH.
Transformer Circuits Thread web posts (elhage2021mathematical, Bricken2023Monosemanticity, Templeton2024ScalingMonosemanticity, ameisen2025circuit). S2 does not index TCT.
Diacritics encoding mismatches — bib \'i vs S2 í causes substring-match failure even when the names refer to the same person (e.g., garc\\ia-perez vs garcía-pérez).
Version-year mismatches — S2 sometimes returns the arXiv-preprint year rather than the final journal or proceedings year. The bib year is usually correct for the published version; do not auto-flag as DRIFT without web verification.
Short or generic-title matches — for older or broad-topic works (e.g., classical theorems cited by short title), S2 title-search may pick a related-but-wrong paper. Verify against the canonical publisher page before classifying as DRIFT or REFUTED.
GitHub issue and pull-request citations — bib keys that point to a GitHub-repository issue or PR (typical patterns: <repo>-<number>, <owner-repo>-<number>, or a url field linking to github.com/<owner>/<repo>/{issues,pull}/<n>) cite real artifacts that S2 does not index. Common in benchmark and robustness papers that ground each subtype to a documented production failure. Verify via direct WebFetch of the issue URL.
Vendor product or SDK documentation pages — model-spec, API-reference, and product-doc citations (e.g., model spec pages on a vendor's docs/ site, SDK reference pages, public roadmap entries) are not S2-indexable. Verify via WebFetch of the URL itself.

Verify these via Pass A's web-search results or by direct WebFetch of the publication page.

Pass C handoff

Pass C uses the existing implement-review skill. The artifact under review is <bib-dir>/REFERENCE-CHECK.md (the draft consolidation from Passes A + B), not the .bib itself. Codex's role is sanity-check, not redo-the-audit.

The Pass-C prompt should ask Codex to:

Independently verify each REFUTED and DRIFT classification via WebFetch of the canonical source
Spot-check 5-10 entries from the VERIFIED bucket for false negatives
Take an explicit position on the recommended fix actions (the scope-challenge contract from implement-review)

Pass D triggers (conditional)

Pass D runs only if Codex Round 1 raises factual pushbacks (REFUTED on a finding the audit had marked DRIFT or VERIFIED). For each disputed claim, dispatch one verification agent in parallel:

Fetch the canonical source URL Codex cited
Compare bib metadata vs canonical source field by field
Return CONFIRMED (Codex correct), REFUTED (Codex wrong), or INCONCLUSIVE

Bundle all verification agents into one message of parallel Agent tool calls. One agent per claim, not bundled.

Consolidation: REFERENCE-CHECK.md format

Single Markdown file at <bib-dir>/REFERENCE-CHECK.md. Sections in this order:

Header — date, methods table (4 passes), source bib path, status line ("check only, no edits to the .bib file")
Global summary — severity counts table (HALLUCINATION / DRIFT-venue / DRIFT-metadata / INCONCLUSIVE / VERIFIED) with totals that sum to the bib entry count
Red — HALLUCINATION — must-fix entries with real arXiv IDs but fabricated authors. One row per entry: Key (line) | Real authors | bib's authors | Action. Include a ready-to-paste corrected @type{} block for each.
Yellow — DRIFT (venue upgrade) — arXiv preprints already published. One row per entry: Key | Now published in (venue, year, vol/issue/article-id, DOI). Include corrected entry block.
Yellow — DRIFT (metadata error) — year, pages, co-author list, or BibTeX field-layout errors. Include corrected entry block.
Gray — INCONCLUSIVE — textbooks S2 cannot index, (in preparation) placeholders. No fix action; user discretion.
Retracted claims (only if any) — Pass-A findings overturned by Passes B/C/D, kept for traceability.
Side findings — duplicate keys; same paper with two different keys; cite-vs-bib gap. Out of audit scope but worth flagging.
Verification artifacts — paths to S2-VERIFY-REPORT.md, Review-Codex.md, the script. Plus the exact command to re-run.

For every non-VERIFIED entry, include a ready-to-paste fix as a fenced BibTeX block. Example:

@inproceedings{huben2024sparse,
  title={Sparse Autoencoders Find Highly Interpretable Features in Language Models},
  author={Huben, Robert and Cunningham, Hoagy and Smith, Logan Riggs and Ewart, Aidan and Sharkey, Lee},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

Required infrastructure

<skill-dir>/scripts/verify-bib-s2.py, where <skill-dir> is the directory containing this SKILL.md (skills/bibref-verify in a source checkout, .claude/skills/bibref-verify after pack deployment); uses Python stdlib only, no third-party deps
S2_API_KEY in .env at the repo root (or as a user-level env var)
Codex availability for Pass C (terminal channel or IDE plugin)
WebSearch and WebFetch tools available to parallel agents

Graceful degradation:

No S2_API_KEY → skip Pass B, warn the user, the audit is meaningfully degraded but still useful
Codex unavailable → skip Passes C + D, warn the user, run only A + B
Pass D never auto-runs if Codex Round 1 has no pushbacks

Output location

Default: <bib-dir>/REFERENCE-CHECK.md (always emitted) and <bib-dir>/S2-VERIFY-REPORT.md (emitted only when Pass B runs; absent in S2-degraded mode).

When Pass B or Pass C is skipped because of missing infrastructure, the consolidated REFERENCE-CHECK.md must mark the audit as degraded in its status line (for example, Status: check only, no edits to .bib; S2-degraded — no S2_API_KEY). This prevents downstream readers from treating a partial audit as a full one.

For paper submodules (Overleaf-bridged), this places the reports inside the submodule directory. The submodule's git status will show the files as untracked. The user decides whether to commit and push to Overleaf or keep local-only.

Common mistakes

Mistake	Fix
Skipping Pass B because "Pass A already covered web-search"	S2 catches venue-upgrade misses Pass A consistently misses (~5/144 in observed cases). The two are independent sources, not redundant.
Treating S2 NOT-FOUND on books / pre-arXiv papers as a hallucination signal	Whitelist these categories (see Pass B section). S2 does not index them well.
Modifying the `.bib` during the audit	The skill is audit-only by design. All fixes are user-applied via copy-paste from `REFERENCE-CHECK.md`.
Sequential web-search instead of parallel agents in Pass A	Serial Pass A takes 30+ minutes for a 100-entry bib. Always dispatch in parallel.
Sending the whole `.bib` to Codex in Pass C instead of the consolidated `REFERENCE-CHECK.md`	Codex's role is to sanity-check the audit, not redo it. Send the report path.
Auto-committing the report to a paper submodule	The report is local-by-default. Some collaborators do not want the audit visible in Overleaf history; let the user decide.
Hardcoding `python` as the interpreter on Windows	The Windows Store stub for `python` may not work. Use the user's miniforge / conda Python explicitly, e.g., `C:/Users/<u>/miniforge3/envs/py312/python.exe`.

Security notes

S2_API_KEY is account-level read-only access to public papers; leakage risk is low (someone could spend the user's rate limit budget) but the key still belongs in .env, not in the repo
The .bib file is never modified by the skill; the report is the only artifact
Reports placed inside paper submodules push to Overleaf if the user commits them — flag this before commit

Real-world impact

Case A: 144-entry academic-paper bib (typical NeurIPS-style submission):

Pass A flagged 4 hallucinated entries (real arXiv IDs with fabricated author lists) and 9 metadata DRIFTs; 2 of those Pass-A claims were later retracted by Codex after independent web verification
Pass B caught 5 venue-upgrade misses Pass A had not detected (arXiv preprints already accepted at peer-reviewed venues)
Pass C confirmed convergence after resolving S2's expected false positives via targeted WebFetch
Pass D (1 round, 5 parallel agents) verified all 5 of Codex's disputed claims
Final: 4 HALLUCINATION + 16 DRIFT + 2 INCONCLUSIVE + 122 VERIFIED

Case B: 55-entry bib with heavy non-academic citations (~33% GitHub issues / vendor-doc URLs):

Pass B returned 18 NOT-FOUND, all confirmed false positives via Pass A (GitHub issues + vendor docs are not S2-indexable)
Pass A flagged 1 HALLUCINATION where the bib's author list shared only 2 of 9 names with the real published paper's author list, plus 8 DRIFTs across issue-paraphrase mischaracterization, venue/title drift, missing arXiv ID, and editorial paraphrase
Pass C confirmed all classifications and caught report-internal arithmetic errors; no factual pushback so Pass D did not trigger
Final: 1 HALLUCINATION + 8 DRIFT + 0 INCONCLUSIVE + 46 VERIFIED. Demonstrates the skill scales to bibs with heavy non-academic citations as long as Pass B false positives are correctly classified.

name	bibref-verify
description	Use when auditing a paper's existing .bib for hallucinated citations or stale metadata before submission, especially when bib entries came from co-authors or automated tools and may include fabricated authors, future arXiv IDs, or preprints already published in peer-reviewed venues. Companion to bibref-filler (which adds new citations); this skill audits existing ones.

bibref-verify

Overview

When to Use

Paper or proposal is approaching submission for a venue that desk-rejects on hallucinated citations (NeurIPS, ICML, ICLR, KDD, NSF, NIH, etc.)
The .bib contains entries the user did not personally add (co-author contributions, automated bib generators, AI assistants)
Suspected failure modes: arXiv IDs with future YYMM, BibTeX keys that look like raw arXiv IDs, papers cited as arxiv preprint that may have been published since
The user wants a third-party verification trail for reviewer scrutiny

Pipeline (one-shot, 4 passes, ~15-20 min)

Pass	Method	Output
A. Distributed web-search	`ceil(N/24)` parallel agents (max 8); each verifies a contiguous chunk of the `.bib` via WebSearch + WebFetch	per-agent verdict tables, returned as Agent tool results
B. S2 cross-verification	Resolve the helper relative to this `SKILL.md`: set `<skill-dir>` to the directory containing this file (for example, `skills/bibref-verify` in a source checkout or `.claude/skills/bibref-verify` after pack deployment), then run `<python> <skill-dir>/scripts/verify-bib-s2.py <bib-path>`, where `<python>` is the project interpreter (on Windows, prefer the Miniforge / conda interpreter such as `C:/Users/<u>/miniforge3/envs/py312/python.exe`; do not rely on the Windows Store `python` alias). Reads `S2_API_KEY` from env or `.env`; arXiv batch + title-search fallback at sim ≥ 0.85	`<bib-dir>/S2-VERIFY-REPORT.md`
C. Codex review	Write the consolidated draft to `<bib-dir>/REFERENCE-CHECK.md`, then invoke the `implement-review` skill, Round 1, with the report path as the artifact under review	`Review-Codex.md` at repo root
D. Targeted web-fetch (conditional)	If Codex Round 1 raises pushbacks, dispatch one parallel verification agent per disputed claim, fetching the canonical source (OpenReview, journal page, arXiv abstract)	per-claim verdict (CONFIRMED / REFUTED / INCONCLUSIVE)

After all passes finish, consolidate into the final <bib-dir>/REFERENCE-CHECK.md, overwriting the draft.

Pass A details

Per-agent prompt must include:

The exact .bib file path and a contiguous line range to verify
Red flags: future-YYMM arXiv IDs (e.g., 2611.xxxxx after May 2026); duplicate arXiv IDs across multiple keys with different author lists; year ≥ current with no clear venue; 25+ author lists on non-consortium papers; (in preparation) placeholders
Output format: markdown table with one row per entry, columns Key | Verdict | Evidence | Notes where Verdict is one of VERIFIED / DRIFT / REFUTED / INCONCLUSIVE
Hard constraint: do NOT modify any file (read-only audit)

Dispatch all agents in a single message with multiple Agent tool calls so they run concurrently.

Pass B known false positives

S2 has weak indexing for several categories. Treat S2 NOT-FOUND on these as INCONCLUSIVE, not HALLUCINATION:

Books and pre-arXiv-era papers (Hebb 1949, Shannon 1948, Cover 2006, Newman 2010, Lovasz 1993, etc.). Title-search picks the wrong paper, producing AUTHOR-MISMATCH or YEAR-MISMATCH.
Transformer Circuits Thread web posts (elhage2021mathematical, Bricken2023Monosemanticity, Templeton2024ScalingMonosemanticity, ameisen2025circuit). S2 does not index TCT.
Diacritics encoding mismatches — bib \'i vs S2 í causes substring-match failure even when the names refer to the same person (e.g., garc\\ia-perez vs garcía-pérez).
Version-year mismatches — S2 sometimes returns the arXiv-preprint year rather than the final journal or proceedings year. The bib year is usually correct for the published version; do not auto-flag as DRIFT without web verification.
Short or generic-title matches — for older or broad-topic works (e.g., classical theorems cited by short title), S2 title-search may pick a related-but-wrong paper. Verify against the canonical publisher page before classifying as DRIFT or REFUTED.
GitHub issue and pull-request citations — bib keys that point to a GitHub-repository issue or PR (typical patterns: <repo>-<number>, <owner-repo>-<number>, or a url field linking to github.com/<owner>/<repo>/{issues,pull}/<n>) cite real artifacts that S2 does not index. Common in benchmark and robustness papers that ground each subtype to a documented production failure. Verify via direct WebFetch of the issue URL.
Vendor product or SDK documentation pages — model-spec, API-reference, and product-doc citations (e.g., model spec pages on a vendor's docs/ site, SDK reference pages, public roadmap entries) are not S2-indexable. Verify via WebFetch of the URL itself.

Verify these via Pass A's web-search results or by direct WebFetch of the publication page.

Pass C handoff

The Pass-C prompt should ask Codex to:

Independently verify each REFUTED and DRIFT classification via WebFetch of the canonical source
Spot-check 5-10 entries from the VERIFIED bucket for false negatives
Take an explicit position on the recommended fix actions (the scope-challenge contract from implement-review)

Pass D triggers (conditional)

Pass D runs only if Codex Round 1 raises factual pushbacks (REFUTED on a finding the audit had marked DRIFT or VERIFIED). For each disputed claim, dispatch one verification agent in parallel:

Fetch the canonical source URL Codex cited
Compare bib metadata vs canonical source field by field
Return CONFIRMED (Codex correct), REFUTED (Codex wrong), or INCONCLUSIVE

Bundle all verification agents into one message of parallel Agent tool calls. One agent per claim, not bundled.

Consolidation: REFERENCE-CHECK.md format

Single Markdown file at <bib-dir>/REFERENCE-CHECK.md. Sections in this order:

Header — date, methods table (4 passes), source bib path, status line ("check only, no edits to the .bib file")
Global summary — severity counts table (HALLUCINATION / DRIFT-venue / DRIFT-metadata / INCONCLUSIVE / VERIFIED) with totals that sum to the bib entry count
Red — HALLUCINATION — must-fix entries with real arXiv IDs but fabricated authors. One row per entry: Key (line) | Real authors | bib's authors | Action. Include a ready-to-paste corrected @type{} block for each.
Yellow — DRIFT (venue upgrade) — arXiv preprints already published. One row per entry: Key | Now published in (venue, year, vol/issue/article-id, DOI). Include corrected entry block.
Yellow — DRIFT (metadata error) — year, pages, co-author list, or BibTeX field-layout errors. Include corrected entry block.
Gray — INCONCLUSIVE — textbooks S2 cannot index, (in preparation) placeholders. No fix action; user discretion.
Retracted claims (only if any) — Pass-A findings overturned by Passes B/C/D, kept for traceability.
Side findings — duplicate keys; same paper with two different keys; cite-vs-bib gap. Out of audit scope but worth flagging.
Verification artifacts — paths to S2-VERIFY-REPORT.md, Review-Codex.md, the script. Plus the exact command to re-run.

For every non-VERIFIED entry, include a ready-to-paste fix as a fenced BibTeX block. Example:

@inproceedings{huben2024sparse,
  title={Sparse Autoencoders Find Highly Interpretable Features in Language Models},
  author={Huben, Robert and Cunningham, Hoagy and Smith, Logan Riggs and Ewart, Aidan and Sharkey, Lee},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

Required infrastructure

<skill-dir>/scripts/verify-bib-s2.py, where <skill-dir> is the directory containing this SKILL.md (skills/bibref-verify in a source checkout, .claude/skills/bibref-verify after pack deployment); uses Python stdlib only, no third-party deps
S2_API_KEY in .env at the repo root (or as a user-level env var)
Codex availability for Pass C (terminal channel or IDE plugin)
WebSearch and WebFetch tools available to parallel agents

Graceful degradation:

No S2_API_KEY → skip Pass B, warn the user, the audit is meaningfully degraded but still useful
Codex unavailable → skip Passes C + D, warn the user, run only A + B
Pass D never auto-runs if Codex Round 1 has no pushbacks

Output location

Default: <bib-dir>/REFERENCE-CHECK.md (always emitted) and <bib-dir>/S2-VERIFY-REPORT.md (emitted only when Pass B runs; absent in S2-degraded mode).

Common mistakes

Mistake	Fix
Skipping Pass B because "Pass A already covered web-search"	S2 catches venue-upgrade misses Pass A consistently misses (~5/144 in observed cases). The two are independent sources, not redundant.
Treating S2 NOT-FOUND on books / pre-arXiv papers as a hallucination signal	Whitelist these categories (see Pass B section). S2 does not index them well.
Modifying the `.bib` during the audit	The skill is audit-only by design. All fixes are user-applied via copy-paste from `REFERENCE-CHECK.md`.
Sequential web-search instead of parallel agents in Pass A	Serial Pass A takes 30+ minutes for a 100-entry bib. Always dispatch in parallel.
Sending the whole `.bib` to Codex in Pass C instead of the consolidated `REFERENCE-CHECK.md`	Codex's role is to sanity-check the audit, not redo it. Send the report path.
Auto-committing the report to a paper submodule	The report is local-by-default. Some collaborators do not want the audit visible in Overleaf history; let the user decide.
Hardcoding `python` as the interpreter on Windows	The Windows Store stub for `python` may not work. Use the user's miniforge / conda Python explicitly, e.g., `C:/Users/<u>/miniforge3/envs/py312/python.exe`.

Security notes

S2_API_KEY is account-level read-only access to public papers; leakage risk is low (someone could spend the user's rate limit budget) but the key still belongs in .env, not in the repo
The .bib file is never modified by the skill; the report is the only artifact
Reports placed inside paper submodules push to Overleaf if the user commits them — flag this before commit

Real-world impact

Case A: 144-entry academic-paper bib (typical NeurIPS-style submission):

Pass A flagged 4 hallucinated entries (real arXiv IDs with fabricated author lists) and 9 metadata DRIFTs; 2 of those Pass-A claims were later retracted by Codex after independent web verification
Pass B caught 5 venue-upgrade misses Pass A had not detected (arXiv preprints already accepted at peer-reviewed venues)
Pass C confirmed convergence after resolving S2's expected false positives via targeted WebFetch
Pass D (1 round, 5 parallel agents) verified all 5 of Codex's disputed claims
Final: 4 HALLUCINATION + 16 DRIFT + 2 INCONCLUSIVE + 122 VERIFIED

Case B: 55-entry bib with heavy non-academic citations (~33% GitHub issues / vendor-doc URLs):

Pass B returned 18 NOT-FOUND, all confirmed false positives via Pass A (GitHub issues + vendor docs are not S2-indexable)
Pass A flagged 1 HALLUCINATION where the bib's author list shared only 2 of 9 names with the real published paper's author list, plus 8 DRIFTs across issue-paraphrase mischaracterization, venue/title drift, missing arXiv ID, and editorial paraphrase
Pass C confirmed all classifications and caught report-internal arithmetic errors; no factual pushback so Pass D did not trigger
Final: 1 HALLUCINATION + 8 DRIFT + 0 INCONCLUSIVE + 46 VERIFIED. Demonstrates the skill scales to bibs with heavy non-academic citations as long as Pass B false positives are correctly classified.