bib-integrity-audit
Audit `.bib` reference files across a paper library for:
| name | bib-integrity-audit |
| description | Audit `.bib` reference files across a paper library for: |
| version | 1.0.0 |
| license | MIT |
| author | Synthos |
| metadata | {"synthos":{"signature":"task_desc: str, params: dict -> result: dict","atom_type":"skill","priority":"P2","related_skills":[]}} |
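- Input: `bib_file: str`, the user request description and context information
- Output: `audit_report: dict`, the bib integrity audit
- Corresponding principle: P2 (mechanical atoms expose their input/output specification)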
Audit .bib reference files across a paper library for:
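Find all .bib files: `find <paper-root> -name '*.bib' -type f`

- Scan all .bib files, not just `06-references/references.bib`.
- Expect variant file names: `reference4.bib`, `referencefinal.bib`, `reference3.bib`, `ref.bib`, `ref_orig.bib`.
- Count entries per file: `grep -c '^@[A-Za-z]' file`
- Count entries that have a `doi = {...}` field.

Check each entry against these signals: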
| Signal | Pattern | Example |
|---|---|---|
| @misc with auto-generated key | auto\d{4} in key | @misc{auto2024...} |
| Kaggle publisher | publisher = {...kaggle...} | publisher={Kaggle} |
| URL as year field | year = {https?://...} | year={http://biometrics.idealtest.org/} |
| arXiv preprint without arXiv ID | journal = {arXiv preprint but no arXiv:XXXX | journal={arXiv preprint} without ID |
| Incomplete @misc | missing author or title | dataset citations without proper fields |
| No author field | no author = { in entry | orphan entries |
| No title/booktitle field | no title = { or booktitle = { in entry | |
| Empty DOI | doi = {} | |
| Duplicate key across files | same key in multiple .bib files | swirski2013fully in 3+ files |
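The per-entry signals in the table above could be checked roughly as follows. This is a minimal sketch, not the audit script itself: the signal names, the `SIGNALS` mapping, and the exact regexes are assumptions made for illustration, and the cross-file duplicate check is left out because it needs more than one entry at a time.

```python
import re

def _has(pattern: str):
    """Build a check that is true when `pattern` occurs in the entry text."""
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda entry: rx.search(entry) is not None

def _arxiv_without_id(entry: str) -> bool:
    """journal = {arXiv preprint ...} with no arXiv:XXXX identifier inside."""
    m = re.search(r'\bjournal\s*=\s*\{([^}]*)\}', entry, re.IGNORECASE)
    if not m:
        return False
    journal = m.group(1).lower()
    return "arxiv preprint" in journal and not re.search(r'arxiv:\s*\d', journal)

# Hypothetical signal names mapped to predicates over one raw entry string.
SIGNALS = {
    "auto_key_misc": _has(r'@misc\s*\{\s*[^,\s]*auto\d{4}'),
    "kaggle_publisher": _has(r'\bpublisher\s*=\s*[{"][^}"]*kaggle'),
    "url_as_year": _has(r'\byear\s*=\s*[{"]\s*https?://'),
    "arxiv_without_id": _arxiv_without_id,
    "no_author": lambda e: not _has(r'\bauthor\s*=')(e),
    "no_title": lambda e: not _has(r'\b(?:title|booktitle)\s*=')(e),
    "empty_doi": _has(r'\bdoi\s*=\s*\{\s*\}'),
}

def suspicious_signals(entry: str) -> list[str]:
    """Return the names of all signals that fire for one BibTeX entry."""
    return [name for name, check in SIGNALS.items() if check(entry)]
```

Calling `suspicious_signals(entry_text)` on each entry and keeping the non-empty results gives the raw material for the "suspicious entries" column of the report.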
For each unique entry key across ALL bib files:
For entries with complete metadata but missing DOI, use known DOI database:
| Key | Known DOI | Source Type |
|---|---|---|
| daugman2009iris | 10.1016/b978-0-12-374457-9.00025-1 | Elsevier chapter |
| proencca2009ubiris | 10.1109/TPAMI.2009.66 | IEEE TPAMI |
| lu2022neural | 10.1109/ISMAR55827.2022.00053 | IEEE ISMAR |
| dierkes2018novel | 10.1145/3281417.3281423 | ACM ETRA |
| tsukada2011illumination | 10.1109/ICCVW.2011.6139507 | IEEE ICCVW |
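One way to apply the known-DOI table is a plain dictionary lookup over entries that lack a non-empty `doi` field. The sketch below assumes entries are already available as a mapping from key to raw entry text; `KNOWN_DOIS` and `missing_known_dois` are illustrative names, not part of the audit scripts.

```python
import re

# Key -> DOI pairs copied from the table above.
KNOWN_DOIS = {
    "daugman2009iris": "10.1016/b978-0-12-374457-9.00025-1",
    "proencca2009ubiris": "10.1109/TPAMI.2009.66",
    "lu2022neural": "10.1109/ISMAR55827.2022.00053",
    "dierkes2018novel": "10.1145/3281417.3281423",
    "tsukada2011illumination": "10.1109/ICCVW.2011.6139507",
}

# Matches a doi field with at least one non-space character inside the braces.
HAS_DOI = re.compile(r'\bdoi\s*=\s*\{\s*[^}\s]', re.IGNORECASE)

def missing_known_dois(entries: dict[str, str]) -> dict[str, str]:
    """Map entry key -> known DOI for entries that have no usable doi field."""
    return {
        key: KNOWN_DOIS[key]
        for key, text in entries.items()
        if key in KNOWN_DOIS and not HAS_DOI.search(text)
    }
```

The result can be listed in the report before any file is modified, so each addition stays reviewable.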
For entries with complete metadata but unknown DOI, use OpenAlex API for DOI lookup:
https://api.openalex.org/works?search={title}&select=title,doi,authorships
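A small sketch of how the OpenAlex lookup might be wired up without committing to an HTTP client. It assumes the response is a JSON object with a `results` list whose items carry `title` and a `doi` given as a `https://doi.org/...` URL; the `select` list is trimmed to `title,doi` to keep the example small, and the helper names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"

def openalex_url(title: str) -> str:
    """Build the search URL for one title (commas kept readable)."""
    return OPENALEX_WORKS + "?" + urlencode(
        {"search": title, "select": "title,doi"}, safe=","
    )

def _norm(text: str) -> str:
    """Lowercase and collapse punctuation so titles compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def pick_doi(title: str, payload: dict) -> Optional[str]:
    """Return the bare DOI of the first result whose title matches, else None."""
    for work in payload.get("results", []):
        if _norm(work.get("title") or "") == _norm(title) and work.get("doi"):
            return re.sub(r"^https?://doi\.org/", "", work["doi"], flags=re.IGNORECASE)
    return None
```

Requiring a normalized title match before accepting a DOI is a deliberate choice here: a search endpoint returns the closest hits, not necessarily the cited work.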
Generate a markdown report with:
Pitfalls to watch for:

- Multiple copies of one bibliography (`ref.bib` vs `ref_orig.bib`, `referencefinal.bib` vs `reference4.bib`). These share the same entry keys, so the audit flags them as "cross-file duplicates", but they are redundant copies rather than inconsistent ones. Treat the version with fewer entries as the likely cleaned copy; the longer one is usually the raw export. Recommend consolidating to a single source of truth per paper.
- `find -name '*.bib'` may miss files in non-standard paths (`投稿文件final/`, `latexnew/`). Use `find` with a path walk; don't assume the `06-references/` structure.
- Directory names containing unicode or special characters (such as `'`) cause `cd`, glob expansion (`*.bib`), and `ls *.bib` to silently fail. When shell commands fail on a known directory, switch to Python `os.listdir()` for traversal. Avoid `cd` into unicode-named dirs entirely; always use absolute paths from Python or `find -print | while read` patterns.
- arXiv entries appear in three forms:
  - `journal = {arXiv preprint arXiv:XXXX}` (has ID, no DOI)
  - `eprint = {XXXX}` + `archivePrefix = {arXiv}` (proper BibTeX format)
  - `doi = {10.48550/arXiv.XXXXX}` (has DOI already)
- Dataset citations are often `@misc` with URL-as-year or missing author/title. Prefer the `@dataset` entry type or a properly formatted `@misc` with URL and accessed date.
- `@Comment{jabref-meta:...}` entries (JabRef metadata, not references).

| Entry type | DOI expected | Verification source |
|---|---|---|
| Journal articles (IEEE, Elsevier, Springer) | Yes — always | Crossref API (high success rate) |
| Conference proceedings (ACM ETRA, IEEE, Springer LNCS) | Yes — always | Semantic Scholar → Crossref fallback |
| arXiv preprints | No DOI until published | Keep arXiv ID in eprint or journal field |
| Datasets (CASIA, UBIRIS, OpenEDS, Kaggle) | No DOI | Keep as @misc with URL, no DOI expected |
| Book chapters | Yes | Crossref API (may require full book ISBN for lookup) |
| Technical reports | Sometimes | Crossref, but not always indexed |
For entries with complete metadata but missing DOI, use a tiered approach:
Semantic Scholar first (most reliable for modern papers):
GET https://api.semanticscholar.org/graph/v1/paper/search
?query={title}&limit=3&fields=title,authors,year,externalIds
Header: User-Agent: synthos-audit/1.0 (yakeworld@wmu.edu.cn)
- `externalIds.DOI` and `externalIds.SEMANTIC_SCHOLAR`

Crossref API (for journal articles and book chapters):
GET https://api.crossref.org/works
?query.title={title}&rows=3&mailto=yakeworld@wmu.edu.cn
- `doi = {10.xxxx/yyyy}`: clean `\\_` to `_` before lookup
- `mailto:` header, 5 req/sec max

PubMed (for medical/clinical papers):
GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
?db=pubmed&term={title}&retmax=5
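The tiered order above (Semantic Scholar, then Crossref, then PubMed) can be sketched as a loop over lookup callables, so the HTTP layer stays swappable and testable. The function names, the `(name, callable)` source list, and the `mailto` argument are assumptions for illustration; only the endpoints and query parameters come from the text.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

Lookup = Callable[[str], Optional[str]]  # title -> DOI or None

def lookup_urls(title: str, mailto: str) -> dict[str, str]:
    """Build the three query URLs listed above for one title."""
    return {
        "semantic_scholar": "https://api.semanticscholar.org/graph/v1/paper/search?"
        + urlencode({"query": title, "limit": 3,
                     "fields": "title,authors,year,externalIds"}, safe=","),
        "crossref": "https://api.crossref.org/works?"
        + urlencode({"query.title": title, "rows": 3, "mailto": mailto}, safe="@"),
        "pubmed": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
        + urlencode({"db": "pubmed", "term": title, "retmax": 5}),
    }

def clean_doi(raw: str) -> str:
    """Undo BibTeX-escaped underscores (backslash + underscore) in a DOI."""
    return raw.replace("\\_", "_").strip()

def tiered_doi_lookup(title: str, sources: list[tuple[str, Lookup]]):
    """Try each source in order; return (source_name, doi) or None."""
    for name, lookup in sources:
        try:
            doi = lookup(title)
        except Exception:
            continue  # a failing source should not abort the whole lookup
        if doi:
            return name, clean_doi(doi)
    return None
```

In practice each callable would fetch its URL, parse the response, and sleep between calls to respect the rate limit noted above; those parts are omitted here.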
Fallback: If no API source returns a match, the entry may be:
- arXiv DOIs (e.g. `10.48550/arXiv.2408.17231`) are VALID DOIs but NOT in the Crossref database. Crossref returns 404. Keep them as-is in the .bib file.
- Springer DOIs (e.g. `10.1007/978-3-031-37660-38`) may return 404 if the book/chapter is not yet indexed or if there's a digit error. Verify by searching Semantic Scholar or Google Scholar first.

DOI patterns by publisher:

- ACM: `10.1145/XXXXX.XXXXXX`, always starts with `10.1145`
- IEEE: `10.1109/XXXXX.YYYYYYY` or `10.1109/ACCESS.XXXXXXX`
- Elsevier: `10.1016/j.xxxx.yyyy.zz`

Parsing pitfalls:

- Whitespace in DOI fields: `doi   = {value}` (multiple spaces) vs `doi = {value}` (single space). A naive `doi = {` regex will miss entries with 2+ spaces. Use `doi\s*=\s*\{` or `grep -c 'doi'` for counting instead.
- Key variants across files: `lu2022neural` vs `Lu2022Neural3G` for the same ISMAR paper.
- With LaTeX-escaped characters (e.g. `{\\c{c}}`), entry titles may parse as N/A or garbled in simple grep-based parsers. Use Python for robust multi-file dedup comparison; shell awk/grep can silently drop or corrupt unicode content in titles.

Entry regex: `(.+?)` fails across braces.

Broken: `r'^@([A-Za-z]+)\s*\{\s*([^,]+?),\s*(.*?)(?=\n@[A-Za-z]+\s*\{|$)'`

- `\s*` after `,\s*` consumes newlines, so `(.+?)` starts from the first field.
- `.*?` is non-greedy: the `}` inside `title={...}` can satisfy the lookahead prematurely. The body is truncated after the first field.

Fixed: `r'^@([A-Za-z]+)\s*\{\s*([^,]+?),\s*([\s\S]*?)(?=\n\s*\}\s*\n|\n\s*\}\s*$)'`

- `[\s\S]*?` matches ANY character including newlines.
- The lookahead stops at a `}` on its own line (BibTeX entry terminator), not at `@`.

Field regex: `r'(\w+)\s*=\s*(?:\{([^}]*)\}|"[^"]*")'` without `re.DOTALL` can miss a multi-line `author={Name and Name}`. Use line-by-line parsing with `re.DOTALL`, or add a bare-value third alternative `(.+)`.

🧹 Bib Standardization Report (YYYY-MM-DD)
| Paper | Entries | DOI coverage | Suspicious entries | DOIs added |
|:-----|:------:|:--------:|:--------:|:-------:|
| pima-crispdm | 33 | 94% | 0 | 0 |
Suspicious entry details:
- paper-xyz: Key2024 (journal: arXiv preprint without ID)
DOIs added, details:
- paper-abc: Key2020 → 10.XXXX/...
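A report in the shape of the template above could be produced along these lines. This is only a sketch: the function names are invented for illustration, the column names are English renderings of the template, and entries are counted with a whitespace-tolerant DOI pattern while `@Comment`, `@String`, and `@Preamble` blocks are skipped (an assumption about what should count as an entry).

```python
import re

# Entry starts, excluding BibTeX blocks that are not references.
ENTRY_RE = re.compile(
    r'^@(?!(?:comment|string|preamble)\b)[A-Za-z]+\s*\{',
    re.MULTILINE | re.IGNORECASE,
)
# doi field with any spacing around "=" and a non-empty value.
DOI_RE = re.compile(r'\bdoi\s*=\s*\{\s*[^}\s]', re.IGNORECASE)

def coverage_row(paper: str, bib_text: str, suspicious: int = 0, added: int = 0) -> str:
    """Render one table row with the entry count and DOI coverage."""
    entries = len(ENTRY_RE.findall(bib_text))
    with_doi = len(DOI_RE.findall(bib_text))
    pct = round(100 * with_doi / entries) if entries else 0
    return f"| {paper} | {entries} | {pct}% | {suspicious} | {added} |"

def render_report(date: str, rows: list[str]) -> str:
    """Assemble the title line, table header, and per-paper rows."""
    header = [
        f"🧹 Bib Standardization Report ({date})",
        "| Paper | Entries | DOI coverage | Suspicious entries | DOIs added |",
        "|:-----|:------:|:--------:|:--------:|:-------:|",
    ]
    return "\n".join(header + rows)
```

The detail lists (suspicious entries, added DOIs) would be appended after the table in the same way, one bullet per entry.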
- No fixed `06-references/references.bib` layout is assumed: the audit finds what exists.
- arXiv preprints count as having a DOI only when a `10.48550/arXiv` DOI is explicitly present.

Reference files:

- `references/bib-suspicious-patterns.md`: Detailed catalog of suspicious entry patterns with examples
- `references/session-report-2026-06-06.md`: Session report for cross-file dedup workflow
- `references/session-report-2026-06-12.md`: Session report for DOI completion workflow
- `references/synthos-known-dois.md`: Pre-verified DOI mappings for known Synthos paper references (updated with new DOIs and entry type classification)
- `references/api-lookup-workflow.md`: Complete API workflow for DOI lookup: Semantic Scholar, Crossref, PubMed endpoints, parameters, error handling, DOI patterns
- `references/doi-patterns.md`: DOI pattern reference guide: publisher patterns, common issues (escaped underscores, arXiv in Crossref, Springer digit errors), classification logic

Scripts:

- `scripts/bib-audit-v2.py`: Automated audit script: scans .bib files, computes DOI coverage, detects suspicious entries, cross-file deduplicates, OpenAlex DOI lookup, markdown report output
- `scripts/bib-audit.py`: Original audit script (legacy)
- `scripts/bib-verify.py`: DOI verification script: verifies existing DOIs via Crossref, classifies entries by type (journal/conference/dataset/preprint), generates verification report
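The entry-splitting pitfall noted earlier can be reproduced in a few lines, using the two entry patterns quoted in this document. The sample text and the choice of `re.MULTILINE` for both patterns are assumptions; the document does not say which flags the audit scripts pass. Under that flag, the `$` branch of the first pattern's lookahead already succeeds at the end of the first field line, which is one way the body ends up truncated.

```python
import re

SAMPLE = """@article{a2020,
  title = {One {Nested} Title},
  author = {Name and
            Name}
}
@misc{b2021,
  title = {Two}
}
"""

# Pattern that truncates the body after the first field.
BROKEN = re.compile(
    r'^@([A-Za-z]+)\s*\{\s*([^,]+?),\s*(.*?)(?=\n@[A-Za-z]+\s*\{|$)',
    re.MULTILINE,
)
# Pattern that reads up to a closing brace on its own line.
FIXED = re.compile(
    r'^@([A-Za-z]+)\s*\{\s*([^,]+?),\s*([\s\S]*?)(?=\n\s*\}\s*\n|\n\s*\}\s*$)',
    re.MULTILINE,
)

def parse(pattern: re.Pattern, text: str) -> dict[str, str]:
    """Map entry key -> raw field body for every entry the pattern finds."""
    return {m.group(2).strip(): m.group(3) for m in pattern.finditer(text)}

broken = parse(BROKEN, SAMPLE)
fixed = parse(FIXED, SAMPLE)

print(repr(broken["a2020"]))       # 'title = {One {Nested} Title},'
print("author" in fixed["a2020"])  # True
```

Both patterns find both keys in this sample; the difference is only in how much of each body survives, which is why the bug is easy to miss when checking entry counts alone.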