Dual-loop evolution: internal reflection (P0) + external absorption (P1). Cross-project absorption methodology: multi-round cross-project comparison, active project tracking, self-expanding keyword discovery. Entelechy-Driven Absorption v4.3.

2026-06-22

pdf-download-racing

yakeworld/Synthos

Parallel racing PDF download engine: curl_cffi TLS-fingerprint bypass + Sci-Hub domain rotation + LibGen + MedData. Depends on tools/paper-manager/src/.

2026-06-22

name	paper-pipeline
description	Complete paper pipeline: retrieval, extraction, quality review, analysis, and publication.
version	1.0.0
license	MIT
author	Synthos
metadata	{"synthos":{"atom_type":"composite","priority":"P0","signature":"paper_path: str -> analysis_report: dict","related_skills":[]}}

Paper Pipeline

Purpose

Composite skill that merges 35 paper-related skills into a unified pipeline.

Members (36)

adhd-eye-tracking-review: Directory index for adhd-eye-tracking-review: adhd-eye-tracking-review
arxiv: arXiv paper search by keyword, author, category, or ID. Supports access through a Tor SOCKS proxy.
bib-integrity-audit: Audit .bib reference files across a paper library for:
biorxiv: Directory index for biorxiv: biorxiv
citation-bib-crossref: Scan paper directories for mismatches between \\cite{key} calls in .tex files and @type{key} entries in .bib files. Produces D8 (bib count) and D10a (match percentage) metrics, plus orphan/zombie classification.
citation-integrity-fix: (no description available)
emerging-field-landscape-scan: Skill: emerging-field-landscape-scan
gif-search: Search and download GIFs directly via the Tenor API using curl. No extra tools needed.
knowledge-base-audit: Audit and maintain personal knowledge management systems (AKNE, NotebookLM,
latex-output: Directory index for latex-output: latex-output
nano-pdf: Edit PDFs using natural-language instructions. Point it at a page and describe what to change.
nature-paper2ppt: Nature-style Chinese PPTX from academic papers — argument-driven slide
nsfc-grant-audit: Directory index for nsfc-grant-audit: nsfc-grant-audit
openalex: Directory index for openalex: openalex
paper-citation-health: Scan all papers in outputs/papers/ for citation bibliographic health metrics D8 (bib entries) and D10a (cite-to-bib match %).
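paper-cron-scan: Routes to v32-multi-direction-scan, which performs all rotation scans and white-space validation. The standalone paper-cron-scan skill has been merged into v32.
paper-pipeline: Main skill | End-to-end orchestrator for SCI papers. v3.18.10 adds Trap#42, cross-project reference contamination detection (Synthos Paper ID suffixes, placeholder key names, empty entries, prose mentions without a cite). v3.18.9 adds Trap#41, ghost entries in paper-queue.json (reverse direction). v3.18.5-8: D10a batch scan, natbib blind spot, comment filtering, routing fixes. v3.18: Track A promotion protocol. v3.16: queue self-healing and independent ABSOLUTE WHITE verification. v3.15: four-step Track B workflow.
paper-quality-deep-review: Deep paper-quality review engine: literature download → content analysis → research-gap verification → scientific-hypothesis assessment → solution-method assessment → citation-quality scoring → overall score.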
paper-queue-audit: Directory index for paper-queue-audit: paper-queue-audit
research-queue-audit: Research queue audit and management — read/validate research-queue.json, check candidate state consistency, detect stale entries, sync state layers. Implementation lives in v32-multi-direction-scan (Steps 5-6 + pitfalls). This is a routing stub — the actual queue lifecycle protocol is in the v32 scan skill.
paper-references-scanning: Scan paper library for citation health: D8 (bib entry count), D10a (cite-to-bib match rate), orphans, zombies. Class of tasks: LaTeX reference integrity auditing.
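pdf-download-racing: Parallel racing PDF download engine: curl_cffi TLS-fingerprint bypass + Sci-Hub domain rotation + LibGen + MedData. Depends on tools/paper-manager/src/.
pdf-to-md-notebooklm: End-to-end PDF → Markdown → NotebookLM pipeline. Supports batch upload, automatic type detection, and large-file handling.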
pubmed: Deep PubMed/MEDLINE search via NCBI E-utilities — query construction, MeSH terms, batch retrieval, clinical query refinement.
quality: Quality assurance: falsification checks, golden tests, and SCI paper quality review.
quality-score-assignment: Paper satisfies current_step in steps_completed AND len(steps_completed) >= 8.
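research: Invoke a subcategory or skill directly by name, for example arxiv, bib-integrity-audit, or research-ideation.
research-ideation: Research ideation and cognition engine (RIF+CCF). Three-layer architecture: Layer 1 (10-operation framework) → produces candidate research directions; Layer 2 (8 cognitive engines) →
research-paper-search: Main skill | Multi-source paper search and full-text download orchestrator. Entry points: Semantic Scholar (API Key), PubMed, OpenAlex, arXiv (Tor), Crossref. Calls sub-skills: arxiv, pubmed, openalex.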
research-skill-audit: Audit and enhance research skill coverage. Process for identifying gaps, testing existing skills, and creating/enhancing missing capabilities.
researcher-portrait: Directory index for researcher-portrait: researcher-portrait
sci-paper-quality-review: Directory index for sci-paper-quality-review: sci-paper-quality-review
sci-paper-standard-structure: Directory index for sci-paper-standard-structure: sci-paper-standard-structure
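skill-integrity-audit: | Concept | Classical Chinese | Meaning |
systematic-review: Workflow assistant for systematic reviews and meta-analyses: PRISMA flow, search-strategy design, study selection, quality assessment, data extraction, and synthesis support.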
v32-multi-direction-scan: Every cron run of autonomous-core-researcher after v31 API fix. Standardized pattern for scanning 5 rotation + 5 new directions per run.

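🔴 Research Direction Constraints (2026-06-22)

These constraints apply to every cron job, including autonomous-core-researcher, paper-repair, paper-quality-review, paper-layer-b-review, and literature-monitor.

✅ Core directions (full pipeline: paper generation → repair → review → submission)

Pupil/iris segmentation: 3d-eyeball-iris-segmentation, dual-ellipse series
3D eyeball modeling: 3D pupil localization, Kappa-angle calibration series
Semicircular canal spatial orientation: SCC reconstruction, cupula deflection
BPPV virtual simulation: canalithiasis, Epley simulation series
VOR digital twin: VOR cancellation, digital twin, sparse modular
3D eye-movement algorithm components: edge detection, feature-point extraction, calibration methods
Public eye-movement dataset analysis and methodology audits: data-integrity audits of PIMA/WDBC/Heart and similar datasets
Synthos research-assistant system: development and evolution of the system itself
AI-assisted teaching: papers on teaching applications

🔴 Peripheral directions (extract research gaps and scientific hypotheses only; do not advance papers)

Biological modeling directions that are neither eye-movement nor dataset driven: cornea/lens/vitreous biomechanics, tear film/meibomian glands, tinnitus/concussion/brainstem/dysphagia, and similar topics.

Pipeline Execution Decision Matrix

Scenario	Behavior
Automated scan finds a core-direction gap	✅ Full pipeline: hypothesis → paper
Automated scan finds a peripheral-direction gap	⛔ Record gap + hypothesis only; do not add to the paper queue
paper-repair encounters a peripheral paper	⛔ Skip; do not repair or report
quality-review encounters a peripheral paper	⛔ Skip
literature-monitor finds peripheral literature	✅ May be recorded in an appendix; excluded from the main report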
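
The decision matrix can be expressed as a lookup table so that every cron job applies the same rule. The following is a minimal sketch, not Synthos code: the `Direction` enum, the job labels, and the action strings are all hypothetical names chosen for illustration, and the fallback for core-direction papers is an assumption because the matrix does not list those cases.

```python
from enum import Enum


class Direction(Enum):
    CORE = "core"
    PERIPHERAL = "peripheral"


# (job, direction) -> action, transcribed from the decision matrix.
# Job labels and action strings are illustrative, not Synthos identifiers.
DECISION_MATRIX = {
    ("scan", Direction.CORE): "full_pipeline",
    ("scan", Direction.PERIPHERAL): "record_gap_and_hypothesis_only",
    ("paper-repair", Direction.PERIPHERAL): "skip",
    ("quality-review", Direction.PERIPHERAL): "skip",
    ("literature-monitor", Direction.PERIPHERAL): "record_in_appendix",
}


def decide(job: str, direction: Direction) -> str:
    """Return the action for a job/direction pair."""
    if (job, direction) in DECISION_MATRIX:
        return DECISION_MATRIX[(job, direction)]
    if direction is Direction.CORE:
        # Assumption: the matrix is silent on core papers in repair/review
        # jobs, so they are taken to proceed normally.
        return "proceed"
    raise KeyError(f"no rule for job={job!r}, direction={direction}")
```

Raising on an unlisted peripheral case, rather than defaulting to "proceed", keeps a new cron job from advancing a peripheral paper by accident.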

⚡ Filesystem Layout (Dual-Filesystem Awareness)

Papers and pipeline state are spread across three locations. All cron agents MUST know all three:

Location	Contents	Purpose
`~/outputs/papers/`	Queue files (processed_papers.txt, low_score_papers.txt, no_state_papers.txt), bib-standards reports (`bib-standards-report-YYYY-MM-DD.md`)	Cron output reports, queue tracking
`~/桌面/article_todo/`	Actively developed papers (7 core direction papers — iris, pupil, SCC, BPPV — with submission materials)	Writing workspace. See `references/article-todo-inventory.md`
`/media/yakeworld/sda2/Synthos/outputs/papers/`	Main pipeline — 132 paper directories, `paper-queue.json`, `research-queue.json`, `_knowledge_only/` (21 research candidates), `state.json`, `submissions/`	Full paper pipeline + knowledge pipeline + evolution tracking

Critical distinction: Two separate queue files with different semantics:

paper-queue.json (132 papers) — full paper pipeline with quality scores, gate status, notes
_knowledge_only/research-queue.json (21 research candidates) — Track B knowledge pipeline (literature_scan → gap_analysis → hypothesis_generation → knowledge_entry)

Evolution tracking: Main evolution state at /media/yakeworld/sda2/Synthos/evolution-state.json (cycle 174+, EXCELLENT 0.9696 as of 2026-06-23). Legacy at /media/yakeworld/sda2/Synthos/outputs/evolution/evolution-state.json (cycle 64).

Agent log: /media/yakeworld/sda2/Synthos/outputs/papers/agent-log.md (cron execution history).

⚠️ Pitfall: Home ~/outputs/papers/ queue files (processed_papers.txt, etc.) reflect a subset and may be stale. Always read /media/yakeworld/sda2/Synthos/outputs/papers/paper-queue.json for authoritative state.
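
One way to enforce the pitfall above is to route every queue read through a single loader that only knows the main pipeline directory. This is a sketch under assumptions: `load_paper_queue` is a hypothetical helper, and nothing is assumed about the JSON schema beyond the file being valid JSON.

```python
import json
from pathlib import Path

# Path taken from the layout table; override base_dir in tests.
AUTHORITATIVE_BASE = Path("/media/yakeworld/sda2/Synthos/outputs/papers")


def load_paper_queue(base_dir: Path = AUTHORITATIVE_BASE):
    """Load paper-queue.json from the main pipeline directory.

    Raises FileNotFoundError instead of falling back to the home-directory
    text files, which may be stale.
    """
    queue_path = Path(base_dir) / "paper-queue.json"
    if not queue_path.is_file():
        raise FileNotFoundError(f"authoritative queue not found: {queue_path}")
    with queue_path.open(encoding="utf-8") as fh:
        return json.load(fh)  # schema is not assumed here
```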

⚠️ Agent-Log Append-Only Protocol: agent-log.md is written by multiple cron jobs (autonomous-core-researcher, paper-repair, paper-layer-b-review, literature-monitor, etc.). NEVER overwrite it with write_file. Always use patch to append new entries after the last line. If accidentally overwritten, reconstruct from session_search (all pipeline cron sessions are stored in the session DB) and rewrite the combined file.
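
If a job writes the log from Python rather than through the agent's patch tool, the same append-only guarantee comes from opening the file in append mode. A minimal sketch: `append_log_entry` and the entry layout (a timestamped heading followed by the message) are hypothetical, since the actual agent-log.md entry format is not specified here.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_log_entry(log_path: Path, job: str, message: str) -> None:
    """Append one entry to the shared log without touching existing content."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    # Entry layout is an assumption made for this example.
    entry = f"\n## {stamp} {job}\n{message.rstrip()}\n"
    # Mode "a" always writes at the end of the file, so earlier entries survive.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(entry)
```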

⚠️ D10a Verification Pitfalls

When verifying D10a (cite-to-bibitem match rate), the following traps cause false positives or false negatives:

Trap	Symptom	Fix
External .bib/.bbl	Grepping tex for `\bibitem` finds only template placeholders (`{label}`, `{lamport94}`)	Papers with `\bibliography{references}` use external bib → bibtex generates `.bbl` with real bibitems. Must grep the .bbl, not the .tex.
LaTeX comments	Template instructions like `%% Example citation, See \cite{lamport94}.` count as orphans	Skip all lines starting with `%` before extracting cites
Template markers	`<label>` in `\cite{<label>}` flags as orphan	Filter keys containing `<` or `>`
Stale reference_health	state.json `reference_health.D10a` disagrees with `d8_d10a_scan.d10a`	`d8_d10a_scan` is authoritative (updated by batch scan). `reference_health` may be stale pre-repair snapshot.
Stale .bbl from different bib source	D10a=0% despite inline thebibliography having correct keys. .bbl exists but was generated from a different .bib file with incompatible key naming (e.g., short keys in bbl vs long keys in tex cite commands). Script uses bbl (priority 1) → 0 matches.	Delete all stale `.bbl` files in the paper directory. Script will fall back to inline thebibliography or a fresh bibtex run. Always check: does the bbl's bibitem keys match the tex citation style? If key naming conventions differ, the bbl is from a different compilation era.
Missing .bib masquerading as .txt	`\bibliography{reference4}` causes BibTeX "I didn't find a database entry" for ALL cites, but `reference4.txt` exists with full content. The `.bib` extension is missing — BibTeX only reads `.bib` files.	Search for files with the same basename but `.txt` extension (e.g., `reference4.txt`, `06-references/reference4.txt`). Copy to `.bib` extension. Check: does the `.txt` file cover all cited keys? It may be from a different draft version and missing newer citations. After copying, run bibtex to identify remaining gaps.
Stale .bbl from older tex revision	D10a < 100% even though bib entries exist in the .bib file. The .bbl filename doesn't match the .tex filename (e.g., `revision20241117.bbl` but tex is `revision20241118v3.tex`). The old bbl predates newer citations added to the tex.	Delete the stale .bbl. Recompile: `pdflatex → bibtex → pdflatex×2`. Verify the new .bbl filename matches the tex basename.
Wrong .tex file selected (multi-tex directories)	D10a=0% or nonsensical results (e.g., 3 cites R1/R2/R3 with 30 bibitems). Scan may pick up a LaTeX template file (e.g., `Sage_LaTeX_Guidelines.tex`) before the real manuscript (`articlev2.tex` or `paper.tex`).	Check which tex was scanned. Look for `\begin{document}` and realistic citation keys (not R1/R2/R3 or `<label>`). Prefer the tex with the most `\cite{}` calls and `\begin{document}`.
article_todo workspace scanning	The main `d10a-batch-scan.py` targets `/media/yakeworld/sda2/Synthos/outputs/papers/` only. Papers in `~/桌面/article_todo/` need separate D10a checks.	Run a targeted scan on `~/桌面/article_todo/` using the same methodology: extract cites, find bib/bbl, compute D10a. The article_todo papers typically use .bbl-based references; stale .bbl is the #1 D10a issue here. See `references/article-todo-d10a-repair.md`.
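
The first three traps reduce to three rules: read bibitems from the .bbl, strip comments before extracting cites, and drop keys containing `<` or `>`. The sketch below illustrates those rules only; it is not the code of `scripts/d10a-batch-scan.py`. The function names and regular expressions are assumptions, as is returning 0.0 when a file has no cites.

```python
import re

# \cite, \citep, \citet, ... with optional [..] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# \bibitem{key} or \bibitem[label]{key}; the label may span lines.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}")


def strip_comments(text: str) -> str:
    """Drop LaTeX comments: an unescaped % through end of line."""
    return re.sub(r"(?<!\\)%.*", "", text)


def cite_keys(tex: str) -> set[str]:
    """Cited keys, excluding comments and template markers such as <label>."""
    keys = set()
    for group in CITE_RE.findall(strip_comments(tex)):
        for key in group.split(","):
            key = key.strip()
            if key and "<" not in key and ">" not in key:
                keys.add(key)
    return keys


def bibitem_keys(bbl: str) -> set[str]:
    """Keys defined by bibitem entries in a .bbl (or inline thebibliography)."""
    return {k.strip() for k in BIBITEM_RE.findall(strip_comments(bbl))}


def orphans(tex: str, bbl: str) -> set[str]:
    """Cited keys with no matching bibitem."""
    return cite_keys(tex) - bibitem_keys(bbl)


def d10a(tex: str, bbl: str) -> float:
    """Percentage of cited keys that have a matching bibitem."""
    cited = cite_keys(tex)
    if not cited:
        return 0.0  # assumption: the real script may treat this case differently
    return 100.0 * len(cited & bibitem_keys(bbl)) / len(cited)
```

Pass the .bbl contents as the second argument when the paper uses `\bibliography{...}`; pass the .tex contents itself when the bibliography is an inline thebibliography.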

Trusted methodology: Use scripts/d10a-batch-scan.py for all D10a verification on the main pipeline. The script handles both inline thebibliography and external .bbl workflows, excludes comments, and filters template artifacts. Shell-based grep approaches are fragile, so always prefer the Python script. For article_todo workspace papers, use the targeted scan approach documented in references/article-todo-d10a-repair.md.

Workflow for paper-repair cron:

1. Run `python3 scripts/d10a-batch-scan.py --all --threshold 95 --base-dir /media/yakeworld/sda2/Synthos/outputs/papers`.
2. For each paper below threshold: read the tex and identify the orphan cause (comment? template? missing bibitem? wrong bib source? stale bbl? missing .bib extension?).
3. Fix and re-verify: delete stale bbls first if present; check for .txt siblings of missing .bib files; fill missing entries; recompile pdflatex→bibtex→pdflatex×2.
4. Post-fix: run bibtex separately to catch entries missing from the .bib that were previously in the .bbl (an old bbl may have had bibitems not in the current bib file). BibTeX warnings reveal these previously silently matched entries.
5. article_todo check: after the pipeline scan, run a targeted D10a scan on ~/桌面/article_todo/. The most common issue in article_todo is a stale .bbl from an older revision. Fix: delete the old .bbl and recompile. See paper-references-scanning/references/article-todo-d10a-check.md for the full targeted scan methodology (created 2026-06-22).
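
Three of the triage checks in this workflow are mechanical and can be scripted before reading any tex by hand. The helpers below are a hedged sketch with hypothetical names (`stale_bbls`, `txt_siblings`, `pick_manuscript`); they only report candidates and never delete or copy files. Note that `stale_bbls` detects only the filename-mismatch case, not a .bbl generated from a different .bib under the same name.

```python
from pathlib import Path
from typing import Optional


def stale_bbls(paper_dir: Path) -> list[Path]:
    """.bbl files whose basename matches no .tex file in the same directory."""
    tex_stems = {p.stem for p in paper_dir.glob("*.tex")}
    return sorted(p for p in paper_dir.glob("*.bbl") if p.stem not in tex_stems)


def txt_siblings(paper_dir: Path, bib_name: str) -> list[Path]:
    """Candidate <bib_name>.txt files when <bib_name>.bib is missing."""
    if (paper_dir / f"{bib_name}.bib").exists():
        return []
    # rglob also finds copies in subdirectories such as 06-references/.
    return sorted(paper_dir.rglob(f"{bib_name}.txt"))


def pick_manuscript(paper_dir: Path) -> Optional[Path]:
    """Prefer the .tex with a document environment and the most cite calls."""
    best, best_count = None, -1
    for tex in sorted(paper_dir.glob("*.tex")):
        text = tex.read_text(encoding="utf-8", errors="replace")
        if r"\begin{document}" not in text:
            continue
        count = text.count(r"\cite")  # rough heuristic; counts commented cites too
        if count > best_count:
            best, best_count = tex, count
    return best
```

A journal template can still win `pick_manuscript` if it cites more than the manuscript does, so the selected file should be checked for realistic citation keys before its D10a is trusted.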

IO_CONTRACT

input: paper_path: str, analysis_type: str — Paper path and analysis type
output: analysis_report: dict — Complete analysis report

Corresponding principle: P3 (human-machine layering: the router handles routing, atoms handle execution)