paper-pipeline
Complete paper pipeline: retrieval, extraction, quality review, analysis, and publication.
| name | paper-pipeline |
| description | Complete paper pipeline: retrieval, extraction, quality review, analysis, and publication. |
| version | 1.0.0 |
| license | MIT |
| author | Synthos |
| metadata | {"synthos":{"atom_type":"composite","priority":"P0","signature":"paper_path: str -> analysis_report: dict","related_skills":[]}} |
Composite skill that merges 35 paper-related skills into a unified pipeline.
`.bib` reference files across a paper library for: `\cite{key}` calls in `.tex` files and `@type{key}` entries in `.bib` files. Produces D8 (bib count) and D10a (match percentage) metrics, plus orphan/zombie classification.

`outputs/papers/` for citation bibliographic health metrics D8 (bib entries) and D10a (cite-to-bib match %).

`v32-multi-direction-scan`: all rotation scans and white-space validation are performed by this skill. The standalone `paper-cron-scan` skill has been merged into v32.

`v32-multi-direction-scan` (Steps 5-6 + pitfalls). This is a routing stub; the actual queue lifecycle protocol is in the v32 scan skill.

`current_step` in `steps_completed` AND `len(steps_completed) >= 8`.

`arxiv`, `bib-integrity-audit`, `research-ideation`. This constraint also applies to all cron tasks, including `autonomous-core-researcher`, `paper-repair`, `paper-quality-review`, `paper-layer-b-review`, and `literature-monitor`.
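Biological modeling directions that are neither eye-movement nor dataset driven, such as corneal/lens/vitreous biomechanics, tear film/meibomian glands, and tinnitus/concussion/brainstem/dysphagia.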
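| Scenario | Behavior |
|---|---|
| Automatic scan finds a gap in a core direction | ✅ Full workflow: hypothesis → paper |
| Automatic scan finds a gap in a peripheral direction | ⛔ Record only the gap + hypothesis; do not add to the paper queue |
| paper-repair encounters a peripheral paper | ⛔ Skip; do not repair, do not report |
| quality-review encounters a peripheral paper | ⛔ Skip |
| literature-monitor finds peripheral literature | ✅ May be recorded in the appendix; excluded from the main report |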
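The routing rules in the table above can be read as a lookup from (task, direction) to an action. The following is a minimal sketch of that reading, not the skill's actual implementation: the names `Direction`, `ROUTING`, and `route`, and the action strings, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Direction(Enum):
    CORE = "core"
    PERIPHERAL = "peripheral"


# (task, direction) -> action, one entry per table row.
# Task names and action labels are illustrative assumptions.
ROUTING = {
    ("scan", Direction.CORE): "full_pipeline",
    ("scan", Direction.PERIPHERAL): "record_gap_and_hypothesis",
    ("paper-repair", Direction.PERIPHERAL): "skip",
    ("quality-review", Direction.PERIPHERAL): "skip",
    ("literature-monitor", Direction.PERIPHERAL): "record_in_appendix",
}


def route(task: str, direction: Direction) -> str:
    """Return the action for a task; combinations the table does not cover
    are reported as 'unspecified' rather than guessed."""
    return ROUTING.get((task, direction), "unspecified")
```

Keeping the rules in one table-like mapping makes it easy to check that every cron task applies the same core/peripheral constraint.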
Papers and pipeline state are spread across three locations. All cron agents MUST know all three:
| Location | Contents | Purpose |
|---|---|---|
| `~/outputs/papers/` | Queue files (`processed_papers.txt`, `low_score_papers.txt`, `no_state_papers.txt`), bib-standards reports (`bib-standards-report-YYYY-MM-DD.md`) | Cron output reports, queue tracking |
| `~/桌面/article_todo/` | Actively developed papers (7 core-direction papers covering iris, pupil, SCC, and BPPV, with submission materials) | Writing workspace. See `references/article-todo-inventory.md` |
| `/media/yakeworld/sda2/Synthos/outputs/papers/` | Main pipeline: 132 paper directories, `paper-queue.json`, `research-queue.json`, `_knowledge_only/` (21 research candidates), `state.json`, `submissions/` | Full paper pipeline + knowledge pipeline + evolution tracking |
Critical distinction: Two separate queue files with different semantics:
- `paper-queue.json` (132 papers): full paper pipeline with quality scores, gate status, notes
- `_knowledge_only/research-queue.json` (21 research candidates): Track B knowledge pipeline (literature_scan → gap_analysis → hypothesis_generation → knowledge_entry)

Evolution tracking: Main evolution state at `/media/yakeworld/sda2/Synthos/evolution-state.json` (cycle 174+, EXCELLENT 0.9696 as of 2026-06-23). Legacy at `/media/yakeworld/sda2/Synthos/outputs/evolution/evolution-state.json` (cycle 64).
Agent log: /media/yakeworld/sda2/Synthos/outputs/papers/agent-log.md (cron execution history).
⚠️ Pitfall: Home `~/outputs/papers/` queue files (`processed_papers.txt`, etc.) reflect a subset and may be stale. Always read `/media/yakeworld/sda2/Synthos/outputs/papers/paper-queue.json` for authoritative state.
⚠️ Agent-Log Append-Only Protocol: `agent-log.md` is written by multiple cron jobs (autonomous-core-researcher, paper-repair, paper-layer-b-review, literature-monitor, etc.). NEVER overwrite it with `write_file`. Always use `patch` to append new entries after the last line. If accidentally overwritten, reconstruct from `session_search` (all pipeline cron sessions are stored in the session DB) and rewrite the combined file.
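The append-only rule can be illustrated outside the agent tooling as well. The sketch below is an assumption-laden illustration in plain Python, not the `patch` tool itself: `append_log_entry` is a hypothetical helper that opens the log in append mode so existing entries are never truncated.

```python
import os
from pathlib import Path


def append_log_entry(log_path: Path, entry: str) -> None:
    """Append one entry after the last line without rewriting earlier content."""
    prefix = ""
    if log_path.exists() and log_path.stat().st_size > 0:
        # If the previous writer left no trailing newline, start on a new line.
        with log_path.open("rb") as fh:
            fh.seek(-1, os.SEEK_END)
            if fh.read(1) != b"\n":
                prefix = "\n"
    suffix = "" if entry.endswith("\n") else "\n"
    # Mode "a" only writes at the end of the file; mode "w" would truncate it.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + entry + suffix)
```

Append mode is the property that matters here: several jobs can each add their own entries without any of them needing to read and rewrite the whole file.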
When verifying D10a (cite-to-bibitem match rate), the following traps cause false positives or false negatives:
| Trap | Symptom | Fix |
|---|---|---|
| External .bib/.bbl | Grepping tex for \bibitem finds only template placeholders ({label}, {lamport94}) | Papers with \bibliography{references} use external bib → bibtex generates .bbl with real bibitems. Must grep the .bbl, not the .tex. |
| LaTeX comments | Template instructions like %% Example citation, See \cite{lamport94}. count as orphans | Skip all lines starting with % before extracting cites |
| Template markers | <label> in \cite{<label>} flags as orphan | Filter keys containing < or > |
| Stale reference_health | state.json reference_health.D10a disagrees with d8_d10a_scan.d10a | d8_d10a_scan is authoritative (updated by batch scan). reference_health may be stale pre-repair snapshot. |
| Stale .bbl from different bib source | D10a=0% despite inline thebibliography having correct keys. .bbl exists but was generated from a different .bib file with incompatible key naming (e.g., short keys in bbl vs long keys in tex cite commands). Script uses bbl (priority 1) → 0 matches. | Delete all stale .bbl files in the paper directory. Script will fall back to inline thebibliography or a fresh bibtex run. Always check: does the bbl's bibitem keys match the tex citation style? If key naming conventions differ, the bbl is from a different compilation era. |
| Missing .bib masquerading as .txt | \bibliography{reference4} causes BibTeX "I didn't find a database entry" for ALL cites, but reference4.txt exists with full content. The .bib extension is missing — BibTeX only reads .bib files. | Search for files with the same basename but .txt extension (e.g., reference4.txt, 06-references/reference4.txt). Copy to .bib extension. Check: does the .txt file cover all cited keys? It may be from a different draft version and missing newer citations. After copying, run bibtex to identify remaining gaps. |
| Stale .bbl from older tex revision | D10a < 100% even though bib entries exist in the .bib file. The .bbl filename doesn't match the .tex filename (e.g., revision20241117.bbl but tex is revision20241118v3.tex). The old bbl predates newer citations added to the tex. | Delete the stale .bbl. Recompile: pdflatex → bibtex → pdflatex×2. Verify the new .bbl filename matches the tex basename. |
| Wrong .tex file selected (multi-tex directories) | D10a=0% or nonsensical results (e.g., 3 cites R1/R2/R3 with 30 bibitems). Scan may pick up a LaTeX template file (e.g., Sage_LaTeX_Guidelines.tex) before the real manuscript (articlev2.tex or paper.tex). | Check which tex was scanned. Look for \begin{document} and realistic citation keys (not R1/R2/R3 or <label>). Prefer the tex with the most \cite{} calls and \begin{document}. |
| article_todo workspace scanning | The main d10a-batch-scan.py targets /media/yakeworld/sda2/Synthos/outputs/papers/ only. Papers in ~/桌面/article_todo/ need separate D10a checks. | Run a targeted scan on ~/桌面/article_todo/ using the same methodology: extract cites, find bib/bbl, compute D10a. The article_todo papers typically use .bbl-based references; stale .bbl is the #1 D10a issue here. See references/article-todo-d10a-repair.md. |
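Three of the traps above (external `.bbl`, LaTeX comments, template markers) can be sketched in a few lines. This is a minimal illustration, not the logic of `scripts/d10a-batch-scan.py`, whose internals are not shown here. It assumes D10a is the share of unique cited keys that have a matching `\bibitem`, and the helper names `cite_keys`, `bibitem_keys`, and `d10a` are hypothetical.

```python
import re

# \cite, \citep, \citet*, ... with optional [..] arguments before the key group.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# \bibitem{key} or \bibitem[label]{key}, as found in a .bbl file.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}")


def cite_keys(tex: str) -> set[str]:
    """Collect cited keys, skipping comment lines and template markers."""
    keys: set[str] = set()
    for line in tex.splitlines():
        if line.lstrip().startswith("%"):
            continue  # template instructions in comments are not real cites
        for group in CITE_RE.findall(line):
            for key in group.split(","):
                key = key.strip()
                if key and "<" not in key and ">" not in key:
                    keys.add(key)  # keys like <label> are template artifacts
    return keys


def bibitem_keys(bbl: str) -> set[str]:
    """Collect bibitem keys from .bbl (or inline thebibliography) text."""
    return {key.strip() for key in BIBITEM_RE.findall(bbl)}


def d10a(tex: str, bbl: str) -> float:
    """Percentage of cited keys that have a bibitem (assumed definition)."""
    cites = cite_keys(tex)
    if not cites:
        return 0.0  # assumption: no cites is reported as 0, not 100
    return 100.0 * len(cites & bibitem_keys(bbl)) / len(cites)
```

Passing the `.bbl` text rather than the `.tex` text as the second argument is what avoids the first trap: the real bibitems live in the generated `.bbl` when the paper uses `\bibliography{...}`.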
Trusted methodology: Use `scripts/d10a-batch-scan.py` for all D10a verification on the main pipeline. The script handles both inline `thebibliography` and external `.bbl` workflows, excludes comments, and filters template artifacts. Shell-based grep approaches are fragile; always prefer the Python script. For article_todo workspace papers, use the targeted scan approach documented in `references/article-todo-d10a-repair.md`.
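Two checks from the trap table lend themselves to a short sketch: picking the real manuscript in a multi-tex directory, and spotting `.bbl` files whose basename does not match it. The code below is a hedged illustration under those stated heuristics only; `pick_manuscript` and `mismatched_bbls` are hypothetical names and are not functions of the batch scan script.

```python
import re
from pathlib import Path
from typing import Optional

CITE_CALL_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{")


def pick_manuscript(paper_dir: Path) -> Optional[Path]:
    """Prefer the .tex that contains \\begin{document} and the most cite calls."""
    best: Optional[Path] = None
    best_count = -1
    for tex in sorted(paper_dir.glob("*.tex")):
        text = tex.read_text(encoding="utf-8", errors="replace")
        if "\\begin{document}" not in text:
            continue  # preambles and include files are not manuscripts
        count = len(CITE_CALL_RE.findall(text))
        if count > best_count:
            best, best_count = tex, count
    return best


def mismatched_bbls(paper_dir: Path, tex_file: Path) -> list[Path]:
    """List .bbl files whose basename differs from the chosen .tex file,
    which the table above treats as a sign of a stale compilation."""
    return sorted(p for p in paper_dir.glob("*.bbl") if p.stem != tex_file.stem)
```

The cite count here includes commented-out cites, so a result from this sketch is a candidate to inspect, not a final answer.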
Workflow for paper-repair cron:
- `python3 scripts/d10a-batch-scan.py --all --threshold 95 --base-dir /media/yakeworld/sda2/Synthos/outputs/papers`
- `.txt` siblings of missing `.bib` files; fill missing entries; recompile pdflatex → bibtex → pdflatex ×2.
- `.bib` that were previously in the `.bbl` (old bbl may have had bibitems not in current bib file). BibTeX warnings reveal these silently-matched-before entries.
- `~/桌面/article_todo/`. The most common issue in article_todo is stale `.bbl` from older revision. Fix: delete old `.bbl`, recompile. See `paper-references-scanning/references/article-todo-d10a-check.md` for the full targeted scan methodology (created 2026-06-22).

`paper_path: str, analysis_type: str`: Paper path and analysis type

`analysis_report: dict`: Complete analysis report

Corresponding principle: P3 (human-machine layering: the router handles routing, atoms handle execution)
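The recompile sequence named in the repair workflow (pdflatex, then bibtex, then pdflatex twice) can be sketched as a command list. This is an illustration under assumptions: `recompile_commands` and `recompile` are hypothetical helpers, and the `-interaction=nonstopmode` flag is an added choice so an unattended job does not stop at a prompt.

```python
import subprocess
from pathlib import Path


def recompile_commands(tex_file: Path) -> list[list[str]]:
    """Build the pdflatex -> bibtex -> pdflatex x2 sequence for one manuscript."""
    latex = ["pdflatex", "-interaction=nonstopmode", tex_file.name]
    # bibtex takes the basename so the fresh .bbl matches the .tex filename.
    return [latex, ["bibtex", tex_file.stem], latex, latex]


def recompile(tex_file: Path) -> None:
    """Run the sequence inside the paper directory."""
    for cmd in recompile_commands(tex_file):
        # check=False: bibtex exits non-zero on warnings, which are expected
        # while missing entries are still being filled in.
        subprocess.run(cmd, cwd=tex_file.parent, check=False)
```

Running bibtex on the tex basename is what keeps the new `.bbl` filename aligned with the `.tex` file, the condition the trap table asks to verify after a repair.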
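Triple citation check: whether the reference exists (L1) + whether the citation is appropriate (L2) + whether the citations are comprehensive (L3). A three-in-one verification pipeline, from DOI verification to semantic review to omission detection.
**Trigger condition**: Batch-process the quality_score in `step_quality_check.md` for a batch of papers (10-34 papers) and write it to `state.json`.
Sub-skill | Full NotebookLM CLI guide: Q&A knowledge extraction, content generation (reports/video/audio/infographics/slides), literature retrieval. Responds to calls from the P1 stage of paper-pipeline.
Productivity tools: Airtable, Google Workspace, Linear, Notion, Jupyter, etc.
Dual-loop evolution: internal reflection (P0) + external absorption (P1). Cross-project absorption methodology: multi-round cross-project comparison, active project tracking, self-expanding keyword discovery. Entelechy-Driven Absorption v4.3.
Parallel racing PDF download engine: curl_cffi TLS fingerprint bypass + Sci-Hub domain rotation + LibGen + MedData. Depends on `tools/paper-manager/src/`.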