research-units-pipeline-skills
research-units-pipeline-skills contains 108 collected skills from WILLOSCAR, with repository-level occupation coverage and site-owned skill detail pages.
Skills in this repository
Download a small corpus of open-access arXiv survey/review PDFs about agentic systems and extract text for style learning. **Trigger**: agent survey corpus, ref corpus, download surveys, 学习综述写法, 下载 survey. **Use when**: you want to study how real agent surveys structure sections (6–8 H2), size subsections, and write evidence-backed comparisons. **Skip if**: you cannot download PDFs (no network) or you don't want local PDF files. **Network**: required. **Guardrail**: only download arXiv PDFs; store under `ref/` and keep large files out of git.
Global consistency review for survey drafts: terminology, cross-section coherence, and scope/citation hygiene. Writes `output/GLOBAL_REVIEW.md` and (optionally) applies safe edits to `output/DRAFT.md`. **Trigger**: global review, consistency check, coherence audit, 术语一致性, 全局回看, 章节呼应, 拷打 writer. **Use when**: Draft exists and you want a final evidence-first coherence pass before LaTeX/PDF. **Skip if**: You are still changing the outline/mapping/notes (do those first), or prose writing is not approved. **Network**: none. **Guardrail**: Do not invent facts or citations; do not add new citation keys; treat missing evidence as a failure signal.
Multi-route literature expansion + metadata normalization for evidence-first surveys. Produces a large candidate pool (`papers/papers_raw.jsonl`, target ≥1200) with stable IDs and provenance, ready for dedupe/rank + citation generation. **Trigger**: evidence collector, literature engineer, 文献扩充, 多路召回, snowballing, cited by, references, 元信息增强, provenance. **Use when**: you need to expand the candidate literature to ≥1200 papers and fill in traceable metadata (Stage C1 of the survey pipeline, prerequisite evidence for writing). **Skip if**: you already have a high-quality `papers/papers_raw.jsonl` (≥1200 records, each with a stable identifier and a provenance record). **Network**: can run offline (relying on imports); snowballing and online retrieval need network access. **Guardrail**: do not fabricate papers; every record must carry a stable identifier (arXiv id / DOI / trusted URL) and provenance; do not write output/ prose.
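The guardrail above (every record carries a stable identifier and provenance) can be checked mechanically. The following is a minimal sketch, not the skill's own implementation; the field names `arxiv_id`, `doi`, `url`, and `provenance` are assumptions, since the listing does not show the actual schema of `papers/papers_raw.jsonl`.

```python
import json

# Hypothetical field names; the real schema of papers/papers_raw.jsonl may differ.
ID_FIELDS = ("arxiv_id", "doi", "url")


def find_invalid_records(jsonl_text):
    """Return (line_number, reason) pairs for records that lack
    a stable identifier or a provenance entry."""
    problems = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not any(record.get(field) for field in ID_FIELDS):
            problems.append((lineno, "missing stable identifier"))
        if not record.get("provenance"):
            problems.append((lineno, "missing provenance"))
    return problems
```

An empty result means every record passed both checks; any non-empty result could be treated as a blocking signal before dedupe/rank.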
Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取, 下载PDF. **Use when**: `queries.md` sets `evidence_mode: fulltext` (or you explicitly need full-text evidence) and you want stronger evidence for paper notes/claims. **Skip if**: `evidence_mode: abstract` (the default), or you do not want to download/extract (cost/permissions/time). **Network**: full-text download usually needs network access (unless you manually provide a PDF cache in `papers/pdfs/`). **Guardrail**: cache downloads in `papers/pdfs/`; by default, do not overwrite existing extracted text (unless re-extraction is explicitly requested).
Write `output/DRAFT.md` (or `output/SNAPSHOT.md`) from an approved outline and evidence packs, using only verified citation keys from `citations/ref.bib`. **Trigger**: write draft, prose writer, snapshot, survey writing, 写综述, 生成草稿, section-by-section drafting. **Use when**: structure is approved (`DECISIONS.md` has `Approve C2`) and evidence packs exist (`outline/subsection_briefs.jsonl`, `outline/evidence_drafts.jsonl`). **Skip if**: approvals are missing, or evidence packs are incomplete / scaffolded (missing-fields, TODO markers). **Network**: none. **Guardrail**: do not invent facts or citations; only cite keys present in `citations/ref.bib`; avoid pipeline-jargon leakage in final prose.
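The "only cite keys present in `citations/ref.bib`" guardrail lends itself to a simple check. The sketch below assumes Pandoc-style `[@key]` citations in the draft (the listing mentions `[@...]` as markdown syntax elsewhere) and standard BibTeX entry headers; the function name and the exact set of characters allowed in a key are illustrative assumptions.

```python
import re


def unknown_citation_keys(draft_text, bib_text):
    """Return the sorted citation keys used in the draft
    that have no matching entry in the BibTeX text."""
    # Entry headers look like "@article{Key," in BibTeX.
    bib_keys = set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))
    cited = set()
    # Each bracket group may hold several keys: [@a; @b].
    for group in re.findall(r"\[([^\]]*@[^\]]*)\]", draft_text):
        cited.update(re.findall(r"@([A-Za-z0-9_:.\-]+)", group))
    return sorted(cited - bib_keys)
```

For example, a draft citing `[@Smith2020; @Doe2021]` against a bibliography that only defines `Smith2020` would report `Doe2021` as unknown.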
Normalize cross-skill JSONL interfaces (ids + titles + citation key formats) so downstream skills do not rely on best-effort joins. **Trigger**: schema normalize, jsonl contract, interface drift, join drift, 字段不一致, schema 规范化. **Use when**: you have generated C2-C4 JSONL artifacts (outline/briefs/bindings/packs/anchors) and want deterministic, stable fields before self-loops/writing. **Skip if**: you are not using the survey pipelines, or the workspace already has a fresh PASS `output/SCHEMA_NORMALIZATION_REPORT.md` for the current artifacts. **Network**: none. **Guardrail**: NO PROSE; deterministic transforms only; do not invent evidence/claims; only fill missing ids/titles from `outline/outline.yml`.
Writing self-loop for surveys: run the strict section-quality gate, then rewrite only the failing `sections/*.md` files until the report is PASS. **Trigger**: writer self-loop, writing loop, quality gate loop, rewrite failing sections, 自循环, 反复改到 PASS. **Use when**: per-section files exist but C5 is FAIL/BLOCKED (thin sections, missing leads/front matter, citation-scope violations, generator voice). **Skip if**: you are still pre-C2 (NO PROSE), or evidence packs are incomplete (fix C3/C4 first). **Network**: none. **Guardrail**: do not invent facts; only use citation keys present in `citations/ref.bib`; keep citations in-scope per `outline/evidence_bindings.jsonl`; do not add/remove citation keys during rewrites.
Build a retrieval-informed chapter skeleton (`outline/chapter_skeleton.yml`) from taxonomy/core scope before stable H3 decomposition. **Trigger**: chapter skeleton, chapter-level outline, H2 skeleton, section-first survey, 章节骨架, 章级骨架. **Use when**: survey structure should stabilize chapter-level intent before subsection mapping and writing cards. **Skip if**: `outline/chapter_skeleton.yml` already exists and is refined. **Network**: none. **Guardrail**: NO PROSE; do not invent papers; keep output chapter-level only.
Audit and rewrite evaluation/numeric claims to ensure they carry minimal protocol context (task + metric + constraint) and avoid underspecified model naming. **Trigger**: evaluation anchor checker, numeric claim hygiene, underspecified numbers, protocol context, 评测锚点检查, 数字断言, 指标上下文. **Use when**: before final merge/polish, or when reviewers would likely flag claims as underspecified (numbers without task/metric/budget), or `pipeline-auditor` warns about suspicious model naming. **Skip if**: evidence is too thin to justify numeric claims (route upstream to C3/C4), or you are pre-C2 (NO PROSE). **Network**: none. **Guardrail**: do not invent numbers; do not add/remove/move citation keys; if protocol context is missing, weaken/remove the numeric claim rather than guessing.
Bind papers to chapter-level sections first, writing `outline/section_bindings.jsonl` and `outline/section_binding_report.md`. **Trigger**: section bindings, chapter bindings, section-first binding, 章节绑定, 章级绑定. **Use when**: survey structure should measure chapter saturation before stable H3 decomposition. **Skip if**: chapter skeleton is missing or the bindings are already refined. **Network**: none. **Guardrail**: NO PROSE; do not invent papers; produce auditable PASS/BLOCKED/REROUTE signals.
Build chapter-level briefs (`outline/section_briefs.jsonl`) from chapter skeleton plus section bindings before stable H3 decomposition. **Trigger**: section briefs, chapter planning cards, section-first briefs, 章节 brief, 章级 brief. **Use when**: section bindings exist and the run needs chapter-level rationale and decomposition guidance before emitting stable H3 ids. **Skip if**: `outline/section_briefs.jsonl` already exists and is refined. **Network**: none. **Guardrail**: NO PROSE; do not invent papers; emit planning constraints, not reader-facing text.
Use when a reader-facing deliverable exists and needs a deterministic PASS/FAIL quality gate. **Trigger**: self loop, self-loop, polish deliverable, quality gate, fix-on-fail, 收敛, 自循环, 质量门. **Use when**: A pipeline has produced a reader-facing deliverable (`output/*.md`) and you want deterministic convergence to PASS. **Skip if**: You are still pre-approval for prose or the upstream evidence/structure artifacts are missing. **Network**: none. **Guardrail**: Do not invent papers/citations/results. Only use in-scope inputs already present in the workspace.
Select the most appropriate pipeline for a user goal, lock it in `PIPELINE.lock.md`, and route checkpoint questions into `DECISIONS.md`. **Trigger**: pipeline router, choose pipeline, workflow selection, PIPELINE.lock.md, 选择流程. **Use when**: the user's goal/deliverable is unclear and you need to pick one of research-brief/paper-review/evidence-review/survey/tutorial/idea-brainstorm and set a minimal HITL question set. **Skip if**: the pipeline is already locked (`PIPELINE.lock.md` exists) and the required questions are answered/signed off. **Network**: none. **Guardrail**: ask all questions in one round where possible; if information is insufficient, write `DECISIONS.md` and stop to wait.
Run this repo’s Units+Checkpoints research pipelines end-to-end (survey/brief/paper-review/evidence-review/idea/tutorial/graduate-paper), with workspaces + checkpoints. **Trigger**: run pipeline, kickoff, 继续执行, 自动跑, 写一篇, survey/brief/review/调研/教程/系统综述/审稿. **Use when**: the user wants the pipeline run end-to-end (create `workspaces/<name>/`, generate/execute `UNITS.csv`, stop and wait at HUMAN checkpoints). **Skip if**: the user explicitly wants to execute units manually one by one (use `unit-executor`), or you should not advance automatically into the prose stage. **Network**: depends on selected pipeline (arXiv/PDF/citation verification may need network; offline import supported where available). **Guardrail**: checkpoints must be respected (no prose without Approve); stop and wait at HUMAN units; never create workspace artifacts in the repo root.
Use when an approved tutorial spec exists and the run needs a deterministic prerequisite graph before module planning. **Trigger**: concept graph, prerequisite graph, dependency graph, 概念图, 先修关系. **Use when**: C2 of `source-tutorial`; `output/TUTORIAL_SPEC.md` exists and the tutorial concepts need to become a sortable DAG. **Skip if**: there is no tutorial spec yet. **Network**: none. **Guardrail**: structure only, no reader-facing prose; the graph must stay acyclic.
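The "graph must stay acyclic" requirement can be verified with a topological sort. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the function name and the dict-of-sets input shape are assumptions for illustration, not the skill's actual file format.

```python
from graphlib import CycleError, TopologicalSorter


def teaching_order(prerequisites):
    """Map of concept -> set of concepts that must be taught first.

    Returns one valid teaching order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError:
        return None
```

A `None` result signals that the concept graph is not a DAG and cannot be turned into an ordered module plan until the cycle is removed.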
Use when a tutorial module plan exists but each module still needs a verifiable teaching loop. **Trigger**: exercises, practice, verification checklist, 教程练习, 可验证作业. **Use when**: C2 of tutorial; `outline/module_plan.yml` exists and each module needs an exercise / expected output / verification steps. **Skip if**: there is no module plan yet. **Network**: none. **Guardrail**: exercises must be verifiable; open-ended reflection questions alone are not enough.
Use when a tutorial concept DAG exists and the run needs an ordered teaching plan with objectives and outputs. **Trigger**: module plan, tutorial modules, course outline, 模块规划, module_plan.yml. **Use when**: C2 of tutorial; `outline/concept_graph.yml` exists and the concept DAG needs to converge into a module order. **Skip if**: there is no concept graph yet. **Network**: none. **Guardrail**: every module needs objectives, outputs, and a running-example step; do not write long prose.
Use when a tutorial module plan exists and the run needs an auditable module-to-source grounding file before prose. **Trigger**: module coverage, source coverage, tutorial grounding, 模块覆盖, 来源覆盖. **Use when**: C2 of `source-tutorial`; `outline/module_plan.yml` exists and you need to confirm every module can point back to sources. **Skip if**: the module plan or source ingest is incomplete. **Network**: none. **Guardrail**: grounding audit only; do not write tutorial body text.
Use when a `source-tutorial` workspace has ingested sources and needs a grounded tutorial contract before structure planning. **Trigger**: source tutorial spec, tutorial from sources, learner profile, 教程规格, 从资料生成教程. **Use when**: C2 of `source-tutorial`; you need to lock audience, prerequisites, learning objectives, source scope, and running example policy based on `sources/index.jsonl` / `sources/provenance.jsonl`. **Skip if**: source ingest is not finished, or the tutorial scope has been frozen by a human. **Network**: none. **Guardrail**: do not invent content the sources do not support; when the running example is unstable, state explicitly that there is no unified running example.
Use when approved tutorial context packs exist and the run needs the final article-first tutorial deliverable. **Trigger**: source tutorial writer, tutorial drafting, 教程正文, 从资料写教程. **Use when**: C3 of `source-tutorial`; `outline/tutorial_context_packs.jsonl` is ready and `Approve C2` is ticked in `DECISIONS.md`. **Skip if**: C2 is not approved, or the context packs are not ready. **Network**: none. **Guardrail**: the body must be reader-first but must not include content the sources do not support.
Use when module planning and source coverage are done and the run needs writer-ready per-module packs. **Trigger**: tutorial context pack, module pack, writer pack, 教程上下文包, 模块写作包. **Use when**: C2 of `source-tutorial`; module plan + source coverage exist and need to be organized into stable writing inputs. **Skip if**: module/source coverage is not finished. **Network**: none. **Guardrail**: organize context only; do not write tutorial body text directly.
Use when `evidence-review` has an extraction table and needs lightweight risk-of-bias fields. **Trigger**: bias, risk-of-bias, RoB, evidence quality, 偏倚评估, 证据质量. **Use when**: `evidence-review` has generated `papers/extraction_table.csv` and the bias/quality fields need filling in before synthesis. **Skip if**: this is not an evidence/systematic review, or `papers/extraction_table.csv` does not exist yet. **Network**: none. **Guardrail**: use a simple, reviewable scale (low/unclear/high) plus short notes; keep fields consistent.
Use when a review workspace has manuscript text and needs a traceable claim ledger. **Trigger**: claims extractor, extract claims, contributions, assumptions, peer review, 审稿, 主张提取. **Use when**: peer review or evidence audit, where the claim list must be written to disk and traceable to source locations (section/page/quote). **Skip if**: no usable manuscript/full text is available (e.g. `output/PAPER.md` or equivalent text is missing). **Network**: none. **Guardrail**: every claim must carry a locatable source pointer; distinguish empirical vs conceptual claims.
Use when a broad paper candidate pool needs deterministic deduplication and a stable core set. **Trigger**: dedupe, rank, core set, 去重, 排序, 精选论文, 核心集合. **Use when**: after retrieval, a broad-coverage set must converge to a manageable core set (for taxonomy/outline/mapping). **Skip if**: someone has already hand-curated a stable `papers/core_set.csv` (no need to churn again). **Network**: none. **Guardrail**: favor determinism; output should be reproducible (stable paper_id, normalized fields).
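One way to get reproducible output with stable IDs is to derive each `paper_id` from a normalized title. The sketch below is an illustration under stated assumptions, not the skill's actual algorithm: the normalization rule, the `P` + hash-prefix ID format, and the `title` field name are all hypothetical.

```python
import hashlib
import re


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe(papers):
    """Keep the first record per normalized title and attach a
    content-derived paper_id, so reruns yield the same IDs."""
    seen = {}
    for paper in papers:
        key = normalize_title(paper["title"])
        if key not in seen:
            # Hypothetical ID format: "P" + first 8 hex chars of SHA-1.
            paper_id = "P" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
            seen[key] = {**paper, "paper_id": paper_id}
    # Sort by ID so output order does not depend on input order.
    return sorted(seen.values(), key=lambda p: p["paper_id"])
```

Because the ID depends only on the normalized title, shuffling the input or rerunning the step produces the same IDs and the same order, although which duplicate's other fields survive still follows input order.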
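Use when `paper-review` needs a claim-by-claim evidence gap report grounded in an extracted claim ledger. **Trigger**: evidence audit, missing evidence, unsupported claims, 审稿证据审计, 证据缺口. **Use when**: within the `paper-review` flow, you need to check each claim's evidence chain, missing baselines, and weak evaluation points. **Skip if**: claims input is missing (e.g. `output/CLAIMS.md` does not exist yet). **Network**: none. **Guardrail**: write only gaps, risks, and next verification steps; do not write arguments on the authors' behalf or introduce new claims.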
Use when `evidence-review` has screened includes and needs a schema-aligned extraction table. **Trigger**: extraction form, extraction table, data extraction, 信息提取, 提取表. **Use when**: `evidence-review` enters extraction (C4) after screening, and included papers must be recorded field by field in a CSV to support later synthesis. **Skip if**: `papers/screening_log.csv` does not exist yet or the protocol is not locked. **Network**: none. **Guardrail**: fill fields strictly per the schema; do not write narrative synthesis at this stage (that belongs to `synthesis-writer`).
Use when `paper-review` needs a canonical manuscript text artifact before claim extraction. **Trigger**: ingest paper, manuscript text, provide paper, paper.md, 输入论文, 导入稿件, 审稿输入. **Use when**: You are running the `paper-review` pipeline and need `output/PAPER.md` before `claims-extractor`. **Skip if**: `output/PAPER.md` already exists and looks like the full manuscript text. **Network**: none. **Guardrail**: Do not summarize or rewrite the paper; store the raw text (or a faithful extraction) so claims stay traceable.
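Use when `paper-review` needs overlap/delta positioning against provided related work. **Trigger**: novelty matrix, prior-work matrix, overlap/delta, 相关工作对比, 新颖性矩阵. **Use when**: assessing novelty/positioning in `paper-review`, where contributions must be aligned item by item with related work and the differences backed by evidence. **Skip if**: claims are missing (run `claims-extractor` first) or you do not plan a novelty positioning analysis. **Network**: none (retrieval of additional related work is out-of-scope unless provided). **Guardrail**: make overlap and delta explicit; give traceable evidence sources where possible (from the manuscript/citations/author statements).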
Use when `evidence-review` needs an operational protocol before screening and extraction. **Trigger**: protocol, PRISMA, systematic review, inclusion/exclusion, 检索式, 纳入排除. **Use when**: the start of the `evidence-review` pipeline (C1), where the protocol must be locked before screening/extraction begins. **Skip if**: this is not an evidence/systematic review (or the protocol is already locked and may not be modified). **Network**: none. **Guardrail**: the protocol must contain executable search and screening rules; HUMAN sign-off is required before entering screening.
Use when `paper-review` has claims plus evidence gaps and needs the final referee-style report. **Trigger**: rubric review, referee report, peer review write-up, 审稿报告, REVIEW.md. **Use when**: the final stage of the `paper-review` pipeline (C3), with `output/CLAIMS.md` + `output/MISSING_EVIDENCE.md` available (plus an optional novelty matrix). **Skip if**: upstream artifacts are not ready (claims/evidence gaps missing) or you do not plan to produce a full referee report. **Network**: none. **Guardrail**: give actionable feedback and cover novelty/soundness/clarity/impact; avoid generic remarks.
Use when an approved `evidence-review` protocol needs to be applied to a candidate pool. **Trigger**: screening, title/abstract screening, inclusion/exclusion, screening_log.csv, 文献筛选, 纳入排除. **Use when**: the screening stage of `evidence-review` (C2/C3), with the protocol locked and approved by HUMAN. **Skip if**: `output/PROTOCOL.md` does not exist yet (or the protocol has not been signed off). **Network**: none. **Guardrail**: every record includes a decision and a reason; stay auditable (do not treat "unread/uncertain" as included).
Use when a `research-brief` workspace has a small paper set plus outline and needs a compact reader-facing briefing instead of a full survey. **Trigger**: snapshot, literature snapshot, 速览, 48h snapshot, one-page snapshot, SNAPSHOT.md. **Use when**: you need to deliver a readable research snapshot (bullet-first, with key citations) within 24-48h in the `research-brief` flow, rather than a full survey. **Skip if**: you have already entered evidence-first survey writing (you have `outline/evidence_drafts.jsonl` / `citations/ref.bib` / `output/DRAFT.md`), in which case use `subsection-writer`/`prose-writer` instead. **Network**: none. **Guardrail**: do not invent papers/citations; cite only from `papers/core_set.csv` (or the candidate pool in the same workspace); do not write long paragraphs (avoid sounding like a survey generator).
Use when `evidence-review` has completed extraction and needs a bounded narrative synthesis. **Trigger**: synthesis, evidence synthesis, systematic review writing, 综合写作, SYNTHESIS.md. **Use when**: `evidence-review` enters the writing stage (C5) after completing screening + extraction (including bias assessment). **Skip if**: `papers/extraction_table.csv` does not exist yet (or protocol/screening is incomplete). **Network**: none. **Guardrail**: use the extraction table as the evidence base; state limitations and bias explicitly; do not expand conclusions without supporting data.
Record a human sign-off at a declared checkpoint (tick `Approve C*` in `DECISIONS.md`) so the pipeline can resume. **Trigger**: approve checkpoint, human approval, sign off, HITL, Approve C2, 审批, 签字, 人类检查点. **Use when**: A unit has `owner=HUMAN` and is BLOCKED waiting for a checkbox in `DECISIONS.md`. **Skip if**: The approval is already recorded (the checkbox is ticked). **Network**: none. **Guardrail**: Do not modify any content artifacts; only update `DECISIONS.md` (and optionally append a short sign-off note).
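Whether a checkpoint has been approved comes down to finding a ticked checkbox in `DECISIONS.md`. The following is a minimal sketch that assumes GitHub-style task-list syntax (`- [x] Approve C2`); the exact line format used in `DECISIONS.md` is not shown in this listing, so treat the pattern and the function name as assumptions.

```python
import re


def is_approved(decisions_text, checkpoint):
    """Return True if a ticked 'Approve <checkpoint>' checkbox is present.

    Assumes task-list lines such as '- [x] Approve C2' (hypothetical format)."""
    pattern = rf"^\s*[-*]\s*\[[xX]\]\s*Approve {re.escape(checkpoint)}\b"
    return re.search(pattern, decisions_text, flags=re.MULTILINE) is not None
```

For example, given the text `- [x] Approve C2` followed by `- [ ] Approve C3`, the check succeeds for `C2` and fails for `C3`, so a runner could keep the C3 unit BLOCKED.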
Compile the Beamer tutorial deck and write a build report. **Trigger**: beamer compile, slides compile, tutorial slides pdf, 编译幻灯片, beamer pdf. **Use when**: C4 of `source-tutorial`; `latex/slides/main.tex` exists and you need to output `latex/slides/main.pdf` and a build report. **Skip if**: the slides scaffold is not finished. **Network**: none. **Guardrail**: even when compilation fails, write a readable report to disk; do not just return the error.
Generate a Beamer slide deck from the final tutorial and approved module structure. **Trigger**: beamer scaffold, slides from tutorial, tutorial slides, 生成 beamer, 教程幻灯片. **Use when**: C4 of `source-tutorial`; `output/TUTORIAL.md` must be converted into a compilable `latex/slides/main.tex`. **Skip if**: the tutorial body does not exist yet. **Network**: none. **Guardrail**: slides must not be a mechanical heading dump; they must stay module-aligned and suit teaching and light self-study.
Scaffold a LaTeX project (`latex/main.tex`, optional bibliography wiring, structure) from an existing Markdown draft. **Trigger**: latex scaffold, md→tex, LaTeX 项目骨架, 生成 main.tex. **Use when**: a LaTeX/PDF deliverable is needed (e.g. the arxiv-survey-latex pipeline) and the draft has been generated or writing is under way. **Skip if**: `output/DRAFT.md` does not exist yet (or you do not need a LaTeX deliverable). **Network**: none. **Guardrail**: remove markdown residue (`##`, `**`, `[@...]`); point the bibliography at `citations/ref.bib`; do not rewrite content in this step.
Fetch and normalize supported source-tutorial inputs into local, traceable text artifacts. **Trigger**: source ingest, ingest sources, normalize tutorial sources, 网页抽取, 资料归一化. **Use when**: C1 of `source-tutorial`; the web pages/PDFs/repos/docs in `sources/manifest.yml` must become traceable text. **Skip if**: the source manifest is not settled, or the sources are not confirmed. **Network**: required for remote URLs. **Guardrail**: treat only successfully extracted content as a valid source; failed sources must be recorded on disk, not silently ignored.
Build or validate the source manifest for the source-tutorial pipeline from user-provided URLs/files. **Trigger**: source manifest, sources list, tutorial sources, url list, 资料清单, 教程来源. **Use when**: C1 of `source-tutorial`; multi-source inputs must be consolidated into a single `sources/manifest.yml`, blocking explicitly when content is incomplete. **Skip if**: a complete, confirmed `sources/manifest.yml` already exists. **Network**: none. **Guardrail**: do not fabricate sources; when the manifest is incomplete, return BLOCKED rather than pretending to be done.
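Run a tutorial-specific quality gate and write a PASS/FAIL report for the final tutorial deliverable. **Trigger**: tutorial self-loop, tutorial quality gate, tutorial pass/fail, 教程自循环, 教程质量门. **Use when**: C3 of `source-tutorial`; `output/TUTORIAL.md` exists and you want to confirm before delivery that it satisfies the tutorial contract rather than being an ordinary long article. **Skip if**: the tutorial body does not exist yet. **Network**: none. **Guardrail**: when reporting gaps, do not invent content; route failures clearly back to the tutorial writing stage.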