schema-normalizer

Name: Schema Normalizer
Author: WILLOSCAR

Normalize cross-skill JSONL interfaces (ids + titles + citation key formats) so downstream skills do not rely on best-effort joins. **Trigger**: schema normalize, jsonl contract, interface drift, join drift, 字段不一致, schema 规范化. **Use when**: you have generated C2-C4 JSONL artifacts (outline/briefs/bindings/packs/anchors) and want deterministic, stable fields before self-loops/writing. **Skip if**: you are not using the survey pipelines, or the workspace already has a fresh PASS `output/SCHEMA_NORMALIZATION_REPORT.md` for the current artifacts. **Network**: none. **Guardrail**: NO PROSE; deterministic transforms only; do not invent evidence/claims; only fill missing ids/titles from `outline/outline.yml`.

stars:449

forks:31

updated: May 30, 2026, 12:16

Schema Normalizer (NO PROSE)

Purpose: close a common failure mode in skills-first pipelines, namely schema drift across JSONL artifacts.

When fields are inconsistent (missing ids/titles, mixed citation-key formats), downstream skills start doing best-effort joins and fragile parsing. This skill makes the interface explicit and deterministic.

Inputs

outline/outline.yml (source of truth for section/subsection ids + titles)
Optional (for citation-key sanity): citations/ref.bib
Default JSONL artifacts to normalize (arxiv-survey(-latex) C4 bridge):
- outline/subsection_briefs.jsonl
- outline/chapter_briefs.jsonl
- outline/evidence_bindings.jsonl
- outline/evidence_drafts.jsonl
- outline/anchor_sheet.jsonl
Optional (run after writer packs are generated):
- outline/writer_context_packs.jsonl

Outputs

output/SCHEMA_NORMALIZATION_REPORT.md (always written; PASS/FAIL + what changed)
The processed JSONL files are normalized in place (a .bak.* is created if changes are applied).
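
The in-place write with backup described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper name `write_jsonl_in_place` and the timestamp-based backup suffix are assumptions, since the source only states that a `.bak.*` file is created when changes are applied.

```python
import json
import shutil
import time
from pathlib import Path
from typing import List, Optional


def write_jsonl_in_place(path: Path, records: List[dict]) -> Optional[Path]:
    """Rewrite `path` with `records`, backing up the old file only on change.

    Returns the backup path, or None when the content is already identical.
    Hypothetical helper: the real script's backup naming is not documented.
    """
    new_text = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    if new_text == old_text:
        return None  # nothing changed, so no backup is created
    backup = None
    if path.exists():
        # Assumed suffix format: .bak.<unix timestamp>
        backup = path.with_name(f"{path.name}.bak.{int(time.time())}")
        shutil.copy2(path, backup)
    path.write_text(new_text, encoding="utf-8")
    return backup
```

Comparing the serialized text before writing keeps a rerun on already-normalized artifacts from producing a backup.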

What gets normalized

1) IDs + titles (join keys)

For any record with sub_id: "<H2>.<H3>":

Ensure section_id exists (derived from the prefix before the dot)
Ensure title and section_title exist (filled from outline/outline.yml)

For any record with section_id: "<H2>":

Ensure section_title exists (filled from outline/outline.yml)
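
The join-key rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `fill_join_keys` is hypothetical, the two title maps are assumed to have been extracted from `outline/outline.yml` beforehand (the outline's exact layout is not shown in this document), and existing non-empty values are assumed to be left untouched.

```python
from typing import Dict


def fill_join_keys(record: dict,
                   section_titles: Dict[str, str],
                   subsection_titles: Dict[str, str]) -> dict:
    """Return a copy of `record` with missing ids/titles filled in.

    `section_titles` maps "<H2>" -> title; `subsection_titles` maps
    "<H2>.<H3>" -> title. Both are assumed inputs derived from the outline.
    """
    out = dict(record)
    sub_id = out.get("sub_id")
    if sub_id:
        # section_id is the prefix before the dot, e.g. "3.2" -> "3".
        if not out.get("section_id"):
            out["section_id"] = str(sub_id).split(".", 1)[0]
        if not out.get("title") and sub_id in subsection_titles:
            out["title"] = subsection_titles[sub_id]
    section_id = out.get("section_id")
    if section_id and not out.get("section_title") and section_id in section_titles:
        out["section_title"] = section_titles[section_id]
    return out


# Example with made-up titles:
rec = fill_join_keys({"sub_id": "3.2"}, {"3": "Methods"}, {"3.2": "Planning"})
# rec == {"sub_id": "3.2", "section_id": "3",
#         "title": "Planning", "section_title": "Methods"}
```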

2) Citation key format (reduce parsing drift)

Within these C2-C4 JSONL artifacts, normalize citation keys so they are raw BibTeX keys (no @ prefix):

"citations": ["smith2023", "jones2024"]

Notes:

Final prose still uses Markdown citations: [@smith2023].
This skill does not add/remove citations; it only normalizes formatting.
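
A minimal sketch of this formatting-only normalization follows. The function names are hypothetical, and the input variants handled here (`@key` and `[@key]`) are assumptions about what drift looks like; the source only specifies the target form (raw BibTeX keys, no `@` prefix). The mapping is one-to-one, so no citation is added, removed, or reordered.

```python
from typing import Iterable, List


def normalize_citation_keys(keys: Iterable[str]) -> List[str]:
    """Strip Markdown wrappers so each entry is a raw BibTeX key.

    Handles "smith2023", "@smith2023", and "[@smith2023]" (assumed variants).
    Multi-key strings such as "[@a; @b]" are out of scope for this sketch.
    """
    normalized = []
    for key in keys:
        raw = key.strip().strip("[]").strip().lstrip("@")
        normalized.append(raw)
    return normalized


def to_markdown_citation(key: str) -> str:
    """Render a raw key the way final prose cites it: [@key]."""
    return f"[@{key}]"


keys = normalize_citation_keys(["@smith2023", "[@jones2024]", "lee2022"])
# keys == ["smith2023", "jones2024", "lee2022"]
```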

When to run

Recommended placement in arxiv-survey(-latex):

Run after evidence-draft + anchor-sheet and before writer-context-pack + evidence-selfloop.
This ensures outline/evidence_drafts.jsonl and outline/anchor_sheet.jsonl are schema-stable before drafting packs are built.

Failure modes

If outline/outline.yml is missing or cannot be parsed, the skill FAILs.
If any target JSONL contains invalid JSON lines, the skill reports them and FAILs (do not proceed on corrupted artifacts).
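
The invalid-line check can be sketched as below. The helper name is hypothetical, and two behaviors are assumptions rather than documented facts: blank lines are skipped, and a line that parses but is not a JSON object is also reported.

```python
import json
from typing import List, Tuple


def find_invalid_jsonl_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) for every line that fails validation."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, exc.msg))
            continue
        if not isinstance(value, dict):
            # assumption: every record is expected to be a JSON object
            problems.append((lineno, "expected a JSON object"))
    return problems


def status_for(problems: List[Tuple[int, str]]) -> str:
    """Any invalid line means FAIL; corrupted artifacts are not processed."""
    return "FAIL" if problems else "PASS"
```

Collecting every bad line before failing lets the report list all problems in one run instead of stopping at the first.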

Script (optional)

Quick Start

Show the available options:
- python .codex/skills/schema-normalizer/scripts/run.py --help
Normalize the C4 bridge artifacts:
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws>

All Options

--workspace <dir>
--unit-id <U###>
--inputs <semicolon-separated>
--outputs <semicolon-separated>
--checkpoint <C#>

Examples

Normalize the default C4 artifacts (ids/titles + citations format). Quote the --inputs value so the shell does not treat each ; as a command separator:
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs "outline/outline.yml;citations/ref.bib;outline/subsection_briefs.jsonl;outline/chapter_briefs.jsonl;outline/evidence_bindings.jsonl;outline/evidence_drafts.jsonl;outline/anchor_sheet.jsonl" --outputs output/SCHEMA_NORMALIZATION_REPORT.md
Normalize writer packs too (if you are running this after writer-context-pack):
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs "outline/outline.yml;citations/ref.bib;outline/writer_context_packs.jsonl" --outputs output/SCHEMA_NORMALIZATION_REPORT.md
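
Because `--inputs` and `--outputs` take semicolon-separated values, and `;` separates commands in POSIX shells, one way to avoid quoting problems is to build the argument list in Python and skip the shell entirely. The sketch below uses only the flags listed under All Options; the helper name `build_command` and the workspace name `workspaces/demo` are hypothetical.

```python
from typing import List, Sequence

SCRIPT = ".codex/skills/schema-normalizer/scripts/run.py"


def build_command(workspace: str,
                  inputs: Sequence[str],
                  outputs: Sequence[str]) -> List[str]:
    """Build an argv list; each joined value is passed as ONE argument."""
    return [
        "python", SCRIPT,
        "--workspace", workspace,
        "--inputs", ";".join(inputs),
        "--outputs", ";".join(outputs),
    ]


cmd = build_command(
    "workspaces/demo",  # hypothetical workspace name
    ["outline/outline.yml", "citations/ref.bib", "outline/writer_context_packs.jsonl"],
    ["output/SCHEMA_NORMALIZATION_REPORT.md"],
)
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses shell parsing, so the semicolons reach the script unchanged.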

related-skills.json

same repository

agent-survey-corpus.md

from "WILLOSCAR/research-units-pipeline-skills"

Download a small corpus of open-access arXiv survey/review PDFs about agentic systems and extract text for style learning. **Trigger**: agent survey corpus, ref corpus, download surveys, 学习综述写法, 下载 survey. **Use when**: you want to study how real agent surveys structure sections (6–8 H2), size subsections, and write evidence-backed comparisons. **Skip if**: you cannot download PDFs (no network) or you don't want local PDF files. **Network**: required. **Guardrail**: only download arXiv PDFs; store under `ref/` and keep large files out of git.

updated: 2026-05-30, stars: 449

global-reviewer.md

from "WILLOSCAR/research-units-pipeline-skills"

Global consistency review for survey drafts: terminology, cross-section coherence, and scope/citation hygiene. Writes `output/GLOBAL_REVIEW.md` and (optionally) applies safe edits to `output/DRAFT.md`. **Trigger**: global review, consistency check, coherence audit, 术语一致性, 全局回看, 章节呼应, 拷打 writer. **Use when**: Draft exists and you want a final evidence-first coherence pass before LaTeX/PDF. **Skip if**: You are still changing the outline/mapping/notes (do those first), or prose writing is not approved. **Network**: none. **Guardrail**: Do not invent facts or citations; do not add new citation keys; treat missing evidence as a failure signal.

updated: 2026-05-30, stars: 449

literature-engineer.md

from "WILLOSCAR/research-units-pipeline-skills"

Multi-route literature expansion + metadata normalization for evidence-first surveys. Produces a large candidate pool (`papers/papers_raw.jsonl`, target ≥1200) with stable IDs and provenance, ready for dedupe/rank + citation generation. **Trigger**: evidence collector, literature engineer, 文献扩充, 多路召回, snowballing, cited by, references, 元信息增强, provenance. **Use when**: you need to expand the candidate literature pool to ≥1200 papers and fill in traceable metadata (Stage C1 of the survey pipeline; evidence that must exist before writing). **Skip if**: you already have a high-quality `papers/papers_raw.jsonl` (≥1200 records, each with a stable identifier and a provenance record). **Network**: can run offline (using imports); snowballing/online retrieval requires network access. **Guardrail**: do not fabricate papers; every record must carry a stable identifier (arXiv id / DOI / trusted URL) and provenance; do not write prose under output/.

updated: 2026-05-30, stars: 449

pdf-text-extractor.md

from "WILLOSCAR/research-units-pipeline-skills"

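Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取, 下载PDF. **Use when**: `queries.md` sets `evidence_mode: fulltext` (or you explicitly need full-text evidence) and you want stronger evidence for paper notes/claims. **Skip if**: `evidence_mode: abstract` (the default), or you do not want to download/extract (cost, permissions, time). **Network**: full-text download usually requires network access (unless you manually provide cached PDFs in `papers/pdfs/`). **Guardrail**: cache downloads in `papers/pdfs/`; by default do not overwrite existing extracted text (unless re-extraction is explicitly requested).

updated: 2026-05-30, stars: 449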

prose-writer.md

from "WILLOSCAR/research-units-pipeline-skills"

Write `output/DRAFT.md` (or `output/SNAPSHOT.md`) from an approved outline and evidence packs, using only verified citation keys from `citations/ref.bib`. **Trigger**: write draft, prose writer, snapshot, survey writing, 写综述, 生成草稿, section-by-section drafting. **Use when**: structure is approved (`DECISIONS.md` has `Approve C2`) and evidence packs exist (`outline/subsection_briefs.jsonl`, `outline/evidence_drafts.jsonl`). **Skip if**: approvals are missing, or evidence packs are incomplete / scaffolded (missing-fields, TODO markers). **Network**: none. **Guardrail**: do not invent facts or citations; only cite keys present in `citations/ref.bib`; avoid pipeline-jargon leakage in final prose.

updated: 2026-05-30, stars: 449

writer-selfloop.md

from "WILLOSCAR/research-units-pipeline-skills"

Writing self-loop for surveys: run the strict section-quality gate, then rewrite only the failing `sections/*.md` files until the report is PASS. **Trigger**: writer self-loop, writing loop, quality gate loop, rewrite failing sections, 自循环, 反复改到 PASS. **Use when**: per-section files exist but C5 is FAIL/BLOCKED (thin sections, missing leads/front matter, citation-scope violations, generator voice). **Skip if**: you are still pre-C2 (NO PROSE), or evidence packs are incomplete (fix C3/C4 first). **Network**: none. **Guardrail**: do not invent facts; only use citation keys present in `citations/ref.bib`; keep citations in-scope per `outline/evidence_bindings.jsonl`; do not add/remove citation keys during rewrites.

updated: 2026-05-30, stars: 449

package.json

"author": "WILLOSCAR"

"repository": "WILLOSCAR/research-units-pipeline-skills"
