| name | wiki-enrich |
| description | Fill in the per-paper TODO sections of research-wiki/papers/<slug>.md pages that literature-ingest skills leave as bare scaffolds. Use when the user says 'enrich wiki', 'fill paper TODOs', 'wiki body 補完' (complete the wiki body), '把 paper 摘要寫進 wiki' (write the paper summaries into the wiki), 'research-wiki 自動填' (auto-fill research-wiki), or after a batch ingest that left papers/ as TODO scaffolds. |
| argument-hint | [target: slug|missing|all] [--source alphaxiv|deepxiv|arxiv|auto] [--force] [--max N] |
| allowed-tools | Bash(*), Read, Write, Edit, Glob, Grep, WebFetch |
Wiki Enrich: Fill Paper TODO Sections (Karpathy LLM-Wiki)
Target: $ARGUMENTS
Why this skill exists
ingest_paper (called by /research-lit, /arxiv, /alphaxiv, /deepxiv, /semantic-scholar, /exa-search) only renders the per-paper scaffold: frontmatter + abstract + 10 fillable _TODO._ placeholder sections (plus two protected sections: ## Connections is a graph summary and ## Abstract (original) is auto-populated when --arxiv-id is given). No downstream skill in ARIS fills those 10 sections; the wiki sits as TODO until someone reads each paper.
This contradicts the Karpathy LLM-wiki design (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f):
"You never (or rarely) write the wiki yourself - the LLM writes and maintains all of it. … The tedious part of maintaining a knowledge base is not the reading or the thinking - it's the bookkeeping. … LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass."
/wiki-enrich is the missing back half of ingest_paper: it reads each scaffolded paper page, fetches paper content from external sources via a graceful fallback chain (see Step 2.3 for the full 5-source chain), and rewrites the 10 fillable TODO sections into short prose summaries (per-section lengths are listed in Step 2.4).
Constants
- WIKI_ROOT = research-wiki/ - Resolved relative to git root. Skill hard-fails if not a directory.
- TARGET_DEFAULT = missing - When no target is given, enrich only papers with ≥1 TODO section. Other targets: <slug> (one paper) or all (every paper, even ones already enriched; usually combined with --force to overwrite).
- SOURCE_DEFAULT = auto - Fetch order: alphaxiv overview → alphaxiv abs → deepxiv brief → arXiv API abstract → page abstract fallback. The first source that returns usable content (>200 chars) wins (full chain documented in the Step 2.3 table). Override with --source to pin one source.
- MAX_PAPERS = 20 - Hard cap per invocation; LLMs touch many files but token budgets are real. Override with --max N.
- FORCE = false - When false (default), skip sections that already have non-TODO content. When true, overwrite every fillable section, but never touch the two protected sections: ## Connections (auto-generated from edges.jsonl) and ## Abstract (original) (immutable arXiv-fetched source data).
- SECTIONS_TO_FILL: 10 fillable sections + 2 protected.
ingest_paper (research_wiki.py:436-473) scaffolds 11 section headers unconditionally and a 12th, ## Abstract (original), only when arXiv returns an abstract for the given --arxiv-id (research_wiki.py:469-473). Of these, 10 carry a _TODO._ (or _TODO: fill in after reading._) marker and need filling. The other 2, ## Connections (position 10 in the enumeration below) and ## Abstract (original) (position 12, conditional), are protected by construction: Connections is auto-generated from graph/edges.jsonl, Abstract (original) is immutable source data from the arXiv API. This skill writes to the 10, never the 2.
1. One-line thesis (marker: _TODO: fill in after reading._)
2. Problem / Gap (marker: _TODO._)
3. Method (marker: _TODO._)
4. Key Results (marker: _TODO._)
5. Assumptions (marker: _TODO._)
6. Limitations / Failure Modes (marker: _TODO._)
7. Reusable Ingredients (marker: _TODO._)
8. Open Questions (marker: _TODO._)
9. Claims (marker: _TODO._): if no claim: edges point to this paper, fill with the literal no-claims line given in Step 2.4; otherwise list them.
10. Connections: NEVER edit (auto-generated from graph/edges.jsonl).
11. Relevance to This Project (marker: _TODO._): use RESEARCH_BRIEF.md, CLAUDE.md, or gap_map.md for project context. If no project context exists, write the literal "context not yet set" line given in Step 2.4 and report it.
12. Abstract (original): leave alone (already populated by ingest_paper when --arxiv-id was used).
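The enumeration above can be captured as data. The following is a minimal sketch, not code from research_wiki.py; the names SECTIONS, FILLABLE, and PROTECTED are illustrative assumptions.

```python
# Section headers in scaffold order, paired with the marker each one carries.
# A marker of None flags a protected section that this skill never writes to.
# (Names here are illustrative, not taken from research_wiki.py.)
SECTIONS = [
    ("One-line thesis", "_TODO: fill in after reading._"),
    ("Problem / Gap", "_TODO._"),
    ("Method", "_TODO._"),
    ("Key Results", "_TODO._"),
    ("Assumptions", "_TODO._"),
    ("Limitations / Failure Modes", "_TODO._"),
    ("Reusable Ingredients", "_TODO._"),
    ("Open Questions", "_TODO._"),
    ("Claims", "_TODO._"),
    ("Connections", None),
    ("Relevance to This Project", "_TODO._"),
    ("Abstract (original)", None),
]

FILLABLE = [name for name, marker in SECTIONS if marker is not None]
PROTECTED = [name for name, marker in SECTIONS if marker is None]
```

Keeping the order identical to the scaffold means list position 10 is Connections and position 12 is Abstract (original), matching the prose above.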
💡 Examples:
- /wiki-enrich - enrich every paper with ≥1 TODO section (most common usage)
- /wiki-enrich vllm - enrich a single paper by slug
- /wiki-enrich all --force - rewrite every paper from scratch (use when you've adopted a new style)
- /wiki-enrich --source alphaxiv --max 5 - only use alphaxiv, only do 5 papers
- /wiki-enrich missing --max 50 - bigger batch (watch token budget)
Pre-flight
Resolve $WIKI_ROOT and $WIKI_SCRIPT (canonical chain; see shared-references/wiki-helper-resolution.md):
cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)" || exit 1
[ -d research-wiki/ ] || { echo "ERROR: research-wiki/ not found. Run /research-wiki init first." >&2; exit 1; }
ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills.txt 2>/dev/null)}"
WIKI_SCRIPT=".aris/tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || WIKI_SCRIPT="tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || { [ -n "${ARIS_REPO:-}" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"; }
[ -f "$WIKI_SCRIPT" ] || { echo "ERROR: research_wiki.py not found." >&2; exit 1; }
If either check fails, hard-fail: this skill manipulates wiki state and must not run blind.
Workflow
Phase 1: Parse target + discover candidates
Parse $ARGUMENTS for the first positional (target) and flags (--source, --force, --max).
Build the candidate paper list:
case "$TARGET" in
all)
PAPERS=( research-wiki/papers/*.md )
;;
missing|"")
PAPERS=( $(grep -lE "^_TODO(\._?|: fill in after reading\._?)$" research-wiki/papers/*.md 2>/dev/null) )
;;
*)
P="research-wiki/papers/${TARGET}.md"
[ -f "$P" ] || { echo "ERROR: paper not found: $P" >&2; exit 1; }
PAPERS=( "$P" )
;;
esac
echo "Candidate papers: ${#PAPERS[@]} (cap ${MAX_PAPERS})"
PAPERS=( "${PAPERS[@]:0:${MAX_PAPERS}}" )
If the candidate list is empty, print "✓ Nothing to enrich." and exit 0. Do not error.
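The argument parsing described at the start of this phase could be sketched with the standard argparse module. This is an illustration under assumptions: the function name parse_arguments and the attribute name max_papers are hypothetical, and the defaults are taken from the Constants section.

```python
import argparse
import shlex


def parse_arguments(raw: str) -> argparse.Namespace:
    """Split the raw $ARGUMENTS string into target and flags."""
    parser = argparse.ArgumentParser(prog="wiki-enrich")
    # First positional: a slug, "missing", or "all"; defaults to TARGET_DEFAULT.
    parser.add_argument("target", nargs="?", default="missing")
    parser.add_argument(
        "--source",
        choices=["alphaxiv", "deepxiv", "arxiv", "auto"],
        default="auto",
    )
    parser.add_argument("--force", action="store_true")
    # MAX_PAPERS cap; stored as max_papers to avoid shadowing the builtin max().
    parser.add_argument("--max", dest="max_papers", type=int, default=20)
    return parser.parse_args(shlex.split(raw))


args = parse_arguments("--source alphaxiv --max 5")
```

With no positional given, target falls back to missing, which mirrors the `missing|""` branch of the shell case statement.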
Phase 2: For each paper - read, fetch, fill
Iterate one paper at a time. For each $PAPER in $PAPERS:
Step 2.1: Read the page and project context. Use the Read tool on the full paper file. Extract from the YAML frontmatter:
- node_id (e.g. paper:vllm) → slug = part after paper:
- arxiv from external_ids.arxiv (empty string if absent)
- title
- existing ## Abstract (original) blockquote (if present), used as the fallback content source
Additionally, on the FIRST paper of the batch (cache for the rest), read project-context files needed for the Claims and Relevance to This Project sections:
- research-wiki/graph/edges.jsonl: scan for claim: edges pointing to the current paper's node_id
- RESEARCH_BRIEF.md (project root): if present, source for project goals
- CLAUDE.md (project root): if present, fallback for project context
- research-wiki/gap_map.md: if non-empty, source for gap framing
If none of the project-context files exist, the Relevance to This Project section will be filled with the literal "context not yet set" line (see Step 2.4 table).
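For illustration, the frontmatter fields listed in this step could be pulled out with a few regular expressions. This sketch assumes a layout the text does not spell out: a leading `---` fenced YAML block with node_id and title as top-level keys and arxiv nested under external_ids. The function name read_frontmatter_fields and the sample values are hypothetical; a real implementation would more likely use a YAML parser.

```python
import re


def read_frontmatter_fields(page_text: str) -> dict:
    """Extract node_id, slug, arxiv, and title from an assumed frontmatter layout."""
    match = re.match(r"^---\n(.*?)\n---\n", page_text, flags=re.DOTALL)
    block = match.group(1) if match else ""

    def scalar(key: str) -> str:
        # Find "key: value" at the start of a (possibly indented) line.
        found = re.search(
            rf"^\s*{re.escape(key)}:[ \t]*(.*)$", block, flags=re.MULTILINE
        )
        return found.group(1).strip().strip("'\"") if found else ""

    node_id = scalar("node_id")
    return {
        "node_id": node_id,
        # slug is the part after the "paper:" prefix
        "slug": node_id.split(":", 1)[1] if node_id.startswith("paper:") else "",
        "arxiv": scalar("arxiv"),  # empty string if absent
        "title": scalar("title"),
    }


sample = (
    "---\n"
    "node_id: paper:vllm\n"
    'title: "Example Title"\n'
    "external_ids:\n"
    '  arxiv: "0000.00000"\n'
    "---\n"
    "\n# Example Title\n"
)
fields = read_frontmatter_fields(sample)
```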
Step 2.2: Identify which sections are TODO.
Match each section header against its marker:
- A header followed by exactly _TODO._ → fill
- A header followed by _TODO: fill in after reading._ → fill (One-line thesis)
- A header followed by any other content → skip (unless --force)
- ## Connections → always skip (auto-generated)
- ## Abstract (original) → always skip (immutable source data)
If no fillable sections remain, log "skip: <slug> (already enriched)" and continue.
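The matching rules in this step can be sketched as a small classifier. It assumes every section is a level-2 `## ` header and that a TODO section's body is exactly the marker text; the function names split_sections and sections_to_fill are illustrative.

```python
import re

TODO_MARKERS = {"_TODO._", "_TODO: fill in after reading._"}
PROTECTED = {"Connections", "Abstract (original)"}


def split_sections(page_text: str) -> dict:
    """Map each '## ' header to its stripped body text, in page order."""
    sections = {}
    current = None
    for line in page_text.splitlines():
        header = re.match(r"^## (.+?)\s*$", line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def sections_to_fill(page_text: str, force: bool = False) -> list:
    """Return the headers this run should write to."""
    todo = []
    for name, body in split_sections(page_text).items():
        if name in PROTECTED:
            continue  # never touched, even with --force
        if body in TODO_MARKERS or force:
            todo.append(name)
    return todo
```

An empty return value corresponds to the "already enriched" skip case.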
Step 2.3: Fetch source content.
The fetch chain runs in order until one returns usable content (>200 chars of text):
| Order | Source | How |
|---|---|---|
| 1 | alphaxiv overview (auto default; --source alphaxiv to pin) | WebFetch https://alphaxiv.org/overview/<arxiv_id>.md (LLM-optimized summary, often best for filling sections) |
| 2 | alphaxiv abs (fallback within alphaxiv) | WebFetch https://alphaxiv.org/abs/<arxiv_id>.md |
| 3 | deepxiv brief (--source deepxiv to pin) | python3 "$DEEPXIV_FETCHER" paper-brief <arxiv_id> if helper resolves |
| 4 | arXiv API abstract, fresh fetch (--source arxiv to pin) | curl "http://export.arxiv.org/api/query?id_list=<arxiv_id>" → log label: arxiv-api-abstract |
| 5 | Page abstract fallback (last resort) | Reuse the existing ## Abstract (original) blockquote already present in the page body from a prior ingest_paper run → log label: page-abstract-fallback |
| - | No arxiv id + no page abstract | Skip this paper, log "skip: <slug> (no arxiv id, no abstract)", continue |
When trying alphaxiv: if WebFetch returns 404 / "Paper not found" / a redirect to the homepage, treat as miss and fall through.
When trying deepxiv: resolve $DEEPXIV_FETCHER per shared-references/integration-contract.md. If the helper or deepxiv CLI is missing, fall through silently.
Save the fetched content as $SOURCE_TEXT. Record which source succeeded for the log entry.
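The fallback behavior of this step can be sketched as a loop over (label, fetcher) pairs. The fetchers here are stand-in lambdas, not real network calls, and first_usable is a hypothetical name; only the >200-character threshold and the log labels come from the text.

```python
from typing import Callable, List, Optional, Tuple

MIN_USABLE_CHARS = 200  # "usable content" threshold from this step


def first_usable(
    fetchers: List[Tuple[str, Callable[[], Optional[str]]]]
) -> Tuple[Optional[str], str]:
    """Try each (label, fetch) pair in order; return the first usable result."""
    for label, fetch in fetchers:
        try:
            text = fetch() or ""
        except Exception:
            # A 404, timeout, or missing helper counts as a miss: fall through.
            text = ""
        if len(text.strip()) > MIN_USABLE_CHARS:
            return label, text
    return None, ""  # every source missed: the caller skips this paper


# Stand-in fetchers; real ones would wrap WebFetch, the deepxiv helper, or curl.
chain = [
    ("alphaxiv-overview", lambda: ""),            # miss
    ("alphaxiv-abs", lambda: "Paper not found"),  # too short, treated as a miss
    ("arxiv-api-abstract", lambda: "A" * 400),    # usable
]
label, source_text = first_usable(chain)
```

Returning the label alongside the text keeps the provenance available for the log entry.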
Step 2.4: Generate per-section content.
You (Claude) are the LLM doing the grunt work. Given:
- $SOURCE_TEXT (the fetched overview / brief / abstract)
- $TITLE
- the list of fillable section headers
Write each TODO section's body following these rules:
| Section | Length | Style | What to extract |
|---|---|---|---|
| One-line thesis | 1 sentence, ≤25 words | Declarative | The paper's core contribution in one sentence: what they built / proved / improved |
| Problem / Gap | 1-2 sentences | Declarative | What problem the field had, why prior work fell short |
| Method | 2-4 sentences | Technical, name the technique | Core mechanism: algorithm name + key idea + how it differs from baselines |
| Key Results | 1-3 bullets OR 2-3 sentences | Quantitative | Headline numbers from the abstract / overview (X% improvement, Yx speedup, etc.). Keep units verbatim. |
| Assumptions | 1-3 bullets | Declarative | What the paper takes for granted (workload type, hardware, model class, distribution shape) |
| Limitations / Failure Modes | 1-3 bullets | Honest | What the paper explicitly admits OR what's structurally absent (e.g. "no multi-node evaluation", "assumes uniform request length") |
| Reusable Ingredients | 1-3 bullets | Concrete | Techniques / datasets / insights from this paper that could be ported elsewhere. Highest value for /idea-creator; write carefully. |
| Open Questions | 1-2 bullets | Question form | What the paper does NOT answer but raises |
| Claims | 1 line | Static | If no claim: edges in graph/edges.jsonl reference this paper, write the literal italic line: _No claims tracked yet; populate via /proof-checker._. Else list claim node IDs. |
| Relevance to This Project | 1-2 sentences | Project-contextual | Use RESEARCH_BRIEF.md / CLAUDE.md / gap_map.md to phrase the connection. If no project context, write the literal italic line: _Project context not yet set; populate RESEARCH_BRIEF.md or gap_map.md to enable this section._ and report. |
Rules (Karpathy fidelity):
- Faithful to source. If the paper doesn't say it, don't invent it. Prefer _Not stated in source._ over hallucination.
- No filler. "This paper presents an approach to..." is filler; don't write that. Start with the noun.
- Keep technical terms in English. vLLM, KV cache, prefill, decode, TTFT, etc. stay verbatim.
- Quantitative when possible. If the abstract has numbers, use them; don't paraphrase as "significant".
- Bilingual support. If the project's CLAUDE.md declares a language preference (language: zh or language: bilingual), match it. Otherwise default to English (or follow shared-references/output-language.md).
Step 2.5: Edit the file.
For each fillable section, use the Edit tool to replace the TODO marker with the generated body. Match the exact section header + marker pair to keep edits unique, e.g.:
## Problem / Gap
_TODO._
→
## Problem / Gap
<generated body>
Never touch the YAML frontmatter, ## Connections, or ## Abstract (original).
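The header + marker replacement can be expressed as a plain string operation. This is a sketch of the idea rather than the Edit tool itself; it assumes the marker sits on the line directly below its header, as in the example above, and fill_section is a hypothetical name.

```python
def fill_section(page_text: str, header: str, marker: str, body: str) -> str:
    """Swap one section's TODO marker for generated text, leaving the rest intact."""
    old = f"## {header}\n{marker}"
    # Require exactly one match, so an edit can never land in the wrong
    # section or silently do nothing.
    if page_text.count(old) != 1:
        raise ValueError(f"expected exactly one '{header}' TODO block")
    return page_text.replace(old, f"## {header}\n{body}", 1)


page = "## Problem / Gap\n_TODO._\n\n## Method\n_TODO._\n"
updated = fill_section(
    page, "Problem / Gap", "_TODO._", "Prior systems waste KV-cache memory."
)
```

Matching the header together with the marker is what makes the edit unique: `_TODO._` alone appears in up to nine sections of a fresh scaffold.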
Step 2.6: Append log entry.
python3 "$WIKI_SCRIPT" log research-wiki/ "wiki-enrich: enriched paper:<slug> from <source> (filled N/M sections)"
Record which source provided content (alphaxiv-overview, alphaxiv-abs, deepxiv-brief, arxiv-api-abstract, or page-abstract-fallback) so the audit trail is honest about provenance.
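A small helper can keep the log message and its provenance label consistent. This sketch only builds the message string passed to the log command; log_message is a hypothetical name, and the labels are the five listed above.

```python
SOURCE_LABELS = (
    "alphaxiv-overview",
    "alphaxiv-abs",
    "deepxiv-brief",
    "arxiv-api-abstract",
    "page-abstract-fallback",
)


def log_message(slug: str, source: str, filled: int, fillable: int) -> str:
    """Build the log line for one enriched paper, rejecting unknown labels."""
    if source not in SOURCE_LABELS:
        raise ValueError(f"unknown source label: {source}")
    return (
        f"wiki-enrich: enriched paper:{slug} from {source} "
        f"(filled {filled}/{fillable} sections)"
    )


message = log_message("vllm", "alphaxiv-overview", 8, 10)
```

Rejecting unknown labels keeps free-form source names out of log.md, so a later audit can group entries by exact label.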
Phase 3: Final report
After processing all candidates, print:
✓ wiki-enrich complete
Processed: N
Enriched: X (sections filled: total)
Skipped: Y (reasons: already enriched / no arxiv id / fetch failed)
Failed: Z (with paper + reason)
Source breakdown:
alphaxiv-overview: A
alphaxiv-abs: B
deepxiv-brief: C
arxiv-api-abstract: D
page-abstract-fallback: E
Re-ideation suggestion: <if ≥5 papers were enriched, recommend `/idea-creator "topic"` so the freshly-filled `Reusable Ingredients` and `Limitations` feed brainstorming. `query_pack.md` is already rebuilt below; the user does NOT need to call `/research-wiki query` manually.>
Also rebuild query_pack.md once at the end (single python3 "$WIKI_SCRIPT" rebuild_query_pack research-wiki/ call) so /idea-creator sees the new bodies on its next run.
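The counts in the final report can be tallied from per-paper results with collections.Counter. The record shape used here (dicts with status, source, and filled keys) is an assumption made for illustration, as is the function name final_report.

```python
from collections import Counter

SOURCE_LABELS = [
    "alphaxiv-overview",
    "alphaxiv-abs",
    "deepxiv-brief",
    "arxiv-api-abstract",
    "page-abstract-fallback",
]


def final_report(results: list) -> str:
    """results: dicts with 'status', plus 'source' and 'filled' when enriched."""
    status = Counter(r["status"] for r in results)
    enriched = [r for r in results if r["status"] == "enriched"]
    sources = Counter(r["source"] for r in enriched)
    total_filled = sum(r.get("filled", 0) for r in enriched)
    lines = [
        "wiki-enrich complete",
        f"Processed: {len(results)}",
        f"Enriched: {status['enriched']} (sections filled: {total_filled})",
        f"Skipped: {status['skipped']}",
        f"Failed: {status['failed']}",
        "Source breakdown:",
    ]
    # Counter returns 0 for labels that never occurred, so every source is listed.
    lines += [f"  {label}: {sources[label]}" for label in SOURCE_LABELS]
    return "\n".join(lines)


report = final_report([
    {"status": "enriched", "source": "alphaxiv-overview", "filled": 8},
    {"status": "enriched", "source": "page-abstract-fallback", "filled": 7},
    {"status": "skipped"},
])
```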
Output Protocols
Follow the shared protocols:
- No MANIFEST.md entry. This skill edits existing scaffolded pages in place rather than generating new artifacts. The audit trail lives in research-wiki/log.md (Step 2.6), with provenance per paper. Adding a wiki-enrich stage to shared-references/output-manifest.md is out of scope for this PR.
- Output Language Protocol: respect the project's language setting.
Key Rules
- Idempotent by default. Re-running without --force only touches still-TODO sections. Safe to run from cron.
- Never touch frontmatter, ## Connections, or ## Abstract (original). Frontmatter is metadata, Connections is graph-generated, Abstract is immutable source data.
- Hard-fail on missing wiki / missing helper. Do not silently create research-wiki/: if it's missing, the user is in the wrong cwd or hasn't run /research-wiki init.
- Track provenance. Every log entry records which source actually filled the body. If a future audit shows alphaxiv hallucinated for a paper, you can find every page touched by that source.
- Don't auto-trigger /idea-creator. This skill builds the substrate; the user decides when to brainstorm next. Only suggest re-ideation in the final report.
- Gracefully degrade. If WebFetch is rate-limited, fall through to the next source. If all sources miss, skip the paper and continue; don't abort the whole batch.
- Karpathy fidelity above completeness. It is better to leave a section as _Not stated in source._ than to hallucinate. The wiki's value is that it doesn't lie.
Composing with Other Skills
/research-lit "topic" → ingests papers as scaffolds (Step 6)
/wiki-enrich → THIS: fills paper bodies (you are here)
/research-wiki lint → health-check (orphans, contradictions, dead ideas)
/idea-creator "direction" → reads query_pack, ideates on top of enriched wiki
/research-wiki query "topic" → rebuild query_pack after big wiki changes
After a fresh /research-pipeline run leaves Stage 1 Phase 1 done but Phase 2 not started (the failure mode that prompted this skill), the recovery path is:
/wiki-enrich # fill the paper TODOs ingest_paper left behind
/idea-creator "..." # now ideate with a wiki that actually has content