| name | llm-wiki-builder |
| description | Build and maintain a personal LLM Wiki knowledge base using the Karpathy three-layer architecture. Use this skill when the user wants to create, organize, or maintain a structured wiki knowledge base, including ingesting source materials, generating wiki pages, setting up health checks, and managing cross-references. Triggers: 'build a knowledge base', 'create a wiki', 'organize my notes into a wiki', 'set up knowledge management', 'ingest documents into wiki'. |
LLM Wiki Builder
Build a personal knowledge base where the LLM is the programmer, Markdown is the code, and the wiki is the codebase.
Based on Karpathy's LLM Wiki pattern, refined through production use on a 600+ page knowledge base.
Core Philosophy
- Raw sources are immutable: the LLM reads from raw/ but never writes to it
- Wiki is the single source of truth: all knowledge lives as .md pages
- Schema evolves collaboratively: CLAUDE.md defines conventions, and you and the user improve it together
- Operations leave traces: every ingest/query/lint is logged in log.md
Quick Start
kb/
├── raw/                     ← Drop source files here (PDFs, docs, notes)
├── wiki/
│   └── <topic>/
│       ├── CLAUDE.md        ← Schema: conventions, formats, link rules
│       ├── log.md           ← Append-only timeline
│       ├── index.md         ← Content index (what's in this wiki)
│       └── pages/           ← Wiki pages (LLM-maintained)
├── scripts/                 ← Pipeline + health check tools
└── prompts/
    └── wiki_builder.md      ← LLM prompt template for ingestion
Workflow Decision Tree
- Starting fresh? → Follow "Setup" below
- Ingesting new materials? → Follow "Ingest" workflow
- Querying the wiki? → Follow "Query" workflow
- Maintaining health? → Follow "Lint" workflow
- Customizing for a domain? → Adapt CLAUDE.md
Setup
1. Create Directory Structure
kb/
├── raw/
│   ├── documents/           ← Source PDFs, docs, papers
│   ├── notes/               ← Text notes, markdown files
│   └── data/                ← CSVs, JSONs, structured data
├── wiki/
│   └── <topic>/
├── scripts/                 ← Copy from this skill's scripts/
└── prompts/
    └── wiki_builder.md      ← Copy from this skill's references/
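The layout above can also be created programmatically. The following is a minimal sketch, not part of the skill's own scripts: the function name `create_kb_skeleton` and the topic name `my-topic` are hypothetical, and it additionally creates the `pages/` directory shown in Quick Start.

```python
from pathlib import Path
import tempfile

# Sub-folders of raw/ taken from the Setup layout above.
RAW_SUBDIRS = ("documents", "notes", "data")


def create_kb_skeleton(root: Path, topic: str) -> list[Path]:
    """Create the kb/ directory tree and return the directories ensured."""
    dirs = [root / "raw" / sub for sub in RAW_SUBDIRS]
    dirs += [
        root / "wiki" / topic / "pages",
        root / "scripts",
        root / "prompts",
    ]
    for directory in dirs:
        # exist_ok makes the call safe to repeat on an existing knowledge base
        directory.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    # Demo in a throwaway directory; "my-topic" is a hypothetical topic name.
    with tempfile.TemporaryDirectory() as tmp:
        for created in create_kb_skeleton(Path(tmp) / "kb", "my-topic"):
            print(created.relative_to(tmp))
```

Copying the scripts and the prompt template into `scripts/` and `prompts/` remains a manual step.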
2. Create CLAUDE.md (Schema)
Copy references/CLAUDE_template.md to wiki/<topic>/CLAUDE.md and customize:
- Replace <topic> with the knowledge domain name
- Define page types (entity, concept, document, etc.)
- Define front matter fields per page type
- Define link conventions
- Define naming conventions
3. Create log.md
Initialize with a header:
# Change Log
> Append-only. Never edit or delete existing entries.
## [YYYY-MM-DD] init | Knowledge base created
- Structure: three-layer (raw → wiki → schema)
- Topic: <your topic>
4. Create index.md
Build a content index listing all top-level categories and entity counts.
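One way to generate such an index is sketched below. It assumes that each top-level category is a sub-directory of `pages/` (as in the Domain Customization Examples) and that each category has its own `index.md`; the function name `build_index` and the output format are illustrative, not prescribed by the skill.

```python
from pathlib import Path
import tempfile


def build_index(pages_dir: Path, title: str = "Index") -> str:
    """Return Markdown listing each category under pages/ with its page count."""
    lines = [f"# {title}", ""]
    for category in sorted(p for p in pages_dir.iterdir() if p.is_dir()):
        # Count every .md page below the category, skipping nested index files
        count = sum(1 for f in category.rglob("*.md") if f.name != "index.md")
        lines.append(f"- [{category.name}](pages/{category.name}/index.md) ({count})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demo with a hypothetical "papers" category in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        pages = Path(tmp) / "pages"
        (pages / "papers").mkdir(parents=True)
        (pages / "papers" / "example.md").write_text("stub", encoding="utf-8")
        print(build_index(pages))
```

The returned text would be written to `wiki/<topic>/index.md`, which is why the links are relative to that location.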
5. Customize the Pipeline
Edit scripts/pipeline.py and update these constants:
WIKI_SUBDIR: target wiki subdirectory under wiki/
FRONT_MATTER_FIELDS: required YAML front matter fields
PROMPT_PATH: path to your wiki_builder.md prompt
Edit scripts/health_check.py and update these constants:
WIKI_SUBDIR: same wiki subdirectory
REQUIRED_FM_FIELDS: fields to check in front matter
PAGE_PATTERNS: glob patterns for page discovery
Ingest Workflow
Using the Pipeline Script
python scripts/pipeline.py
python scripts/pipeline.py --watch
python scripts/pipeline.py --dry-run
python scripts/pipeline.py --category documents
python scripts/pipeline.py --status
Manual Ingest (LLM-Assisted)
When the pipeline isn't suitable (e.g., structured data, images):
- Read the source file from raw/
- Parse key information (entities, relationships, metadata)
- Create wiki pages following the CLAUDE.md format
- Add cross-references (links between related pages)
- Update index.md
- Append to log.md:
## [YYYY-MM-DD] ingest | <description>
- Source: raw/<path>
- Generated: wiki/<topic>/pages/<file>.md
- Entities extracted: N
- Links created: N
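The log entry format above is the same for ingest, query, and lint, so a single helper can write all three. This is a minimal sketch; `append_log_entry` and its parameters are hypothetical names, and the detail keys are whatever the workflow calls for.

```python
from datetime import date
from pathlib import Path
from typing import Optional
import tempfile


def append_log_entry(log_path: Path, operation: str, description: str,
                     details: dict[str, object],
                     day: Optional[date] = None) -> str:
    """Append one '## [YYYY-MM-DD] op | description' entry and return it."""
    day = day or date.today()
    lines = [f"## [{day.isoformat()}] {operation} | {description}"]
    lines += [f"- {key}: {value}" for key, value in details.items()]
    entry = "\n".join(lines) + "\n"
    # Mode "a" only ever writes at the end, which keeps the log append-only
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + entry)
    return entry


if __name__ == "__main__":
    # Demo in a throwaway directory; the source path is a made-up example.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "log.md"
        log.write_text("# Change Log\n", encoding="utf-8")
        append_log_entry(log, "ingest", "Example batch",
                         {"Source": "raw/notes/example.md", "Entities extracted": 2})
        print(log.read_text(encoding="utf-8"))
```

Opening the file in append mode means existing entries are never rewritten, which matches the append-only rule for log.md.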
Query Workflow
- Locate: Read index.md or _maintenance/summary_index.json to find relevant pages
- Read: Load the relevant wiki pages
- Synthesize: Combine information from multiple pages into an answer
- Capture: If the answer represents new knowledge, create a new wiki page
- Log: Append to log.md:
## [YYYY-MM-DD] query | <topic searched>
- Pages consulted: page1.md, page2.md
- New page created: wiki/<topic>/pages/<new_page>.md (if applicable)
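The Locate step can be approximated with a plain keyword scan over index.md. The sketch below is an assumption about how that might look, not the skill's actual lookup: `locate_pages` is a hypothetical helper that returns the link targets found on index lines mentioning any query term.

```python
import re

# Matches [[Page Name]] (optionally with |alias or #section) or [Text](path.md)
LINK_RE = re.compile(r"\[\[([^\]|#]+)[^\]]*\]\]|\[[^\]]+\]\(([^)#]+\.md)[^)]*\)")


def locate_pages(index_text: str, query: str) -> list[str]:
    """Return page targets linked from index lines that mention a query term."""
    terms = [term.lower() for term in query.split()]
    hits: list[str] = []
    for line in index_text.splitlines():
        lowered = line.lower()
        if not any(term in lowered for term in terms):
            continue
        for wiki_target, md_path in LINK_RE.findall(line):
            # Wiki-style links carry no extension, so add .md for consistency
            target = md_path or wiki_target.strip() + ".md"
            if target not in hits:
                hits.append(target)
    return hits


if __name__ == "__main__":
    sample = "- [[Attention]] transformer mechanism\n- [[Gradient Descent]] optimizer\n"
    print(locate_pages(sample, "transformer"))
```

A scan like this only narrows the candidate set; the Read and Synthesize steps still rely on loading the pages themselves.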
Lint Workflow
Using the Health Check Script
python scripts/health_check.py
python scripts/health_check.py --quick
Checks Performed
| Check | What it catches |
|---|---|
| Broken links | [[target]] where target.md doesn't exist |
| Missing front matter | Pages without YAML --- block |
| Empty required fields | Pages with blank/placeholder required fields |
| Orphan pages | Pages with zero incoming links |
| Duplicate entries | Pages with similar names or titles |
| Pipeline state | Files in raw/ not yet processed |
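To make the first and fourth checks concrete, here is a minimal sketch of broken-link and orphan detection for `[[target]]` links. It is not the code in health_check.py; it assumes that a link target resolves to any page under `pages/` whose file stem equals the target, and `check_links` is a hypothetical name.

```python
from pathlib import Path
import re
import tempfile

# Captures the target of [[target]], [[target|alias]] or [[target#section]]
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def check_links(pages_dir: Path) -> tuple[dict[str, list[str]], list[str]]:
    """Return (broken link targets per page, names of orphan pages)."""
    pages = {path.stem: path for path in pages_dir.rglob("*.md")}
    incoming = {name: 0 for name in pages}
    broken: dict[str, list[str]] = {}
    for name, path in pages.items():
        text = path.read_text(encoding="utf-8")
        for target in {match.strip() for match in WIKILINK_RE.findall(text)}:
            if target not in pages:
                broken.setdefault(name, []).append(target)
            elif target != name:
                # A page linking to itself does not count as an incoming link
                incoming[target] += 1
    orphans = sorted(name for name, count in incoming.items() if count == 0)
    return {name: sorted(targets) for name, targets in broken.items()}, orphans


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pages = Path(tmp)
        (pages / "a.md").write_text("See [[b]] and [[missing]].", encoding="utf-8")
        (pages / "b.md").write_text("No links here.", encoding="utf-8")
        print(check_links(pages))
```

Index pages usually have no incoming links by design, so a real check would likely exclude them from the orphan list.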
Manual Lint
When you spot issues during normal work:
- Fix the issue directly in the wiki page
- Update any affected index pages
- Append to log.md:
## [YYYY-MM-DD] lint | <description>
- Issue: <what was wrong>
- Fix: <what was changed>
- Pages affected: N
CLAUDE.md Conventions
The Schema file should define:
Page Types
Each wiki should have clear page types with consistent formats. See references/CLAUDE_template.md for a starter template.
Front Matter
Every page must have YAML front matter:
---
title: "Page Title"
type: entity|concept|document|category
tags: ["tag1", "tag2"]
date_created: YYYY-MM-DD
status: complete|draft|needs-review
---
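A check for missing or empty required fields might look like the sketch below. It uses a deliberately minimal line-based parser that handles only flat `key: value` pairs like the block above; a full YAML library would be the more robust choice. The names `read_front_matter` and `missing_fields` are hypothetical.

```python
import re
from typing import Optional

# Required fields taken from the front matter example above.
REQUIRED_FIELDS = ("title", "type", "tags", "date_created", "status")
# Front matter is the block between two --- lines at the very top of the page
FRONT_MATTER_RE = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)


def read_front_matter(text: str) -> Optional[dict[str, str]]:
    """Return flat key/value pairs, or None when the page has no --- block."""
    match = FRONT_MATTER_RE.match(text)
    if match is None:
        return None
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip().strip('"')
    return fields


def missing_fields(text: str, required=REQUIRED_FIELDS) -> list[str]:
    """List required fields that are absent or blank in the page."""
    fields = read_front_matter(text)
    if fields is None:
        return list(required)
    return [name for name in required if not fields.get(name)]


if __name__ == "__main__":
    page = '---\ntitle: "Page Title"\ntype: entity\nstatus:\n---\nBody text\n'
    print(missing_fields(page))
```

Values are kept as raw strings here, so a list such as `tags` is only checked for presence, not parsed.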
Link Format
- Internal links: [[Page Name]] or [Display Text](relative/path.md)
- Cross-directory: [Text](../../other-section/page.md)
- External: [Text](https://example.com)
Naming Conventions
- Files: [identifier]title.md (e.g., [001]Entity Name.md)
- Directories: descriptive names, no spaces preferred
- Index files: always index.md in each directory
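The file naming rule is easy to enforce with a pair of small helpers. This is a sketch under the assumption that the identifier contains no square brackets; `page_filename` and `parse_page_filename` are hypothetical names.

```python
import re
from typing import Optional

# [identifier]title.md, where the identifier itself contains no brackets
PAGE_NAME_RE = re.compile(r"^\[(?P<identifier>[^\[\]]+)\](?P<title>.+)\.md$")


def page_filename(identifier: str, title: str) -> str:
    """Build a page file name in the [identifier]title.md convention."""
    return f"[{identifier}]{title.strip()}.md"


def parse_page_filename(name: str) -> Optional[tuple[str, str]]:
    """Split a file name into (identifier, title), or None if it does not fit."""
    match = PAGE_NAME_RE.match(name)
    if match is None:
        return None
    return match["identifier"], match["title"]


if __name__ == "__main__":
    name = page_filename("001", "Entity Name")
    print(name, parse_page_filename(name))
```

Files such as index.md intentionally fail the pattern, so a caller can tell index files apart from entity pages.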
Resources
scripts/
pipeline.py: Raw → Wiki ingestion pipeline (watch, status, dry-run)
health_check.py: Wiki health checker (links, front matter, orphans)
references/
CLAUDE_template.md: Starter Schema file to customize
wiki_builder_prompt.md: LLM prompt template for ingestion
architecture.md: Detailed three-layer architecture explanation
page_templates.md: Template examples for different page types
Domain Customization Examples
Academic Research
pages/
├── papers/      → type: paper (title, authors, year, abstract, key findings)
├── concepts/    → type: concept (definition, related work, applications)
├── authors/     → type: author (name, affiliation, papers, h-index)
└── methods/     → type: method (name, description, use cases, limitations)
Personal Knowledge
pages/
├── notes/       → type: note (title, date, tags, summary)
├── people/      → type: person (name, context, relationship, notes)
├── projects/    → type: project (name, status, tasks, links)
└── references/  → type: reference (source, author, key points, quotes)
Technical Documentation
pages/
├── apis/        → type: api (endpoint, method, params, response, examples)
├── components/  → type: component (name, props, usage, examples)
├── decisions/   → type: decision (context, options, outcome, date)
└── guides/      → type: guide (title, audience, steps, prerequisites)
Important Notes
- log.md is append-only: never modify or delete existing entries
- raw/ is immutable: never modify source files
- Windows users: set PYTHONUTF8=1 before running Python scripts
- Pipeline state is tracked in scripts/.pipeline_state.json
- Health reports are saved to wiki/<topic>/_maintenance/health_report.json
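For readers wondering how a state file can tell processed sources from unprocessed ones, the sketch below shows one common approach: a JSON map from each raw file's relative path to its content hash. The actual layout of `.pipeline_state.json` is not documented here, so this format and the function names are assumptions for illustration only.

```python
import hashlib
import json
from pathlib import Path
import tempfile


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect new or changed sources."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_state(state_path: Path) -> dict[str, str]:
    """Read the (assumed) path-to-digest map, or start empty."""
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8"))
    return {}


def unprocessed_files(raw_dir: Path, state_path: Path) -> list[Path]:
    """List raw files whose digest is missing from, or differs from, the state."""
    state = load_state(state_path)
    pending = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        key = path.relative_to(raw_dir).as_posix()
        if state.get(key) != file_digest(path):
            pending.append(path)
    return pending


def mark_processed(raw_dir: Path, state_path: Path, path: Path) -> None:
    """Record a file as processed; raw/ itself is only read, never written."""
    state = load_state(state_path)
    state[path.relative_to(raw_dir).as_posix()] = file_digest(path)
    state_path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw"
        raw.mkdir()
        (raw / "note.txt").write_text("hello", encoding="utf-8")
        print(unprocessed_files(raw, Path(tmp) / "state.json"))
```

Keying on a content hash rather than a modification time means a file that is copied or touched without changing is not re-ingested.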