llm-wiki
// Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency.
| field | value |
|---|---|
| name | llm-wiki |
| description | Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency. |
| version | 3.0.0 |
| author | Hermes Agent |
| license | MIT |
| metadata | {"hermes":{"tags":["wiki","knowledge-base","research","notes","markdown","rag-alternative","batch-ingest"],"category":"research","related_skills":["obsidian","arxiv","agentic-research-ideas"],"config":[{"key":"wiki.path","description":"Path to the LLM Wiki knowledge base directory","default":"~/wiki","prompt":"Wiki directory path"}]}} |
Build and maintain a persistent, compounding knowledge base as interlinked markdown files. Based on Andrej Karpathy's LLM Wiki pattern.
Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki compiles knowledge once and keeps it current. Cross-references are already there. Contradictions have already been flagged. Synthesis reflects everything ingested.
Division of labor: The human curates sources and directs analysis. The agent summarizes, cross-references, files, and maintains consistency.
Use this skill when the user wants to build, ingest into, query, or lint a persistent markdown knowledge base.
Configured via skills.config.wiki.path in ~/.hermes/config.yaml (prompted
during hermes config migrate or hermes setup):
```yaml
skills:
  config:
    wiki:
      path: ~/wiki
```
Falls back to ~/wiki default. The resolved path is injected when this
skill loads — check the [Skill config: ...] block above for the active value.
Before setting wiki.path, verify the target actually exists. A stale default like
~/wiki is easy to leave behind even when the real wiki lives elsewhere.
For the workspace-hub multi-wiki layout, the preferred root is usually:
```yaml
skills:
  config:
    wiki:
      path: /mnt/local-analysis/workspace-hub/knowledge/wikis
```
This points at the domain-wiki root (engineering/, marine-engineering/,
maritime-law/, naval-architecture/, etc.) rather than a single flat ~/wiki
folder.
The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or any editor. No database, no special tooling required.
The llm-wiki CLI at scripts/knowledge/llm_wiki.py provides 6 commands for
operating wikis programmatically. All commands use the pattern:
```shell
uv run scripts/knowledge/llm_wiki.py <command> --wiki <domain>
```
| Command | Purpose |
|---|---|
| `init <domain>` | Scaffold a new domain wiki under `knowledge/wikis/<domain>/` |
| `status --wiki <d>` | Report page counts, source counts, link density |
| `ingest <file> --wiki <d>` | Copy source file + generate LLM processing instructions |
| `query "..." --wiki <d>` | Keyword search across wiki pages with relevance ranking |
| `lint --wiki <d>` | Health checks (orphans, empty pages, index consistency, link density) |
| `batch-ingest <file> --wiki <d> --batch-size N` | Bulk-create source pages from metadata JSONL/JSON/YAML |
batch-ingest is designed for scale:
- Resume-safe checkpointing (`.checkpoint.jsonl` in wiki root)
- `--dry-run` for preview

Location: `knowledge/wikis/<domain>/` (not `~/wiki`). This is a multi-wiki ecosystem — multiple domain wikis coexist under `knowledge/wikis/`. Force-add to git despite `.gitignore` since wiki content is the compounding artifact.
In workspace-hub, wikis are organized as a multi-domain ecosystem under
knowledge/wikis/<domain>/, not a single ~/wiki. Each domain
(marine-engineering, maritime-law, naval-architecture) has its own
complete three-layer structure. Cross-wiki linking connects related topics
across domains.
```
knowledge/wikis/<domain>/
├── CLAUDE.md            # Schema: conventions, structure rules, domain config
├── raw/                 # Layer 1: Immutable source material
│   ├── papers/          # PDFs, standards, papers
│   ├── standards/       # Standards documents
│   ├── articles/        # Web articles, clippings
│   └── assets/          # Images, diagrams
└── wiki/                # Layer 2: The LLM-maintained wiki
    ├── index.md         # Content catalog with sectioned entries
    ├── log.md           # Chronological action log (append-only)
    ├── overview.md      # Domain synthesis summary
    ├── entities/        # Entity pages (things: equipment, orgs, vessels)
    ├── concepts/        # Concept pages (ideas: methods, principles)
    ├── sources/         # Source summary pages (one per ingested document)
    ├── comparisons/     # Filed query outputs
    └── visualizations/  # matplotlib plots, Marp slide decks
```
Layer 1 — Raw Sources: Immutable. The agent reads but never modifies these.
Layer 2 — The Wiki: Agent-owned markdown files. Created, updated, and
cross-referenced by the agent.
Layer 3 — The Schema: CLAUDE.md defines structure, conventions, and tag taxonomy.
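The three-layer skeleton is plain directories and files, so it can be sketched in a few lines. This is an illustrative approximation of what `init` scaffolds, not the CLI's actual implementation (the real command also writes domain-specific schema content):

```python
import tempfile
from pathlib import Path

def scaffold_wiki(root: Path) -> None:
    """Create the three-layer skeleton for one domain wiki (illustrative sketch)."""
    for sub in ("raw/papers", "raw/standards", "raw/articles", "raw/assets",
                "wiki/entities", "wiki/concepts", "wiki/sources",
                "wiki/comparisons", "wiki/visualizations"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Layer 3 schema file, plus the wiki's navigation files
    (root / "CLAUDE.md").write_text("# Schema: conventions, structure rules\n")
    wiki = root / "wiki"
    (wiki / "index.md").write_text("# Index\n")
    (wiki / "log.md").write_text("# Log\n")
    (wiki / "overview.md").write_text("# Overview\n")

# Example: scaffold a throwaway demo domain in a temp directory
scaffold_wiki(Path(tempfile.mkdtemp()) / "demo-domain")
```

Because everything is `mkdir` and `write_text`, the whole structure stays portable: any tool that can read a directory tree can read the wiki.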
- `.checkpoint.jsonl` tracks processed records by unique ID.
- Commit wiki content with `git add -f` even if `.gitignore` excludes the wikis directory. Wiki content is the compounding artifact.
- `knowledge/wikis/<domain>/CLAUDE.md` files are wiki schema/config files generated by `llm-wiki init`, not harness adapter files. If the repo hook `.claude/hooks/check-claude-md-limits.sh` applies the 20-line harness limit to all CLAUDE.md paths, commits touching wiki CLAUDE.md can fail with a false positive. The minimal safe fix is to exclude `^knowledge/wikis/` from that hook's staged-file filter so harness limits still apply to real adapter files while wiki schema files remain editable.

When the user has an existing wiki, always orient yourself before doing anything:
① Read CLAUDE.md (or SCHEMA.md) — understand the domain, conventions, and tag taxonomy.
② Read index.md — learn what pages exist and their summaries.
③ Scan recent log.md — read the last 20-30 entries to understand recent activity.
```shell
WIKI="${wiki_path:-$HOME/wiki}"

# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```
Only after orientation should you ingest, query, or lint — skipping it is how duplicate pages and schema drift creep in.
For large wikis (100+ pages), also run a quick search_files for the topic
at hand before creating anything new.
```shell
uv run scripts/knowledge/llm_wiki.py init <domain>
```
This scaffolds the full three-layer structure, creates CLAUDE.md with
domain-specific schema, initializes index.md and log.md, and creates
the raw/ and wiki/ subdirectories.
After scaffolding:
- `ingest <file> --wiki <domain>` for individual sources
- `batch-ingest metadata.jsonl --wiki <domain> --batch-size 100` for bulk loads
- `lint --wiki <domain>` to verify health

When the user provides a source (URL, file, paste), integrate it into the wiki:
① Capture the raw source:
- URL: `web_extract` to get markdown, save to `raw/articles/`
- PDF: `web_extract` (handles PDFs), save to `raw/papers/`
- Pasted text or local file: save to the matching `raw/` subdirectory
- Use descriptive filenames, e.g. `raw/articles/karpathy-llm-wiki-2026.md`

② Discuss takeaways with the user — what's interesting, what matters for the domain. (Skip this in automated/cron contexts — proceed directly.)
③ Check what already exists — search index.md and use search_files to find
existing pages for mentioned entities/concepts. This is the difference between
a growing wiki and a pile of duplicates.
④ Write or update wiki pages:
- Create new pages or revise existing ones, bumping each page's `updated` date.
- When new info contradicts existing content, follow the Update Policy.
- Cross-reference with `[[wikilinks]]`. Check that existing pages link back.

⑤ Update navigation:
- Add new pages to `index.md` under the correct section, alphabetically
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`

⑥ Report what changed — list every file created or updated to the user.
A single source can trigger updates across 5-15 wiki pages. This is normal and desired — it's the compounding effect.
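The "check what already exists" step is a plain text scan over the wiki's markdown files. A minimal sketch, using a hypothetical helper (`find_existing_pages` is not part of the CLI):

```python
import re
from pathlib import Path

def find_existing_pages(wiki: Path, term: str) -> list[Path]:
    """Return wiki pages whose filename or body mentions `term` (case-insensitive)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for page in sorted(wiki.rglob("*.md")):
        if pattern.search(page.stem) or pattern.search(page.read_text(encoding="utf-8")):
            hits.append(page)
    return hits
```

If this returns an existing page for an entity or concept, update that page instead of creating a duplicate.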
When the user asks a question about the wiki's domain:
① Read index.md to identify relevant pages.
② For wikis with 100+ pages, also search_files across all .md files
for key terms — the index alone may miss relevant content.
③ Read the relevant pages using read_file.
④ Synthesize an answer from the compiled knowledge. Cite the wiki pages
you drew from: "Based on [[page-a]] and [[page-b]]..."
⑤ File valuable answers back — if the answer is a substantial comparison,
deep dive, or novel synthesis, create a page in queries/ or comparisons/.
Don't file trivial lookups — only answers that would be painful to re-derive.
⑥ Update log.md with the query and whether it was filed.
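Filing an answer just means writing an ordinary markdown page with the frontmatter fields this skill expects (title, created, updated, type, tags, sources). A sketch, with an assumed helper name and a simplified list serialization:

```python
from datetime import date
from pathlib import Path

def file_query_answer(wiki: Path, slug: str, title: str, body: str,
                      tags: list[str], sources: list[str]) -> Path:
    """Write a filed comparison page with required frontmatter (illustrative)."""
    today = date.today().isoformat()
    frontmatter = "\n".join([
        "---",
        f"title: {title}",
        f"created: {today}",
        f"updated: {today}",
        "type: comparison",
        f"tags: [{', '.join(tags)}]",
        f"sources: [{', '.join(sources)}]",
        "---",
        "",
    ])
    page = wiki / "comparisons" / f"{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(frontmatter + body, encoding="utf-8")
    return page
```

Pages written this way pass the frontmatter validation check during lint, provided the tags are in the schema taxonomy.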
When the user asks to lint, health-check, or audit the wiki:
① Orphan pages: Find pages with no inbound [[wikilinks]] from other pages.
```python
# Use execute_code for this — programmatic scan across all wiki pages:
# scan all .md files in entities/, concepts/, comparisons/, queries/,
# extract all [[wikilinks]] to build an inbound-link map;
# pages with zero inbound links are orphans.
```
② Broken wikilinks: Find [[links]] that point to pages that don't exist.
③ Index completeness: Every wiki page should appear in index.md. Compare
the filesystem against index entries.
④ Frontmatter validation: Every wiki page must have all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.
⑤ Stale content: Pages whose updated date is >90 days older than the most
recent source that mentions the same entities.
⑥ Contradictions: Pages on the same topic with conflicting claims. Look for pages that share tags/entities but state different facts.
⑦ Page size: Flag pages over 200 lines — candidates for splitting.
⑧ Tag audit: List all tags in use, flag any not in the SCHEMA.md taxonomy.
⑨ Log rotation: If log.md exceeds 500 entries, rotate it.
⑩ Report findings with specific file paths and suggested actions, grouped by severity (broken links > orphans > stale content > style issues).
⑪ Append to log.md: ## [YYYY-MM-DD] lint | N issues found
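The first two checks reduce to set algebra over page names and `[[wikilink]]` targets. A sketch, assuming flat (unique) page names and treating the navigation files as exempt from the orphan rule:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # link target, before any |alias or #heading

def lint_links(wiki: Path) -> dict[str, set[str]]:
    """Flag orphan pages (no inbound [[wikilinks]]) and broken link targets."""
    pages = {p.stem: p for p in wiki.rglob("*.md")}
    targets: set[str] = set()
    for p in pages.values():
        targets |= {m.strip() for m in WIKILINK.findall(p.read_text(encoding="utf-8"))}
    nav = {"index", "log", "overview"}  # navigation files need no inbound links
    return {
        "orphans": set(pages) - targets - nav,  # pages nothing links to
        "broken": targets - set(pages),         # link targets with no page
    }
```

Index completeness is the same idea: diff the filesystem's page names against the names mentioned in `index.md`.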
For large-scale ingestion (100+ sources), use the llm-wiki batch-ingest CLI:
```shell
# Dry-run first to preview
uv run scripts/knowledge/llm_wiki.py batch-ingest metadata.jsonl --wiki <domain> --batch-size 100 --dry-run

# Then run for real (resume-safe via .checkpoint.jsonl)
uv run scripts/knowledge/llm_wiki.py batch-ingest metadata.jsonl --wiki <domain> --batch-size 100
```
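The resume safety amounts to an append-only checkpoint of processed record IDs. A sketch of the idea; the real CLI's checkpoint schema may differ:

```python
import json
from pathlib import Path

def process_with_checkpoint(records, checkpoint: Path, handler) -> int:
    """Run handler(record) for records not yet checkpointed; return count newly processed."""
    done = set()
    if checkpoint.exists():
        done = {json.loads(line)["id"]
                for line in checkpoint.read_text(encoding="utf-8").splitlines() if line}
    processed = 0
    with checkpoint.open("a", encoding="utf-8") as ckpt:
        for rec in records:
            if rec["id"] in done:
                continue  # already handled in an earlier run; skip on resume
            handler(rec)
            ckpt.write(json.dumps({"id": rec["id"]}) + "\n")
            processed += 1
    return processed
```

Because each ID is written only after its record is handled, an interrupted run can simply be restarted: finished work is skipped, unfinished work is redone.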
The CLI handles:
- `--dry-run` mode to preview filenames and counts

For structured YAML knowledge seeds in `knowledge/seeds/`:
Proven pattern:
- 18 mooring failure entries → 4 wiki pages (source + 2 concepts + 1 entity)
- 10 law cases + 6 conventions → 20 wiki pages with cross-references
```shell
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```
For structured YAML seeds (like knowledge/seeds/naval-architecture-resources.yaml):
This approach is much faster than full PDF extraction and creates structured wiki pages that can be enhanced later with LLM content.
When managing multiple domain wikis, look for natural connections:
The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

For best results:
- Keep images and diagrams in `raw/assets/`
- Dataview queries work against page frontmatter, e.g. `TABLE tags FROM "entities" WHERE contains(tags, "company")`
- Never edit `raw/` — sources are immutable. Corrections go in wiki pages.
- When `log.md` grows large, rotate it to `log-YYYY.md` and start fresh. The agent should check log size during lint.
- Wiki directories may be excluded by `.gitignore`. Use `git add -f` to commit content. Wiki content is the compounding artifact and must be tracked.
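The log rotation mentioned above can be sketched as follows; counting entries by `## ` headers is an assumption about the log format, matching the `## [YYYY-MM-DD] ...` entry convention used in this skill:

```python
from datetime import date
from pathlib import Path

def rotate_log(log: Path, max_entries: int = 500):
    """Archive log.md to log-YYYY.md once it exceeds max_entries '## ' entries."""
    text = log.read_text(encoding="utf-8")
    entries = sum(1 for line in text.splitlines() if line.startswith("## "))
    if entries <= max_entries:
        return None  # under the threshold; leave the log alone
    archive = log.with_name(f"log-{date.today().year}.md")
    archive.write_text(text, encoding="utf-8")
    log.write_text(f"# Log (rotated {date.today().isoformat()})\n", encoding="utf-8")
    return archive
```

Run this (or the equivalent manual steps) during lint so the append-only log stays fast to scan at session start.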