| name | wiki-lint |
| description | Health check the Obsidian wiki vault. Finds orphan pages, dead wikilinks, stale claims, missing cross-references, frontmatter gaps, and empty sections. Creates or updates Dataview dashboards. Generates canvas maps. Triggers on: "lint", "health check", "clean up wiki", "check the wiki", "wiki maintenance", "find orphans", "wiki audit".
|
wiki-lint: Wiki Health Check
Run lint after every 10-15 ingests, or weekly. Ask before auto-fixing anything. Output a lint report to wiki/meta/lint-report-YYYY-MM-DD.md.
Transport (v1.7+)
Lint primarily reads, then writes a single report file. Both follow the standard transport policy. Read .vault-meta/transport.json (auto-created by bash scripts/detect-transport.sh):
- cli —
obsidian-cli read "$VAULT" "$NOTE" for individual reads; obsidian-cli backlinks "$VAULT" "$NOTE" natively handles backlink graph (avoids re-rolling it via Grep); see skills/wiki-cli/SKILL.md
- mcp-obsidian / mcpvault —
mcp__obsidian-vault__read_multiple_notes, list_all_tags
- filesystem — Claude's
Read/Glob/Grep (final floor; current v1.6 behavior)
Full decision tree: wiki/references/transport-fallback.md. DragonScale Mechanism 3 tiling lint is a separate code path (Python script) and bypasses transport selection.
Lint Checks
Work through these in order:
- Orphan pages. Wiki pages with no inbound wikilinks. They exist but nothing points to them.
- Dead links. Wikilinks that reference a page that does not exist.
- Stale claims. Assertions on older pages that newer sources have contradicted or updated.
- Missing pages. Concepts or entities mentioned in multiple pages but lacking their own page.
- Missing cross-references. Entities mentioned in a page but not linked.
- Frontmatter gaps. Pages missing required fields (type, status, created, updated, tags).
- Empty sections. Headings with no content underneath.
- Stale index entries. Items in
wiki/index.md pointing to renamed or deleted pages.
- Address validity (DragonScale Mechanism 2). For every page that has an
address: frontmatter field, validate the format. See the Address Validation section below.
- Semantic tiling (DragonScale Mechanism 3, opt-in). Flag candidate duplicate pages (across all scanned types, not just concepts) via embedding cosine similarity. See the Semantic Tiling section below.
Lint Report Format
Create at wiki/meta/lint-report-YYYY-MM-DD.md:
---
type: meta
title: "Lint Report YYYY-MM-DD"
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [meta, lint]
status: developing
---
# Lint Report: YYYY-MM-DD
## Summary
- Pages scanned: N
- Issues found: N
- Auto-fixed: N
- Needs review: N
## Orphan Pages
- [[Page Name]]: no inbound links. Suggest: link from [[Related Page]] or delete.
## Dead Links
- [[Missing Page]]: referenced in [[Source Page]] but does not exist. Suggest: create stub or remove link.
## Missing Pages
- "concept name": mentioned in [[Page A]], [[Page B]], [[Page C]]. Suggest: create a concept page.
## Frontmatter Gaps
- [[Page Name]]: missing fields: status, tags
## Stale Claims
- [[Page Name]]: claim "X" may conflict with newer source [[Newer Source]].
## Cross-Reference Gaps
- [[Entity Name]] mentioned in [[Page A]] without a wikilink.
Naming Conventions
Enforce these during lint:
| Element | Convention | Example |
|---|
| Filenames | Title Case with spaces | Machine Learning.md |
| Folders | lowercase with dashes | wiki/data-models/ |
| Tags | lowercase, hierarchical | #domain/architecture |
| Wikilinks | match filename exactly | [[Machine Learning]] |
Filenames must be unique across the vault. Wikilinks work without paths only if filenames are unique.
Writing Style Check
During lint, flag pages that violate the style guide:
- Not declarative present tense ("X basically does Y" instead of "X does Y")
- Missing source citations where claims are made
- Uncertainty not flagged with
> [!gap]
- Contradictions not flagged with
> [!contradiction]
Dataview Dashboard
Create or update wiki/meta/dashboard.md with these queries:
---
type: meta
title: "Dashboard"
updated: YYYY-MM-DD
---
# Wiki Dashboard
## Recent Activity
```dataview
TABLE type, status, updated FROM "wiki" SORT updated DESC LIMIT 15
```
## Seed Pages (Need Development)
```dataview
LIST FROM "wiki" WHERE status = "seed" SORT updated ASC
```
## Entities Missing Sources
```dataview
LIST FROM "wiki/entities" WHERE !sources OR length(sources) = 0
```
## Open Questions
```dataview
LIST FROM "wiki/questions" WHERE answer_quality = "draft" SORT created DESC
```
Canvas Map
Create or update wiki/meta/overview.canvas for a visual domain map:
{
"nodes": [
{
"id": "1",
"type": "file",
"file": "wiki/overview.md",
"x": 0, "y": 0,
"width": 300, "height": 140,
"color": "1"
}
],
"edges": []
}
Add one node per domain page. Connect domains that have significant cross-references. Colors map to the CSS scheme: 1=blue, 2=purple, 3=yellow, 4=orange, 5=green, 6=red.
Address Validation (DragonScale Mechanism 2 MVP)
Opt-in feature. Address Validation runs only if the vault is using DragonScale, detected by:
if [ -x ./scripts/allocate-address.sh ] && [ -f ./.vault-meta/address-counter.txt ]; then
DRAGONSCALE_ADDRESSES=1
else
DRAGONSCALE_ADDRESSES=0
fi
When DRAGONSCALE_ADDRESSES=0, skip this entire section. Missing address: fields are not flagged, not even informationally. Pages that happen to have an address: field are passed through unvalidated (treat as user-managed metadata).
When DRAGONSCALE_ADDRESSES=1, proceed with the rollout baseline and checks below.
Rollout baseline: 2026-04-23 (Phase 2 ship date in vaults that adopted DragonScale on that day). Vaults that adopted DragonScale later should override this baseline by setting the earliest created: date of any addressed page as their personal rollout date. Record the chosen baseline at the top of .vault-meta/legacy-pages.txt as a commented line: # rollout: YYYY-MM-DD.
Classification rule (applied per page)
Before validating anything, classify the page:
| Classification | Criteria |
|---|
| Meta / fold / excluded | File is in wiki/folds/ OR filename in {_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, dashboard.base, Wiki Map.md, getting-started.md}. Address not required. |
| Post-rollout (must have address) | type is not meta/fold AND frontmatter created: date is >= 2026-04-23 AND file path is NOT in the legacy baseline manifest. |
| Legacy (backfill-eligible) | type is not meta/fold AND frontmatter created: date is < 2026-04-23 OR file path IS in the legacy baseline manifest. Address not required until backfill. |
Legacy baseline manifest: optional file at .vault-meta/legacy-pages.txt, one relative path per line. Pages listed there are treated as legacy regardless of created: date. Use this to grandfather pages whose created: metadata is wrong or missing.
Validation checks (run in order)
-
Format check: any page with address: set must match one of:
^c-[0-9]{6}$ — post-rollout creation address.
^l-[0-9]{6}$ — legacy-backfill address.
- Pages under
wiki/folds/ use fold_id, not address; do not apply the c-/l- regex there.
-
Uniqueness check: no two pages share the same address value. Report both paths.
-
Counter consistency: ./scripts/allocate-address.sh --peek returns the next counter value. Every observed c-NNNNNN must satisfy NNNNNN < peek_value. Violation = counter drift.
-
Post-rollout enforcement: every page classified as "post-rollout (must have address)" that LACKS the address: field is a lint error, not informational. This prevents the silent-regression path where a new page skips address assignment.
-
Legacy identification: every page classified as "legacy" that LACKS an address is informational. The lint report lists them under "Pending backfill" with total count.
-
Address-map consistency (.raw/.manifest.json): for every page path in address_map, the page must exist and its frontmatter address must match the mapping. Mismatches are errors (either a rename dropped the map update, or a manual edit diverged).
Lint posture summary
- Pages that HAVE an address with bad format: error.
- Pages that HAVE colliding addresses: error.
- Pages classified post-rollout WITHOUT an address: error.
- Pages classified legacy WITHOUT an address: informational (expected).
- Meta and fold pages without
address: ignored (not applicable).
- Counter drift (observed counter >= peek): error.
- Address-map mismatch: error.
Lint only observes. Do NOT auto-assign missing addresses during lint. Assignment is wiki-ingest's responsibility only.
Output section in the lint report
## Address Validation
- Counter state: `$(./scripts/allocate-address.sh --peek)`
- Highest c- address observed: c-XXXXXX
- Post-rollout pages checked: N (X passing, Y errors)
- Legacy pages pending backfill: M
### Errors
- [[Page Name]]: invalid address format `{value}`. Expected `c-NNNNNN` or `l-NNNNNN`.
- [[Page A]] and [[Page B]] share address `c-000042`.
- [[Post-Rollout Page]]: missing address. Page created 2026-04-25 (post-rollout); address required. Run wiki-ingest or manually run `./scripts/allocate-address.sh` and add to frontmatter.
- [[Page Name]] has address `c-000100` but counter peek is `50`. Counter drift; run `./scripts/allocate-address.sh --rebuild`.
- `.raw/.manifest.json` maps `wiki/foo.md` -> `c-000010` but page frontmatter has `c-000012`. Resolve mismatch.
### Pending backfill (informational)
- M legacy pages without addresses. See `.vault-meta/legacy-pages.txt` for the canonical legacy set, or filter by `created:` < 2026-04-23.
Semantic Tiling (DragonScale Mechanism 3 MVP, opt-in)
Opt-in feature. Semantic tiling flags candidate duplicate pages (not just concept pages — see Scope below) using embedding cosine similarity. Local ollama only by default; remote endpoints require an explicit override flag.
Detection and delegation
if [ -x ./scripts/tiling-check.py ] && command -v python3 >/dev/null 2>&1; then
./scripts/tiling-check.py --peek > /tmp/tiling-peek.json 2>/dev/null
PEEK_EXIT=$?
case $PEEK_EXIT in
0) TILING_READY=1 ;;
2) TILING_READY=0 ; echo "tiling ERROR: usage error (exit 2); inspect /tmp/tiling-peek.json" ;;
3) TILING_READY=0 ; echo "tiling ERROR: cache corrupt (exit 3); inspect .vault-meta/tiling-cache.json" ;;
4) TILING_READY=0 ; echo "tiling ERROR: vault exceeds scale hard-fail (exit 4); batching required" ;;
10) TILING_READY=0 ; echo "tiling skipped: ollama not reachable (exit 10)" ;;
11) TILING_READY=0 ; echo "tiling skipped: run 'ollama pull nomic-embed-text' to enable (exit 11)" ;;
*) TILING_READY=0 ; echo "tiling ERROR: unexpected exit code $PEEK_EXIT from tiling-check.py --peek" ;;
esac
else
TILING_READY=0
echo "tiling skipped: scripts/tiling-check.py or python3 not available"
fi
Inspect /tmp/tiling-peek.json (structured diagnostics: script path, python interpreter, ollama URL, cache state, thresholds state) whenever the status is ambiguous. Never collapse unknown exits into "unknown status" silently.
When TILING_READY=1:
./scripts/tiling-check.py --report wiki/meta/tiling-report-YYYY-MM-DD.md
REPORT_EXIT=$?
case $REPORT_EXIT in
0) echo "tiling report written" ;;
2) echo "tiling ERROR: usage error during --report" ;;
3) echo "tiling ERROR: cache corrupt during --report" ;;
4) echo "tiling ERROR: scale hard-fail during --report" ;;
10) echo "tiling ERROR: ollama became unreachable between --peek and --report" ;;
11) echo "tiling ERROR: model became unavailable between --peek and --report" ;;
*) echo "tiling ERROR: unexpected exit code $REPORT_EXIT from tiling-check.py --report" ;;
esac
Scope (what the helper scans)
- Includes: every
.md under wiki/ except the exclusion set below. The scope is "candidate tileable pages," not just type: concept.
- Excludes (path): anything under
wiki/folds/ or wiki/meta/.
- Excludes (filename):
_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, Wiki Map.md, getting-started.md.
- Excludes (frontmatter):
type: meta or type: fold.
- Excludes (security): symlinks. Any page file that is a symlink, or whose resolved path escapes the vault root, is skipped.
If you place a real concept under wiki/meta/ it will be excluded by path regardless of content. Keep concepts in their canonical folders.
How the helper works
- Computes one embedding per included page via the ollama
nomic-embed-text model by default.
- Caches embeddings at
.vault-meta/tiling-cache.json, keyed on sha256(model + body) so model drift auto-invalidates. Frontmatter is not part of the hash or the embedding input — pure frontmatter edits (tag changes, status bumps) do not trigger recomputation.
- Orphans are GC'd: when a cached page path no longer exists on disk, its entry is dropped on save.
- Concurrent-safe: exclusive flock on
.vault-meta/.tiling.lock around cache I/O; per-PID temp file for atomic writes.
Security posture
- Defaults to
http://127.0.0.1:11434. OLLAMA_URL env override is accepted only with --allow-remote-ollama because page bodies are POSTed as embedding input.
- Symlinks and vault-root escapes are rejected.
Default bands (conservative seeds, NOT calibrated)
| Band | Similarity | Report section |
|---|
| Error | >= 0.90 | Errors — strong near-duplicate, likely the same concept |
| Review | 0.80 - 0.90 | Review — possible tile overlap; human judgement needed |
| Pass | < 0.80 | not emitted |
These values are conservative seeds, not literature-backed interpolation. Published reference points: Sentence Transformers community_detection defaults to 0.75; Quora-duplicate calibrations land around 0.7715-0.8352 depending on objective. The 0.80 review floor is already stricter than at least one cited Quora optimum, so expect false negatives against those baselines. Reduce the review floor during calibration if you want more sensitivity.
Calibration procedure (manual, one-time per vault)
- Run the helper with defaults. Capture the Review band pairs.
- Temporarily lower
bands.review to 0.70 in .vault-meta/tiling-thresholds.json to surface a wider sample. Aim for >=50 pairs spanning 0.70-0.95.
- Label each pair:
duplicate, similar, distinct.
- Pick bands such that: (a) the
error band contains >= 95% true duplicates; (b) the review band captures similar pairs without swamping the report with distinct ones.
- Edit
.vault-meta/tiling-thresholds.json: set new bands.error and bands.review, set calibrated: true, set calibration_pairs_labeled to the label count.
- Re-run lint. Report footer now says
calibrated: true.
Scale
- Cold-cache cost is O(N) POSTs to ollama. Warm-cache cost is O(N^2) cosines in pure Python.
- Helper prints a warning at > 500 pages and hard-fails (exit 4) at > 5000. Revisit the implementation (batching, vectorized cosine, or external tooling) before exceeding either limit.
Lint report embed
## Semantic Tiling
See [[tiling-report-YYYY-MM-DD]] for the full pair listing.
- Errors (>=0.90): N pairs
- Review (0.80-0.90): M pairs
- Calibrated: true|false
Invariants
- Read-only.
tiling-check.py never modifies wiki pages.
- No auto-merge. Duplicates are listed, never resolved.
- Cache is incremental and model-scoped. Unchanged pages are not re-embedded.
- Exit codes:
0 ok, 2 usage error, 3 cache corrupt, 4 scale hard-fail, 10 ollama unreachable, 11 model missing. Surface all of them; do not collapse into a single "unknown" bucket.
Before Auto-Fixing
Always show the lint report first. Ask: "Should I fix these automatically, or do you want to review each one?"
Safe to auto-fix:
- Adding missing frontmatter fields with placeholder values
- Creating stub pages for missing entities
- Adding wikilinks for unlinked mentions
Needs review before fixing:
- Deleting orphan pages (they might be intentionally isolated)
- Resolving contradictions (requires human judgment)
- Merging duplicate pages
How to think (10-principle mapping)
When working on this skill, apply the 10-principle loop. See skills/think/SKILL.md for the canonical framework.
| # | Principle | Application here |
|---|
| 1 | OBSERVE (ext) | Scan every page, every wikilink, every frontmatter block. No skipping for size or apparent obviousness. |
| 2 | OBSERVE (int) | Am I biased toward "looks fine"? Pretend you're a hostile reader looking for what's actually wrong. |
| 3 | LISTEN | Did the user mention specific concerns this session? Prioritize those over generic checks. |
| 4 | THINK | Which checks matter? Tier findings by severity (BLOCKER / HIGH / MEDIUM / LOW), not by ease of fix. |
| 5 | CONNECT (lat) | Orphan + dead-link + frontmatter-gap patterns often co-occur. Cluster findings to expose root causes. |
| 6 | CONNECT (sys) | Tiling-check + Dataview dashboards + canvas overview — multiple lint surfaces to integrate. |
| 7 | FEEL | A lint report should empower, not shame. Actionable items beat exhaustive catalog. |
| 8 | ACCEPT | Some lint findings are deliberate (orphan-by-design, intentional stub). Flag, don't force. |
| 9 | CREATE | Lint report at wiki/meta/lint-report-YYYY-MM-DD.md with tiered findings. |
| 10 | GROW | Recurring lint findings → process improvement targets, not just one-time fixes. |