| name | discover-and-curate |
| description | Find related entries to a seed file, build reading lists, and surface neighbour clusters Marginalia has discovered automatically. Use when the user is browsing rather than asking a specific question. |
Discover and curate
Marginalia runs background "tend" passes that mine the corpus for
relations: tag overlap, citation graph, semantic neighbours. This skill
explains how to surface those relations from the CLI when the user
wants to explore rather than search.
When to use
- The user has one file in mind and asks "what else is like this?"
- The user wants to build a reading list around a topic.
- The user asks "what is Marginalia learning about my corpus?"
Prerequisites
- Ingestion has settled (N busy near zero in the prompt). Discovery
works against files that already have summaries, tags, and sections.
- A few "tend" cycles have run. New corpora have sparse relations until
the miners have had a chance to walk the graph.
Workflow
1. Find a seed entry
Either via search:
/search consensus protocols
Or by reusing an entry_id from an earlier /info or /discover (tab
completion suggests prefixes once the IDs are in this session's cache).
2. Discover related entries
/discover <entry_id>
Output: scored neighbours with a bar chart, sorted by relevance. A *
in the leading column marks a direct edge (citation, explicit relation);
rows without it are neighbours derived by random walk.
/discover <entry_id> --all
By default, discovery returns only relations the LLM has vetted. Pass
--all when the user also wants the raw mining output, which is useful
for spotting clusters that haven't been quality-gated yet.
3. Drill in
For each neighbour the user finds interesting:
/info <neighbour_entry_id>
The summary + section preview are usually enough to decide whether
to read the full file. If yes:
/download <neighbour_entry_id>
4. Trigger a fresh mining pass (optional)
If the user just ingested a lot of new files and wants the relation
graph updated immediately:
/tend
This kicks off a maintenance run: mining, vetting, normalization.
Returns a tend_run_id and a list of queued tasks. Watch the prompt's
N busy count to see when it settles.
/tend <tend_run_id>
Reports the status of that specific run.
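Waiting for the busy count to settle can be scripted as a plain polling loop. The sketch below is a hypothetical helper, not part of Marginalia: read_busy_count is an assumed caller-supplied function that returns the current busy count by whatever means is available, since no programmatic interface is described here. The threshold, interval, and timeout defaults are illustrative only.

```python
import time


def wait_until_settled(read_busy_count, threshold=0, poll_seconds=5.0,
                       timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the busy count drops to `threshold` or below.

    `read_busy_count` is a hypothetical callable supplied by the caller.
    Returns True once settled, False if the timeout elapses first.
    """
    waited = 0.0
    while True:
        if read_busy_count() <= threshold:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a fake reader that reports 4, then 2, then 0 busy tasks.
# The sleep function is stubbed out so the demo finishes instantly.
readings = iter([4, 2, 0])
settled = wait_until_settled(lambda: next(readings), sleep=lambda seconds: None)
print(settled)  # True
```

Passing the reader and the sleep function as arguments keeps the loop independent of how the count is actually obtained and makes it easy to exercise without waiting.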
Curation patterns
Reading list around a paper
- /discover <seed_id>: get the top-K neighbours.
- For each that looks promising: /info <neighbour>. Read the summary.
- Note the entry_ids that pass muster. (Tab completion remembers them
for the rest of the session.)
- /download <id> for each, or zip the parent folder if they all live
under one tree: /download <folder_id> reading-list.zip.
Mapping a topic
- /search <broad term>: surfaces the top matches.
- Pick the most central-looking result as a seed.
- /discover <seed>: branch out one layer.
- From the discovered set, pick a second seed in a different cluster.
- Compare the two /discover outputs. Files that show up in both are
genuine bridges between the clusters.
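The comparison in the last step is a set intersection over the two neighbour lists. A minimal sketch, assuming the entry_ids have been copied out of the two /discover listings by hand; find_bridges and the IDs shown are hypothetical and not part of Marginalia:

```python
def find_bridges(first_ids, second_ids, seeds=()):
    """Return entry IDs that appear in both neighbour lists.

    Results keep the order of `first_ids`. Anything listed in `seeds`
    is skipped, so the two seed entries are not reported as bridges.
    """
    second = set(second_ids)
    skip = set(seeds)
    seen = set()
    bridges = []
    for entry_id in first_ids:
        if entry_id in second and entry_id not in skip and entry_id not in seen:
            seen.add(entry_id)
            bridges.append(entry_id)
    return bridges


# Hypothetical entry IDs noted from two separate /discover runs.
from_seed_a = ["e-101", "e-205", "e-317", "e-412"]
from_seed_b = ["e-317", "e-560", "e-101", "e-999"]

print(find_bridges(from_seed_a, from_seed_b))  # ['e-101', 'e-317']
```

Each bridge can then be inspected with /info before deciding whether it belongs on the reading list.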
Common pitfalls
- Empty discovery results on a new corpus. The miners haven't
run yet. Either wait for the periodic tick, or run /tend once.
- All neighbours are direct edges. The seed is poorly indexed
(short summary, missing tags), so the random walk can't find paths.
Re-ingesting (/ingest <path>) re-runs extraction with the current
pipeline, which usually fills these in.
- Repeated noise in unvetted results. That's why vetting exists.
Drop --all and let the LLM filter.