| name | grounding |
| description | Decomposes the current plan, idea, or technical claim into atomic, individually-checkable assumptions and pressure-tests each against real evidence — project files, registered SME repos, available MCPs, CLIs, and live docs — instead of letting them ride into implementation. Iterative — walks the decision tree one branch at a time, probing before asking, and optionally writes resolved decisions into `CONTEXT.md` / `docs/adr/`. Use PROACTIVELY whenever the user is mapping out an approach, sketching a plan, deciding on architecture, or making claims about external systems (APIs, libraries, services, schemas), even if they haven't asked to verify anything. Trigger on planning language ("I think", "we should", "we'll need", "let's just", "the X does Y", "this works because Z") or explicit invocations ("ground this", "pressure-test this", "is this real", "what am I missing", "stress-test this plan"). Returns a structured resolved/open/contradictions report. |
| version | 0.4.2 |
Grounding
Plans built on wrong assumptions create the worst class of refactor — the kind discovered only after the work is mostly done. This skill probes the assumptions in the current discussion against real evidence (codebase, MCPs, CLIs, registered repo SMEs, docs) and surfaces what holds, what doesn't, and what's still unclear, iteratively — walking down the decision tree branch by branch — so the plan can be corrected while it's still cheap, AND so the surfaced decisions persist in the project's own documentation.
The goal is not to be exhaustive. It's to find the load-bearing assumptions, verify them, and capture what was learned.
Usage
/grounding [idea to ground] [--deep] [--background]
[idea to ground] (string, optional) — the plan, claim, or approach to pressure-test. When omitted, grounds the most recent plan, claim, or proposed approach in the conversation.
--deep (flag, optional) — force delegation to the grounding-investigator agent (out-of-thread probing) even for a small decision tree. Also triggered by phrasing like "deeply" or "really pressure-test this".
--background (flag, optional) — run the investigator in the background and report when it lands, rather than blocking the thread.
How grounding works
Grounding follows the grill-with-docs protocol (canonical at docs/protocols/grill-with-docs.md at the marketplace root; a copy ships with this skill at references/grill-with-docs.md). Read that file first — it owns the seven-phase mechanic:
- Restate the plan or idea in 1-2 sentences
- Decompose, then map the decision tree — split compound claims into atomic ones, state each in falsifiable IF/THEN form, order branches by how load-bearing they are (downstream fan-out × blast-radius-if-wrong)
- Probe before asking — try to resolve from evidence (codebase, SME repos, MCPs, CLIs, docs) before going to the user; a branch is resolved only when its chain bottoms out in a real artifact, not another unverified claim
- Ask one question at a time — only for branches that genuine intent/domain/constraint queries can resolve; always include a recommended answer
- Sharpen vocabulary along the way — challenge terms against
CONTEXT.md, propose canonical names, cross-reference user claims with code
- Write decisions where they belong —
CONTEXT.md for terms, docs/adr/ for hard-to-reverse decisions, inline as they crystallise
- Report in a compact resolved / open / contradictions structure
This skill is the grounding-specific layering on top of that protocol — what evidence sources to rank, what assumption shapes to look for, and how to handle no-source-available cases.
When this skill runs
Trigger proactively when the discussion shows any of these signals:
- Planning language — "I think we should...", "we'll need to...", "the approach is...", "let's just...", "I'll wire X to Y..."
- Claims about external systems — "the Linear API supports cycles", "Resend allows webhook signing", "the library handles retry"
- Claims about the codebase — "we already have a User model", "the auth middleware does X"
- Architectural assertions — "we can run this on a single instance", "Postgres handles this fine at our scale"
- A plan-mode plan is being finalized — verify before approval
- Explicit invocation —
/ground, "ground this", "pressure-test this", "what am I missing", "is this real", "grill me", "stress-test this"
Do not run when:
- The user is asking a direct factual question — just answer it, do not turn it into an investigation
- The user is mid-edit on uncontroversial code
- The same plan has already been grounded earlier in this session and hasn't materially changed
- The user is brainstorming divergent options — wait until they converge on a direction
- The idea is small enough that probing is more expensive than just trying it
Grounding-specific layer
The protocol gives you the mechanic. The pieces below are what makes the mechanic effective for the kind of grounding this plugin does — verifying technical claims against real-world evidence, fast.
Start with a source inventory
Before probing individual branches, run the bundled inventory script once to learn what evidence this environment actually offers:
bash scripts/inventory-sources.sh
It reports, in a single pass: which probe CLIs are installed (gh, linear, az, node, git, …), what package.json scripts the project exposes, whether the project has a CONTEXT.md / CONTEXT-MAP.md / docs/adr/, whether any SME repos are registered under ~/.claude/repo-sme/, and whether the user's global (~/.claude/grounding/tripwires.md) or project (<repo>/.claude/grounding/tripwires.md) tripwire files have live entries.
Probing from a known inventory beats rediscovering sources branch by branch — and it tells you up front which branches are headed for ⚠ No source available, so you don't burn a probe finding that out. The script is filesystem-only by design: MCP tools aren't visible to it, so check the session's tool list for those yourself. Its output is a working aid — read it, act on it, but never paste it into the report.
Evidence-source ranking
When Phase 3 of the protocol says "probe before asking," grounding ranks sources in this order:
| Rank | Source | When to reach for it |
|---|
| 1 | Skills listed in the system reminder (repo-sme:*, microsoft-docs:*, azure:*, project-manager:*) | First-class probes — invoke them rather than reinventing |
| 2 | MCP tools wired into this session (Linear, Figma, Slack, Azure, Microsoft Learn) | Use ToolSearch to load schemas before calling |
| 3 | CLI tools — gh, linear, az, project-local CLIs in package.json scripts | command -v <tool> to probe availability |
| 4 | Registered SME repos at ~/.claude/repo-sme/ | Cloned external library sources, or use the repo-sme-expert agent |
| 5 | Project filesystem — Read, Grep, Glob | For codebase-state claims |
| 6 | Git history — git log, git blame, git grep | For past-decision claims |
| 7 | Web — WebFetch / WebSearch | Last resort. Local sources are more authoritative |
If a critical assumption has no available source, surface that as part of the report (see "No source available" below) rather than silently skipping it.
Assumption taxonomy
Grounding-shaped claims tend to fall into a small set of categories. Use this table to make sure you're not missing one when mapping the decision tree in Phase 2 of the protocol:
| Category | Example | Where to verify |
|---|
| External API behavior | "GitHub's API returns commits in author-date order" | MCP / CLI / SME repo / docs |
| Library / SDK semantics | "nodemailer supports OAuth2 token refresh" | SME repo / source / docs |
| Codebase state | "we already have a Subscription model" | Read / Grep / git log |
| Schema / data shape | "users have at most one billing address" | Schema files / data sample |
| Past attempt history | "we tried this approach last quarter" | git log / PR history / project notes |
| Constraint validity | "we can't change the database engine" | User / docs / decision records |
| Compatibility | "this works with Node 20" | package.json engines / runtime docs |
| Domain rule | "every order has a customer" | User / domain docs / CONTEXT.md |
| Capacity / scale | "Postgres handles this at our load" | Metrics dashboard / past benchmarks |
| Integration availability | "the Slack workspace has the right scopes" | MCP test call / admin |
Personal tripwires
Tripwires capture patterns this user has been burned by before. Treat tripwire matches as priority branches — verify them even when they look unimportant.
The live tripwire list lives outside the plugin tree so it survives plugin updates — read the union of two user-owned files, whichever exist:
~/.claude/grounding/tripwires.md — global, personal, cross-project patterns (e.g. "claims about which <SaaS> tier supports <feature>").
<repo>/.claude/grounding/tripwires.md — project-specific patterns (e.g. "claims about our billing service"). <repo> is the git root, or the cwd if not in a repo.
In each file, only entries below the --- marker are live. Do not treat anything in this SKILL.md — or in the bundled references/tripwires.example.md — as a live tripwire; the example file is a seed/template that ships with the plugin and is never read as live (it's the thing that would get clobbered on update, which is exactly why the live list moved out).
Creating the files lazily. When the user surfaces a new tripwire, write it to the file matching its scope (global for cross-project patterns, project for repo-specific ones), creating the file if absent. Seed a new file with a minimal header ending in a single-line live marker so the inventory counter stays accurate:
# Personal tripwires
Patterns I've been burned by. The grounding skill reads entries below the `---` as live priority probes.
---
<!-- Live tripwires below. -->
- **"<pattern>"** — *why it bites* / *where to verify*
Examples of well-formed tripwire patterns (illustrative — these are shapes, not live tripwires):
- "Claims about which
<SaaS> tier supports <feature>" — pricing-page changes constantly; verify against current docs
- "Claims about default behavior in
<CLI tool>" — defaults change between versions; verify against installed version
- "Claims that
<library> handles <edge case> automatically" — usually doesn't; verify against source or test
- "Architectural claims about
<internal system>" — verify against current code, not memory
Iterative branching — what makes grounding different from a one-shot audit
The protocol's Phase 2 ("decompose, then map the decision tree") and Phase 4 ("ask one question at a time") are the heart of how grounding works. Decompose first — a compound claim mapped as a single branch hides the false atom inside it. Then don't dump every uncertainty on the user at once: pick the branch that's most load-bearing — biggest downstream fan-out and biggest blast radius if wrong — and resolve it first, by probe if possible, by single user question if not. Then condition the next question on what you just learned.
A grounding run that asks five questions in parallel finds five disconnected answers. A grounding run that asks one question at a time finds the tree's actual structure.
Deep grounding — delegate probing to the investigator agent
Deep grounding is expensive in context, not just time. Reading whole SME-repo files, sweeping the codebase with broad greps, fetching doc pages, making repeated CLI and MCP calls — all of that tool output lands in whichever thread runs it. Run it inline and the main orchestration thread fills with probe scratchpad, exactly the noise this skill's report is built to suppress. The grounding-investigator agent (agents/grounding-investigator.md) is the structural fix: it runs the probing out-of-thread and hands back only the per-branch verdicts, so depth goes up while the main thread's context stays clean.
Delegate only when the ground is deep. The point of grounding is that it's cheap to invoke — a one- or two-branch check should stay inline and finish in seconds; spinning up a subagent for it just adds latency. Delegate when the probing itself is heavy:
- Three or more load-bearing branches need real probes (not obvious-from-context answers), or
- One or more branches need an expensive source — reading SME-repo sources, fetching external docs, broad multi-file grep sweeps, or several MCP/CLI round-trips, or
- The user explicitly asked to go deep (
/ground --deep, "deeply ground this", "really pressure-test this").
If none of these hold, just probe inline as before. The investigator is a depth tool, not the default path.
Map the tree first, then dispatch. Delegation happens at the Phase 2 → Phase 3 boundary: you decompose, map the decision tree, and order branches in the main thread (that's the cheap, judgment-heavy part), then hand the mapped tree to the investigator to probe. Don't delegate a vague idea — delegate a structured list of falsifiable branches.
Pass the investigator: the restated idea, the ordered decision tree (atomic claims in IF/THEN form), and any tripwire matches flagged as priority probes. It runs its own source inventory.
Dispatch mode — auto-select:
- Blocking (default) when the report is needed before the next step — the usual case. Dispatch the investigator, wait, and fold its findings into your report. The main thread blocks but stays clean: only the verdicts come back, not the probe trail.
- Background when grounding is incidental to what the user is actively doing — they want depth but also want to keep moving. Run the investigator in the background and surface the report when it lands. Reach for this only when there's genuinely other work to do meanwhile; otherwise blocking is simpler and the result is identical.
What comes back, and what stays yours. The investigator is probe-only and read-only by design, which leaves two jobs explicitly with the main thread:
- The interactive Q&A. The investigator can't ask the user anything — it returns user-only branches as
⊘ Open items, each with the exact question and a recommended answer. You run the one-question-at-a-time loop on those (Phase 4), conditioning each question on the last answer. This is why probing is delegated but questioning is not: the tree's interactive structure only emerges in a live thread.
- Decision capture. The investigator never writes files; it only flags findings as capture-worthy ("glossary-shaped", "candidate ADR"). You do the actual
CONTEXT.md / docs/adr/ writes (Phase 6) on the findings that clear the bar.
Then render the report yourself (Phase 7) — the groundedness line, the one-line verdict, the recommended pivot. The investigator hands you raw findings; the report shape, the honesty math, and the decisiveness stay with the main thread.
Decision-capture defaults
Per Phase 6 of the protocol, when a decision crystallises during a grounding session, write it where it belongs. Grounding's defaults:
- A glossary-shaped resolution ("X means Y in this codebase, not Z") → update
<repo>/CONTEXT.md (or the matching scoped CONTEXT.md if a CONTEXT-MAP.md exists). Create the file if missing.
- A hard-to-reverse architectural decision passing the three-criteria test (hard to reverse + surprising without context + result of a real trade-off) → offer to write
docs/adr/<N>-<title>.md using the ADR template in the protocol. Offer — don't auto-write.
- A claim resolution that doesn't need to outlive the conversation → just state it in the report. Don't pollute the docs for ephemeral findings.
When in doubt, don't write. A CONTEXT.md cluttered with single-use findings is worse than one that's sparse. The bar for writing is "the next person reading the codebase needs this to make sense of it."
Report shape — grounding's specifics
The protocol's Phase 7 gives a base report shape. Grounding adds three specifics on top:
- A groundedness line under the restatement —
<N> of <M> load-bearing claims verified against evidence. It makes the report's honesty quantitative: an unsourced claim counts as unverified, never as assumed true, so a plan that reads 2 of 6 is shown as the open question it is rather than handed a confident green light.
- A "⚠ No source available" section, distinct from "⊘ Still open." Open = a probe could resolve it (you just haven't done one). No source = no probe in this environment can resolve it as currently configured. The two have different next-actions, so they need separate sections — and every no-source entry must name the cheapest probe that would close it: a 10-minute throwaway spike, a single API call, installing a CLI, registering an SME repo. "No source" on its own strands the user; "no source — but a 5-minute spike calling the endpoint settles it" hands them the next move. Verification cost should never exceed the cost of being wrong, so favour the cheapest probe that resolves the question — a quick spike usually beats reading three doc pages.
- A "→ Recommended pivot" section when the plan is fully invalidated by the findings. One or two concrete alternative paths the user could take instead. "This is broken" without an alternative leaves the user stuck.
Final shape:
## Grounding report
**Idea:** <one-sentence restatement>
**Groundedness:** <N> of <M> load-bearing claims verified against evidence
### ✓ Resolved
- <branch> — <decision> — <source>
→ Captured in: <CONTEXT.md / docs/adr/0042-... / commit msg / none>
### ⊘ Still open
- <branch> — <what's blocking> — <recommended next probe>
### ⚠ No source available
- <branch> — <cheapest probe that would close it: a 10-min spike / one-off API call / install X / register SME repo Y>
### ⚠ Contradictions found
- <claim> vs <evidence> — <recommended correction>
### → Recommended pivot (only when the plan is fully invalidated)
- <one or two concrete alternative paths>
Skip empty sections. If everything checks out, say so plainly: "All N branches resolved. Plan is grounded. Proceed."
The report is findings, not process. It is the shape above and nothing else — no decomposition table, no probe log, no inventory dump, no phase-by-phase narration. The atomic claims appear only as the one-line-each entries inside the sections; decomposition is something you did, not a section you render. A grounding report that runs past one screen (~40–50 lines) has failed its job no matter how thorough the underlying work — the value is in the cut, not the volume. The worked examples below are the length to aim for.
When the verdict is unambiguous
When the plan is clearly grounded or clearly invalidated, lead with a one-line verdict before the structured sections:
- "This plan does not work as stated." (above a contradictions-heavy report)
- "Plan is grounded. Proceed." (above a fully-resolved report)
Skip the verdict line when the picture is mixed — the structured sections speak for themselves.
Worked examples
Example: a quick ground with doc capture
User: "Let's add a daily cron that re-fetches the Linear issue cache and emails me a diff."
Restated: "Run a cron that calls Linear for current issues, diffs against yesterday's snapshot, and emails the result."
Decision tree (Phase 2):
- Where does cron run? — unlocks: what scheduling primitives are available, what email transport works
- Linear API: can we list all issues for a team without hitting a hard pagination cap? — unlocks: do we need cursor handling
- Snapshot mechanism: do we already have one? — unlocks: schema for the diff
- Email transport: what's wired in? — unlocks: how the email is sent
Probes (Phase 3 — before asking):
- (3)
git grep snapshot → finds pm-session-end.js writes daily snapshots → resolved (no question needed)
- (4)
command -v resend succeeds + ~/.claude/resend-cli/config.json exists → resolved
- (2)
repo-sme-expert query against linear-api → confirms cursor pagination, no hard cap → resolved
- (1) no source can answer → ask user
Ask one question (Phase 4):
"Where should the cron run? (a) Claude scheduled agent — easy but only when your machine's awake, (b) GitHub Actions on a schedule, (c) host crontab — needs a long-running machine. I'd lean (a) since this is personal automation. Confirm or override?"
Capture (Phase 6): the answer goes in the resulting issue body / PR description (not CONTEXT.md — this isn't a glossary term).
Report (Phase 7):
**Plan is grounded.** Daily Linear-diff email via cron.
**Groundedness:** 4 of 4 load-bearing claims verified against evidence.
### ✓ Resolved
- Cron host — Claude scheduled agent (user confirmation).
- Linear API supports paginated team-issue listing (linear-api SME repo).
- Daily snapshot already captured by project-manager — `cache/<slug>/snapshots/daily/`.
- Resend CLI configured — `resend --version` succeeds; per-project config present.
Example: a no-op ground
User: "What's the syntax for a Prisma @@unique constraint?"
This is a direct factual question, not a plan. Skip grounding entirely. Answer the question.
Example: invalidating ground with pivot
User: "I'll wire the Resend webhook to fire on every email open so we can trigger drip-sequence steps."
Decompose (Phase 2): four atomic claims — (1) Resend exposes a webhook, (2) it carries an "opened" event, (3) that event fires for every open, (4) the project has open-tracking on. The plan reads as one idea; it's four claims, and the load-bearing one is (3).
Probe (Phase 3): Resend SME repo / docs → webhook events are email.delivered, email.bounced, email.complained, email.opened. email.opened requires open-tracking enabled per-domain AND only fires for HTML emails with the tracking pixel. Project's resend config has track_opens: false — so (1) and (2) hold, (3) is false, (4) is false.
Report:
**This plan does not work as stated.**
**Groundedness:** 2 of 4 atomic claims verified — and the two that failed are load-bearing.
### ⚠ Contradictions found
- "fires on every email open" vs Resend behavior: `email.opened` requires
open-tracking enabled per-domain, only fires for HTML emails with the
pixel, and is unreliable across email clients that block remote images.
### → Recommended pivot
- (a) Enable open tracking per-domain and accept its caveats — many email
clients block the pixel, so steps will mis-fire.
- (b) Trigger drip-sequence steps on `email.delivered` instead — fires
reliably, but reflects "we sent it" not "they read it."
I'd lean (b) — open-tracking is unreliable enough that drip-sequence logic
shouldn't depend on it.
What good grounding looks like
- Cheap to invoke — runs in seconds for simple ideas, scales up only when warranted
- One branch at a time — never asks five parallel questions; resolves the load-bearing branch first
- Specific — names assumptions in the user's own terms, not generic "have you considered scale"
- Atomic — compound claims are split until each line is a single true/false assertion
- Falsifiable — every claim is phrased as an IF/THEN check a probe could fail
- Unverified by default — an unsourced claim is unverified, never assumed true; the groundedness line counts only what a probe confirmed
- Honest about gaps — says "no source for this" rather than guessing, and names the cheapest probe that would close it
- Doc-aware — sharpens vocabulary against
CONTEXT.md and writes captured decisions where they belong
- Brief — one-screen report, not an essay
- Decisive — calls a plan grounded or invalidated; doesn't hedge
- Context-clean at depth — heavy probing is delegated to the
grounding-investigator agent, so the main thread sees verdicts, not probe scratchpad
Anti-patterns
- Probing every claim — most are obvious; pick the load-bearing branches
- Asking the user things you can probe — wastes their time and signals laziness
- Asking parallel questions — pick the branch whose answer unblocks the most, ask that one
- Reaching for the web first — local sources (codebase, SME repos, MCPs) are more authoritative
- Restating the plan as the report — the report is findings, not a recap
- "Looks good to me" — if you didn't probe, say you didn't probe
- Triggering on brainstorms — wait until the user converges on an option
- Triggering twice on the same plan — once is enough unless the plan materially changes
- Writing every finding to
CONTEXT.md — the bar is high. Only write what the next reader genuinely needs.
- Forcing an ADR — three-criteria test (hard to reverse + surprising + real trade-off); fail any one, no ADR
- Marking a claim resolved on another unverified claim — chains must bottom out in a real artifact; "resolved because B holds" while B is still open is not resolved, it just looks it
- Leaving a "no source" branch with no next move — always name the cheapest probe that would close it; a bare "no source" strands the user
- Putting working notes in the report — the decomposition table, the probe log, the inventory-script output, the phase-by-phase narration: all scratchpad. The report carries verdicts, not the trail that produced them. Past ~one screen, it has failed regardless of how good the underlying grilling was.