with one click
extract-knowledge
Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep)
Menu
Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep)
Build the project's vector-indexed knowledge base from files plus database metadata — optionally scoped to specific files / tables / datasources / domains. Scan the in-scope material, classify it into business domains, explore each domain in parallel with explore subagents, then (after the user confirms a generation manifest) route every artifact to its store via storage-classify, generating semantic_models / metrics / reference_sql (and mining any extra knowledge), and refresh AGENTS.md's KB index. The lightweight /init handles the AGENTS.md inventory plus file-based knowledge/memory; this skill owns the heavy vector-store generation.
Lightweight project initialization — infer the project goal and in-scope datasources, scan the file tree and database metadata (db/table/desc/sample), classify into business domains, then write an AGENTS.md inventory skeleton plus the cheap file-based stores (atomic facts to ./knowledge/*.md via lite extract-knowledge, durable preferences to memory). Stops short of the expensive vector-indexed stores (semantic_models / metrics / reference_sql). Single confirmation-free pass, low token cost.
Decide where a produced artifact must be persisted before writing it, then route it the prescribed way — semantic_models / metrics / reference_sql via the matching task() subagent, knowledge via extract-knowledge (lite), memory via add_memory, skills via create-skill, and AGENTS.md edited directly. Load before persisting any business fact, validated SQL, metric/model definition, session preference, project convention, or reusable workflow.
Audit and reorganize every persistent store — semantic_models, metrics, reference_sql, knowledge, memory, AGENTS.md, skills — verifying each item sits in the correct store per storage-classify, and surfacing duplicates, misclassifications, conflicts, and stale/erroneous entries. Produce a Remediation Plan, STOP for confirmation, then execute. Use ask_user only for genuine decisions during analysis. If nothing needs fixing, report it and stop.
Review the current chat session and persist its valuable takeaways — business facts/rules, validated SQL, metric/model definitions, durable preferences, project conventions, reusable workflows — by classifying each via storage-classify and routing it to the right store. Present a Summary Manifest and STOP for confirmation before any heavy generation. Use at the end of a working session or when the user asks to capture what was learned.
Generate MetricFlow metrics from natural language business descriptions
| name | extract-knowledge |
| description | Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep) |
| tags | ["knowledge","sql","gold-sql","iteration"] |
| version | 1.3.0 |
| user_invocable | true |
| disable_model_invocation | false |
You receive one or more (question, gold_sql) pairs. For each pair you must:
gold_sql exactly — either by simulating the drafting yourself (lite mode) or by driving a blind gen_sql subagent (deep mode). In deep mode, never expose the gold SQL or any gold result values to the subagent.gold_sql to surface the business knowledge gap — the rules, joins, filters, granularity, or business definitions that a generic SQL agent would not know../knowledge/<domain-slug>.md, and refresh the ## Knowledge index in ./AGENTS.md.The goal is the shortest description that lets an LLM answer correctly. Matching the SQL is only the means; reusable atomic facts are the deliverable.
gold_sql to the subagent. Not the full SQL, not snippets, not concrete result values, not column-level data. The subagent must stay "blind" — only then do its mistakes reveal what knowledge is missing.session_id via the task tool — that's the only way the subagent remembers prior attempts.Resolve the pair source in this priority order (use the first one that works, do not keep looking):
read_file first, then parse./extract-knowledge without explicit pairs:
(question, SQL) — i.e. a data question the user asked, followed by an assistant SQL the user continued to refine, accepted, or did not reject.question, and the final adopted SQL as gold_sql.ask_user by showing the recovered question and a prefix of gold_sql. Do not run silently — context recovery has ambiguity risk (you may have latched onto an intermediate draft).If none of the above yields a clear (question, SQL), call ask_user to request the pair from the user.
Once the input is resolved, decide which mode to run before entering the workflow:
gen_sql subagent. The main agent itself simulates "given this question + the current datasource schema, how would I draft the SQL," diffs that mental draft against gold_sql, and mines the gap directly. Pros: fast, zero subagent calls, single-pass. Cons: no independent blind-generator validation; depends on the main agent's discipline to "pretend not to know the answer."gen_sql subagent through multi-round blind iteration (≤5 rounds) until result match, then mine facts from the final diff. Pros: independent blind validator, iteration surfaces finer-grained boundary facts. Cons: consumes subagent tokens, requires multiple read_query executions.Decision rule:
ask_user tool is available, call it once and ask the user to choose lite or deep; include the one-line tradeoff. Default suggestion = lite.ask_user is unavailable (no such tool, or running non-interactively), default to lite and note in the final output: "defaulted to lite; rerun and choose deep for stricter validation."Record the chosen mode and route to the right workflow branch.
gold_sql (both modes)Run read_query(sql=<gold_sql>).
Take this branch only when mode = lite; skip the deep branch entirely.
gold_sql. Mentally switch to a state where you only know question + the current datasource schema. Do not let the specific shape of gold_sql influence this draft.read_query-execute it; the draft is only a comparison baseline) — this is the version a SQL agent unfamiliar with this project's business conventions would produce given that schema.gold_sql along these axes:
There is no match / mismatch verdict in lite mode — diff items drive extraction directly. Lite mode does not enter the "5-round exhaustion" branch and does not write Open Gaps (no iteration history exists).
Take this branch only when mode = deep and continue with Steps 3 and 4 below.
Invoke:
task(
type="gen_sql",
prompt=<question>, # natural-language question only
description="extract-knowledge: initial attempt for <short topic>"
)
The returned envelope contains result.sql (or result.sql_file_path for long SQL), result.response, and result.session_id. Save the session_id — every later retry must reuse it.
Run the subagent's SQL via read_query and compare against the cached gold result:
read_query to run difference probes (two-way EXCEPT, key-column SUM / COUNT(DISTINCT ...) consistency checks, etc.)Verdict: match or mismatch.
Diagnose qualitatively. Common symptoms → likely missing knowledge:
| Symptom | Likely missing knowledge |
|---|---|
| Too many rows | Missing filter, missing join, wrong granularity |
| Too few rows | Extra filter, join condition too strict |
| Aggregate value off | Wrong measure column, missing dedup, wrong unit |
| Extra / missing columns | Output shape mismatch, projection rule missing |
| Time bucketing differs | Granularity / business calendar / fiscal-year rule |
| Unexpected NULLs | Wrong join type (LEFT vs INNER), missing COALESCE rule |
When designing the follow-up prompt, obey:
Then call:
task(
type="gen_sql",
session_id=<saved session_id>, # must be the previous round's id
prompt=<hint only>,
description="extract-knowledge: refine #<n>"
)
Loop Step 3 → Step 4 for at most 5 rounds (including the first). If still mismatched after round 5: stop retrying, but do not discard the process information — promote every confirmed knowledge gap (the rules the subagent finally got right each round, the business rules / field traps / mandatory filters you've already pinpointed in follow-up prompts) into facts via Step 5, then log the remaining un-aligned differences in Open Gaps via Step 7.
For every candidate diff item, ask these 4 questions in order. If any answer is "yes," drop it.
INFORMATION_SCHEMA / table comments / column names?Record atomic facts only — anything else counts as derivative and is not written. Common atomic categories worth writing:
Self-check: after writing a group of facts, look back — can you drop any single one and still let the LLM answer correctly? If yes, that one was derivative. Drop it.
Trigger:
gold_sql diff.lite mode: every diff item from Steps 2–4 (lite) is a candidate fact.
deep mode: diff the subagent's final SQL against gold_sql (also against the subagent's first attempt — the middle is informative). On failure, additionally walk through every follow-up prompt you sent; the "directions you've already qualitatively named" are confirmed fact sources.
Systematic corpus scan (both modes, additive) — mine encodings the per-pair diff misses. The diff only surfaces facts a generic agent got wrong on this question; a literal that the draft happened to guess, or that no pair exercised, slips through. So, when you have access to the validated-SQL corpus (not just the single pair), also scan its WHERE / SELECT clauses structurally for non-inferable atoms, independent of any single diff:
col = '<literal>', col IN (...), bit/flag tests) where the literal carries business meaning → a term ↔ column ↔ value fact (e.g. a business term maps to a specific column holding a specific code).SELECT/WHERE → record the mapping.> 500 vs >= 500 (and BETWEEN's inclusivity) exactly as the validated SQL has it — strict-vs-inclusive mismatches flip rows at the boundary and are among the most common silent errors.This pass extracts structure (literals in WHERE, term→column in SELECT), never hardcoded domain vocabulary — it works for any dataset. Every candidate it produces is just another candidate: funnel it through the same "worth-writing" test (so generic/inferable facts are still dropped) and the same Step 6 dedup/conflict gates (so it never duplicates what the diff already wrote).
Pass every candidate through the "worth-writing" test. Only survivors become facts.
Each fact contains only:
statement — one sentence stating the fact itself. Do not explain "why it matters." Critical warnings (boundary values, easy-to-miss traps) go in inline parenthetical notes.example (optional) — only when the one-sentence statement cannot remove ambiguity by itself; then attach a minimal SQL snippet.Forbidden:
> 7, not >= 7" and then paste a WHERE x > 7 snippet)business domain → topic → facts
(one .md) (## heading) (list items / table rows / paragraphs / optional ### subblock)
./knowledge/<domain-slug>.md. Prefer fewer, wider domains over many narrow ones.##) = a group of facts sharing context — typically the constraints of one table, a set of measure definitions, or a related mapping group.### <one-sentence rule> subblock — only when a SQL example must be attachedHard constraints:
###Derived from / Why it matters / When it applies label fields./knowledge/ and read each file's Domain intro. Prefer adding a new topic or fact under an existing domain over opening a new file.Template for a new business-domain file:
# <Domain Title>
> **Domain:** <one-sentence scope statement — what this file covers, what it doesn't>.
## <Topic Title>
<optional 1–2 sentence intro describing what this topic covers>
- fact 1
- fact 2 (note ...)
| Field / Branch | Value / Verdict |
|---------------|-----------------|
| ... | ... |
### <rule that needs a SQL example>
<one-sentence statement, may carry an inline warning>
\`\`\`sql
<minimal snippet that resolves ambiguity>
\`\`\`
For every classified fact, walk the gates in order. Do not skip steps.
./knowledge/ does not yet exist, just write_file the first file (no ask_user needed for an empty-directory case).ask_user: list existing domains, propose a new slug, let the user choose reuse / new / custom.read_file the target. Extract existing facts per topic (list items + table rows + paragraph facts + ### subblock headings). Compare each candidate against existing facts semantically (not by string):
| Outcome | Definition | Action |
|---|---|---|
| Duplicate | Same fact, same scope, same direction. | Silent skip (idempotent). Record in final report. |
| Refinement | Existing fact is a strict subset of the new one (lower precision, missing a condition, narrower scope). | ask_user: replace / merge / keep both with a scope qualifier. Default suggestion = merge. |
| Conflict | Same scope, opposite direction / mutually contradictory. | ask_user is mandatory. Show both side-by-side. Options: keep old / replace with new / keep both with an explicit conditional gate (you must spell out the condition). Never resolve a conflict silently. |
| Complementary | Same topic, different facet. | Append a same-form fact under the same topic (list item / table row / paragraph). |
| Derivable | The candidate mechanically composes from existing facts. | Drop, do not write. The "worth-writing" test (question 4) is supposed to catch this earlier — this is a safety net. |
| New topic / new domain | No relevant existing topic. | Create the heading via 6.3. |
edit_file for minimum diff.write_file the Domain intro followed by the first topic heading## <topic title> block### <one-sentence rule> subblock when a SQL example is neededDo not insert --- separators between facts — markdown headings / lists already separate them, and --- makes edit_file insertion points ambiguous.
Maintain the ## Knowledge section of ./AGENTS.md, so that later main agents handling related business-domain tasks can enter from AGENTS.md, see what knowledge/ holds, and read_file the relevant domain file as context.
AGENTS.md handling order:
./AGENTS.md missing → write_file a minimal skeleton (just # <project directory name> + the ## Knowledge section; leave the rest for /init to fill in later). Do not wait for the user to run /init first.## Knowledge section → insert after ## Artifacts (or at end of file if Artifacts is also absent)../knowledge/ and rewrite the entire ## Knowledge section sorted alphabetically by domain title — guarantees the index stays consistent across runs.Fixed structure of the ## Knowledge section:
## Knowledge
`./knowledge/` holds this project's atomic business facts, organized per business domain (maintained by `/extract-knowledge`).
Before handling a task that touches a domain, `read_file` the corresponding file for context.
- [<Domain Title A>](knowledge/<domain-slug-a>.md) — <one-sentence scope>
- [<Domain Title B>](knowledge/<domain-slug-b>.md) — <one-sentence scope>
### Open Gaps
- <question summary> — <remaining un-aligned difference>
> Domain: intro../knowledge/ is empty, omit the index list and ### Open Gaps; keep only the section intro paragraph.### Open Gaps omits entirely when there are no entries; when present, group by domain.5-round mismatch handling (deep mode only): keep the facts Step 5–6 already wrote in their domain files, then append a line - <question summary> — <remaining un-aligned difference> under ### Open Gaps, and continue with the next pair. Lite mode has no iteration history and does not write Open Gaps.
Return a single human-readable summary covering: the mode used (lite / deep); how many pairs were processed; how many matched / failed (deep) or were aligned (lite); which knowledge/*.md files were created / edited; how many facts were added; any conflicts still needing attention. When lite was used because ask_user was unavailable, add one line: "defaulted to lite; rerun and choose deep for stricter validation." Do not return a JSON envelope.
read_query(sql=...) — execute gold SQL and difference probes (deep mode additionally executes subagent SQL). In lite mode it is only used for Step 1's gold validation.task(type="gen_sql", prompt=..., session_id=...) — deep mode only; delegate SQL generation, reuse session across retries.read_file, write_file, edit_file — manage ./knowledge/*.md and ./AGENTS.md. Always read_file the target domain file before edits (6.2 depends on it).ask_user — required for: ambiguous input parsing; confirming a context-recovered pair; mode selection (lite / deep); new-domain confirmation (6.1); refinement decisions (6.2); every fact conflict (6.2). Phrase questions with numbered options.task(type="gen_sql", ...) — the mode selection already declared no subagent.task(type="gen_sql", ...) before confirming gold_sql actually runs.gen_sql session for retries — always pass the previous round's session_id../knowledge/*.md. Knowledge is atomic facts mined from the gap, not the answer itself.Derived from / Why it matters / When it applies label fields; do not include any question text in the files.ask_user — the user decides replace / coexist / scope-gate.--- separators between facts.### Rule 1, ### Note). The one-sentence rule itself is the heading.####. Deeper means the domain was sliced wrong — fold the levels or split the file.## — merge it into a wider topic.