with one click
extract-knowledge
Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep)
Menu
Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep)
| name | extract-knowledge |
| description | Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep) |
| tags | ["knowledge","sql","gold-sql","iteration"] |
| version | 1.1.0 |
| user_invocable | true |
| disable_model_invocation | false |
You receive one or more (question, gold_sql) pairs. For each pair you must:
gold_sql exactly — either by simulating the drafting yourself (lite mode) or by driving a blind gen_sql subagent (deep mode). In deep mode, never expose the gold SQL or any gold result values to the subagent.gold_sql to surface the business knowledge gap — the rules, joins, filters, granularity, or business definitions that a generic SQL agent would not know../knowledge/<domain-slug>.md, and refresh the ## Knowledge index in ./AGENTS.md.The goal is the shortest description that lets an LLM answer correctly. Matching the SQL is only the means; reusable atomic facts are the deliverable.
gold_sql to the subagent. Not the full SQL, not snippets, not concrete result values, not column-level data. The subagent must stay "blind" — only then do its mistakes reveal what knowledge is missing.session_id via the task tool — that's the only way the subagent remembers prior attempts.Resolve the pair source in this priority order (use the first one that works, do not keep looking):
read_file first, then parse./extract-knowledge without explicit pairs:
(question, SQL) — i.e. a data question the user asked, followed by an assistant SQL the user continued to refine, accepted, or did not reject.question, and the final adopted SQL as gold_sql.ask_user by showing the recovered question and a prefix of gold_sql. Do not run silently — context recovery has ambiguity risk (you may have latched onto an intermediate draft).If none of the above yields a clear (question, SQL), call ask_user to request the pair from the user.
Once the input is resolved, decide which mode to run before entering the workflow:
gen_sql subagent. The main agent itself simulates "given this question + the current datasource schema, how would I draft the SQL," diffs that mental draft against gold_sql, and mines the gap directly. Pros: fast, zero subagent calls, single-pass. Cons: no independent blind-generator validation; depends on the main agent's discipline to "pretend not to know the answer."gen_sql subagent through multi-round blind iteration (≤5 rounds) until result match, then mine facts from the final diff. Pros: independent blind validator, iteration surfaces finer-grained boundary facts. Cons: consumes subagent tokens, requires multiple read_query executions.Decision rule:
ask_user tool is available, call it once and ask the user to choose lite or deep; include the one-line tradeoff. Default suggestion = lite.ask_user is unavailable (no such tool, or running non-interactively), default to lite and note in the final output: "defaulted to lite; rerun and choose deep for stricter validation."Record the chosen mode and route to the right workflow branch.
gold_sql (both modes)Run read_query(sql=<gold_sql>).
Take this branch only when mode = lite; skip the deep branch entirely.
gold_sql. Mentally switch to a state where you only know question + the current datasource schema. Do not let the specific shape of gold_sql influence this draft.read_query-execute it; the draft is only a comparison baseline) — this is the version a SQL agent unfamiliar with this project's business conventions would produce given that schema.gold_sql along these axes:
There is no match / mismatch verdict in lite mode — diff items drive extraction directly. Lite mode does not enter the "5-round exhaustion" branch and does not write Open Gaps (no iteration history exists).
Take this branch only when mode = deep and continue with Steps 3 and 4 below.
Invoke:
task(
type="gen_sql",
prompt=<question>, # natural-language question only
description="extract-knowledge: initial attempt for <short topic>"
)
The returned envelope contains result.sql (or result.sql_file_path for long SQL), result.response, and result.session_id. Save the session_id — every later retry must reuse it.
Run the subagent's SQL via read_query and compare against the cached gold result:
read_query to run difference probes (two-way EXCEPT, key-column SUM / COUNT(DISTINCT ...) consistency checks, etc.)Verdict: match or mismatch.
Diagnose qualitatively. Common symptoms → likely missing knowledge:
| Symptom | Likely missing knowledge |
|---|---|
| Too many rows | Missing filter, missing join, wrong granularity |
| Too few rows | Extra filter, join condition too strict |
| Aggregate value off | Wrong measure column, missing dedup, wrong unit |
| Extra / missing columns | Output shape mismatch, projection rule missing |
| Time bucketing differs | Granularity / business calendar / fiscal-year rule |
| Unexpected NULLs | Wrong join type (LEFT vs INNER), missing COALESCE rule |
When designing the follow-up prompt, obey:
Then call:
task(
type="gen_sql",
session_id=<saved session_id>, # must be the previous round's id
prompt=<hint only>,
description="extract-knowledge: refine #<n>"
)
Loop Step 3 → Step 4 for at most 5 rounds (including the first). If still mismatched after round 5: stop retrying, but do not discard the process information — promote every confirmed knowledge gap (the rules the subagent finally got right each round, the business rules / field traps / mandatory filters you've already pinpointed in follow-up prompts) into facts via Step 5, then log the remaining un-aligned differences in Open Gaps via Step 7.
For every candidate diff item, ask these 4 questions in order. If any answer is "yes," drop it.
INFORMATION_SCHEMA / table comments / column names?Record atomic facts only — anything else counts as derivative and is not written. Common atomic categories worth writing:
Self-check: after writing a group of facts, look back — can you drop any single one and still let the LLM answer correctly? If yes, that one was derivative. Drop it.
Trigger:
gold_sql diff.lite mode: every diff item from Steps 2–4 (lite) is a candidate fact.
deep mode: diff the subagent's final SQL against gold_sql (also against the subagent's first attempt — the middle is informative). On failure, additionally walk through every follow-up prompt you sent; the "directions you've already qualitatively named" are confirmed fact sources.
Pass every candidate through the "worth-writing" test. Only survivors become facts.
Each fact contains only:
statement — one sentence stating the fact itself. Do not explain "why it matters." Critical warnings (boundary values, easy-to-miss traps) go in inline parenthetical notes.example (optional) — only when the one-sentence statement cannot remove ambiguity by itself; then attach a minimal SQL snippet.Forbidden:
> 7, not >= 7" and then paste a WHERE x > 7 snippet)business domain → topic → facts
(one .md) (## heading) (list items / table rows / paragraphs / optional ### subblock)
./knowledge/<domain-slug>.md. Prefer fewer, wider domains over many narrow ones.##) = a group of facts sharing context — typically the constraints of one table, a set of measure definitions, or a related mapping group.### <one-sentence rule> subblock — only when a SQL example must be attachedHard constraints:
###Derived from / Why it matters / When it applies label fields./knowledge/ and read each file's Domain intro. Prefer adding a new topic or fact under an existing domain over opening a new file.Template for a new business-domain file:
# <Domain Title>
> **Domain:** <one-sentence scope statement — what this file covers, what it doesn't>.
## <Topic Title>
<optional 1–2 sentence intro describing what this topic covers>
- fact 1
- fact 2 (note ...)
| Field / Branch | Value / Verdict |
|---------------|-----------------|
| ... | ... |
### <rule that needs a SQL example>
<one-sentence statement, may carry an inline warning>
\`\`\`sql
<minimal snippet that resolves ambiguity>
\`\`\`
For every classified fact, walk the gates in order. Do not skip steps.
./knowledge/ does not yet exist, just write_file the first file (no ask_user needed for an empty-directory case).ask_user: list existing domains, propose a new slug, let the user choose reuse / new / custom.read_file the target. Extract existing facts per topic (list items + table rows + paragraph facts + ### subblock headings). Compare each candidate against existing facts semantically (not by string):
| Outcome | Definition | Action |
|---|---|---|
| Duplicate | Same fact, same scope, same direction. | Silent skip (idempotent). Record in final report. |
| Refinement | Existing fact is a strict subset of the new one (lower precision, missing a condition, narrower scope). | ask_user: replace / merge / keep both with a scope qualifier. Default suggestion = merge. |
| Conflict | Same scope, opposite direction / mutually contradictory. | ask_user is mandatory. Show both side-by-side. Options: keep old / replace with new / keep both with an explicit conditional gate (you must spell out the condition). Never resolve a conflict silently. |
| Complementary | Same topic, different facet. | Append a same-form fact under the same topic (list item / table row / paragraph). |
| Derivable | The candidate mechanically composes from existing facts. | Drop, do not write. The "worth-writing" test (question 4) is supposed to catch this earlier — this is a safety net. |
| New topic / new domain | No relevant existing topic. | Create the heading via 6.3. |
edit_file for minimum diff.write_file the Domain intro followed by the first topic heading## <topic title> block### <one-sentence rule> subblock when a SQL example is neededDo not insert --- separators between facts — markdown headings / lists already separate them, and --- makes edit_file insertion points ambiguous.
Maintain the ## Knowledge section of ./AGENTS.md, so that later main agents handling related business-domain tasks can enter from AGENTS.md, see what knowledge/ holds, and read_file the relevant domain file as context.
AGENTS.md handling order:
./AGENTS.md missing → write_file a minimal skeleton (just # <project directory name> + the ## Knowledge section; leave the rest for /init to fill in later). Do not wait for the user to run /init first.## Knowledge section → insert after ## Artifacts (or at end of file if Artifacts is also absent)../knowledge/ and rewrite the entire ## Knowledge section sorted alphabetically by domain title — guarantees the index stays consistent across runs.Fixed structure of the ## Knowledge section:
## Knowledge
`./knowledge/` holds this project's atomic business facts, organized per business domain (maintained by `/extract-knowledge`).
Before handling a task that touches a domain, `read_file` the corresponding file for context.
- [<Domain Title A>](knowledge/<domain-slug-a>.md) — <one-sentence scope>
- [<Domain Title B>](knowledge/<domain-slug-b>.md) — <one-sentence scope>
### Open Gaps
- <question summary> — <remaining un-aligned difference>
> Domain: intro../knowledge/ is empty, omit the index list and ### Open Gaps; keep only the section intro paragraph.### Open Gaps omits entirely when there are no entries; when present, group by domain.5-round mismatch handling (deep mode only): keep the facts Step 5–6 already wrote in their domain files, then append a line - <question summary> — <remaining un-aligned difference> under ### Open Gaps, and continue with the next pair. Lite mode has no iteration history and does not write Open Gaps.
Return a single human-readable summary covering: the mode used (lite / deep); how many pairs were processed; how many matched / failed (deep) or were aligned (lite); which knowledge/*.md files were created / edited; how many facts were added; any conflicts still needing attention. When lite was used because ask_user was unavailable, add one line: "defaulted to lite; rerun and choose deep for stricter validation." Do not return a JSON envelope.
read_query(sql=...) — execute gold SQL and difference probes (deep mode additionally executes subagent SQL). In lite mode it is only used for Step 1's gold validation.task(type="gen_sql", prompt=..., session_id=...) — deep mode only; delegate SQL generation, reuse session across retries.read_file, write_file, edit_file — manage ./knowledge/*.md and ./AGENTS.md. Always read_file the target domain file before edits (6.2 depends on it).ask_user — required for: ambiguous input parsing; confirming a context-recovered pair; mode selection (lite / deep); new-domain confirmation (6.1); refinement decisions (6.2); every fact conflict (6.2). Phrase questions with numbered options.task(type="gen_sql", ...) — the mode selection already declared no subagent.task(type="gen_sql", ...) before confirming gold_sql actually runs.gen_sql session for retries — always pass the previous round's session_id../knowledge/*.md. Knowledge is atomic facts mined from the gap, not the answer itself.Derived from / Why it matters / When it applies label fields; do not include any question text in the files.ask_user — the user decides replace / coexist / scope-gate.--- separators between facts.### Rule 1, ### Note). The one-sentence rule itself is the heading.####. Deeper means the domain was sliced wrong — fold the levels or split the file.## — merge it into a wider topic.Generate MetricFlow metrics from natural language business descriptions
Initialize project workspace — generate AGENTS.md with architecture, directory map, services, and artifacts
Generate MetricFlow semantic models from database tables with validation and Knowledge Base publishing
Create database tables from SQL (CTAS) or natural language descriptions
Activate when the gen_job agent detects that the source and target databases differ. Covers cross-database transfer lifecycle - type mapping via adapter Mixin hints, DDL generation, data transfer via transfer_query_result, and lightweight reconciliation.
Execution guide for Airflow scheduled jobs — troubleshooting, updating, conn_id conventions, and cron references