원클릭으로
dbt-workflow
// Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
// Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
| name | dbt-workflow |
| description | Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML. |
| type | skill |
Check if .claude/notion-config.md exists in the working directory.
If config exists: Load the notion-context skill and run it. The skill
searches configured Notion pages (meeting notes, product specs, data
dictionaries) for context relevant to the current task. It returns a structured
NOTION CONTEXT block with definitions, decisions, and constraints.
After gathering context:
notion_context.md in the working
directory (the notion-verify subagent reads this after the build).If config does not exist: Skip this step — Notion is optional. Proceed to
Section 1. If the user asks about Notion context later, point them to
/notion-setup.
Extract from description: field:
ROW_NUMBER() ... <= N using a deterministic tiebreaker (add primary key to
ORDER BY). Do NOT use DENSE_RANK for filtering — it can return more than N rows.WHERE date_col = (SELECT MAX(date_col) FROM source).CAST(NULL AS DOUBLE) — see rule below.How to read YML descriptions: Descriptions tell you what the data MEANS, not what code to write. Use them to:
grid column, not qualifying position)But do NOT treat descriptions as literal computation instructions. They may describe steady-state behavior that doesn't apply on first build, or use imprecise language. After reading the description, always verify your logic against the actual source data — query the source tables to confirm which columns and values produce the expected result.
Write at top of SQL: -- EXPECTED SHAPE: <row count or formula> — REASON: <quote>
The starting database contains pre-computed reference tables with correct output.
dbt run will overwrite them. Before your first dbt run, for each target
model that already exists as a table in the database:
SELECT COUNT(*) FROM <model_name>
Record the row count in your -- EXPECTED SHAPE comment. If your rebuilt model's
row count doesn't match after dbt run, you MUST diff against this reference to
find which rows differ.
When a dbt project uses materialized="incremental" models, the project is
designed to accumulate state over multiple runs. On a first run (full refresh,
no prior state), incremental models build from scratch.
If you are writing a new model that includes period-over-period metrics (MoM, WoW, YoY) and the project has not been run incrementally before:
WHERE date_col = (SELECT MAX(date_col) FROM source)CAST(NULL AS DOUBLE) — there is no prior
aggregated state to compare against. Computing these from raw historical data
would produce values that don't match the expected first-run output.If the model SQL already exists (not a stub):
{% if is_incremental() %} block to understand the filter logic.Trust YML for: column names (exact match required), column descriptions (what each column represents), ref dependencies (what tables to join).
YML not_null tests on key/dimension columns (IDs, names, dates, categories)
imply a WHERE col IS NOT NULL filter on input data. Do NOT implement this as an
INNER JOIN — use an explicit WHERE clause. not_null on metric/aggregate columns
(counts, averages, totals) just asserts the output shouldn't be NULL — don't filter
inputs for those, fix the aggregation instead.
Do NOT trust YML for: grain/row count. YML unique and not_null tests are
assertions that may be aspirational or wrong. Do NOT use not_null tests to decide
join type.
Derive the grain from these signals (in priority order):
Unique key structure: If the YML defines a unique key or surrogate key column,
examine what it's composed of. A key like concat(ticker, timestamp) means the
grain is (ticker, timestamp) — not (ticker, date). The key tells you exactly
what combination of values identifies one row.
Column list: The columns themselves reveal the grain. If a model has both a header-level key AND a detail-level key as separate columns, the grain is at the detail level.
Upstream model grain: Check existing upstream models that feed into yours.
If bar_executions produces one row per (ticker, timestamp), your model that
depends on it likely has the same or coarser grain — not finer.
Source cardinality: Before writing SQL, query the source tables to check
how many rows your model should produce:
SELECT COUNT(DISTINCT key_col) FROM source_table
If your model produces dramatically fewer rows than upstream, your GROUP BY
is too coarse.
Sibling model row counts: Check complete models at the same level.
Do NOT deduplicate with ROW_NUMBER to force a unique test to pass — if the
data naturally has multiple rows per key, keep them all.
Gather business context from Notion before dbt builds. Searches pages, extracts definitions/decisions/constraints, writes structured context for the build agent and notion-verify subagent.
Load when dbt run or dbt parse fails. Covers YML duplicate patches, ref errors, passthrough model warnings, current_date fixes, DuckDB error messages, and zero-row diagnosis.
Load at Step 4 when writing SQL models. Covers column naming, type preservation, JOIN defaults, lookup joins, sibling models, materialization, packages, and filtering rules.
Load when hitting DuckDB syntax errors or writing DuckDB-specific SQL. Covers gotchas that differ from PostgreSQL/MySQL.
BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables.
Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps.