com um clique
dbt-workflow
// Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
// Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
Gather business context from Notion before dbt builds. Searches pages, extracts definitions/decisions/constraints, writes structured context for the build agent and notion-verify subagent.
Load when dbt run or dbt parse fails. Covers YML duplicate patches, ref errors, passthrough model warnings, current_date fixes, DuckDB error messages, and zero-row diagnosis.
Load at Step 4 when writing SQL models. Covers column naming, type preservation, JOIN defaults, lookup joins, sibling models, materialization, packages, and filtering rules.
Load when hitting DuckDB syntax errors or writing DuckDB-specific SQL. Covers gotchas that differ from PostgreSQL/MySQL.
BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables.
Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps.
| name | dbt-workflow |
| description | Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML. |
| type | skill |
Check if .claude/notion-config.md exists in the working directory.
If config exists: Load the notion-context skill and run it. The skill
searches configured Notion pages (meeting notes, product specs, data
dictionaries) for context relevant to the current task. It returns a structured
NOTION CONTEXT block with definitions, decisions, and constraints.
After gathering context:
notion_context.md in the working
directory (the notion-verify subagent reads this after the build).If config does not exist: Skip this step — Notion is optional. Proceed to
Section 1. If the user asks about Notion context later, point them to
/notion-setup.
Extract from description: field:
ROW_NUMBER() ... <= N using a deterministic tiebreaker (add primary key to
ORDER BY). Do NOT use DENSE_RANK for filtering — it can return more than N rows.WHERE date_col = (SELECT MAX(date_col) FROM source).CAST(NULL AS DOUBLE) — see rule below.How to read YML descriptions: Descriptions tell you what the data MEANS, not what code to write. Use them to:
grid column, not qualifying position)But do NOT treat descriptions as literal computation instructions. They may describe steady-state behavior that doesn't apply on first build, or use imprecise language. After reading the description, always verify your logic against the actual source data — query the source tables to confirm which columns and values produce the expected result.
Write at top of SQL: -- EXPECTED SHAPE: <row count or formula> — REASON: <quote>
The starting database contains pre-computed reference tables with correct output.
dbt run will overwrite them. Before your first dbt run, for each target
model that already exists as a table in the database:
SELECT COUNT(*) FROM <model_name>
Record the row count in your -- EXPECTED SHAPE comment. If your rebuilt model's
row count doesn't match after dbt run, you MUST diff against this reference to
find which rows differ.
When a dbt project uses materialized="incremental" models, the project is
designed to accumulate state over multiple runs. On a first run (full refresh,
no prior state), incremental models build from scratch.
If you are writing a new model that includes period-over-period metrics (MoM, WoW, YoY) and the project has not been run incrementally before:
WHERE date_col = (SELECT MAX(date_col) FROM source)CAST(NULL AS DOUBLE) — there is no prior
aggregated state to compare against. Computing these from raw historical data
would produce values that don't match the expected first-run output.If the model SQL already exists (not a stub):
{% if is_incremental() %} block to understand the filter logic.Trust YML for: column names (exact match required), column descriptions (what each column represents), ref dependencies (what tables to join).
YML not_null tests on key/dimension columns (IDs, names, dates, categories)
imply a WHERE col IS NOT NULL filter on input data. Do NOT implement this as an
INNER JOIN — use an explicit WHERE clause. not_null on metric/aggregate columns
(counts, averages, totals) just asserts the output shouldn't be NULL — don't filter
inputs for those, fix the aggregation instead.
Do NOT trust YML for: grain/row count. YML unique and not_null tests are
assertions that may be aspirational or wrong. Do NOT use not_null tests to decide
join type.
Derive the grain from these signals (in priority order):
Unique key structure: If the YML defines a unique key or surrogate key column,
examine what it's composed of. A key like concat(ticker, timestamp) means the
grain is (ticker, timestamp) — not (ticker, date). The key tells you exactly
what combination of values identifies one row.
Column list: The columns themselves reveal the grain. If a model has both a header-level key AND a detail-level key as separate columns, the grain is at the detail level.
Upstream model grain: Check existing upstream models that feed into yours.
If bar_executions produces one row per (ticker, timestamp), your model that
depends on it likely has the same or coarser grain — not finer.
Source cardinality: Before writing SQL, query the source tables to check
how many rows your model should produce:
SELECT COUNT(DISTINCT key_col) FROM source_table
If your model produces dramatically fewer rows than upstream, your GROUP BY
is too coarse.
Sibling model row counts: Check complete models at the same level.
Do NOT deduplicate with ROW_NUMBER to force a unique test to pass — if the
data naturally has multiple rows per key, keep them all.