| name | dbt-unit-tests |
| description | Generate dbt unit tests automatically for any model. Analyzes SQL logic (CASE/WHEN, JOINs, window functions, NULLs), creates type-correct mock inputs from manifest schema, and assembles complete YAML. Use when a user says "generate tests", "add unit tests", "test this model", or "test coverage" for dbt models. |
dbt Unit Test Generation
Requirements
Agent: builder or migrator (requires file write access)
Tools used: dbt_unit_test_gen, dbt_manifest, dbt_lineage, altimate_core_validate, altimate_core_testgen, bash (runs altimate-dbt commands), read, glob, write, edit
When to Use This Skill
Use when the user wants to:
- Generate unit tests for a dbt model
- Add test coverage to an existing model
- Create mock data for testing
- Test-driven development (TDD) for dbt
- Verify CASE/WHEN logic, NULL handling, JOIN behavior, or aggregation correctness
- Test incremental model logic
Do NOT use for:
- Adding schema tests (not_null, unique, accepted_values) -> use
dbt-test
- Creating or modifying model SQL -> use
dbt-develop
- Writing descriptions -> use
dbt-docs
- Debugging build failures -> use
dbt-troubleshoot
The Iron Rules
- Never guess expected outputs. Compute them by running SQL against mock data when possible. If you cannot run SQL, clearly mark expected outputs as placeholders that need verification.
- Never skip upstream dependencies. Every ref() and source() the model touches MUST have a mock input. Miss one and the test won't compile.
- Use sql format for ephemeral models. Dict format fails silently for ephemeral upstreams.
- Never weaken a test to make it pass. If the test fails, the model logic may be wrong. Investigate before changing expected values.
- Compile before committing. Always run
altimate-dbt test --model <name> to verify tests compile and execute.
- Mock data MUST exercise the failure modes of every SQL construct in the model. A unit test that only covers the happy path validates that the model handles easy inputs — it does not validate correctness. Before writing
given: rows, list every SQL construct in the model and the boundary case it can mishandle, then ensure at least one mock row triggers each. Universal cases to always cover when the construct appears:
LEFT JOIN / LEFT OUTER JOIN → at least one parent row with no matching child (catches COUNT(*) phantom rows, SUM over NULL, fan-out / dropout)
INNER JOIN → at least one parent row whose child is filtered out by the JOIN condition (catches missing rows)
COUNT(*) / COUNT(<col>) → row where the counted column is NULL (catches COUNT(*) vs COUNT(col) divergence)
NULLIF(x, y) → row where x = y (so the result is NULL, exercising downstream NULL-handling)
/ division → row where the denominator is 0 or NULL
CASE WHEN → at least one row matching each branch, including the implicit ELSE NULL if no explicit ELSE is set
COALESCE / IFNULL → row where every argument is NULL
- Window functions (
OVER) → a partition of size 1 (single-row group exercises rank/first/last edge cases), a row at the partition boundary, and a tie-break row (two rows with the same ORDER BY key)
- Date arithmetic / date spines → a row at the start of range, end of range, and a gap day with no events
- Aggregations with
GROUP BY → at least one group of size 1 (often masks fan-out bugs) and one group whose key is NULL
- Incremental merge keys → both an "insert" row and an "update" row matching an existing key
If you can't think of a failure mode for a construct, you don't yet understand it well enough to test it — read the SQL again before guessing inputs.
Core Workflow: Analyze -> Generate -> Refine -> Validate -> Write
Phase 1: Analyze the Model
Before generating any tests, deeply understand the model:
altimate-dbt compile --model <name>
read <model_sql_file>
dbt_unit_test_gen(manifest_path: "target/manifest.json", model: "<name>")
What to look for:
- Which upstream refs/sources does this model depend on?
- What SQL constructs need testing? (CASE/WHEN, JOINs, window functions, aggregations)
- What edge cases exist? (NULLs, empty strings, zero values, boundary dates)
- Is this an incremental model? (needs
is_incremental override tests)
- Are any upstream models ephemeral? (need sql format)
Phase 2: Generate Tests
The dbt_unit_test_gen tool does the heavy lifting:
dbt_unit_test_gen(
manifest_path: "target/manifest.json",
model: "fct_orders",
max_scenarios: 5
)
This returns:
- Complete YAML with mock inputs and expected outputs
- Semantic context: model/column descriptions, column lineage, compiled SQL
- List of anti-patterns that informed edge case generation
- Warnings about ephemeral deps, missing columns, etc.
If the tool reports missing columns (placeholder rows in the YAML), discover them:
altimate-dbt columns --model <upstream_model_name>
altimate-dbt columns-source --source <source_name> --table <table_name>
Then update the generated YAML with real column names.
Phase 3: Refine Expected Outputs
This is the critical step that differentiates good tests from bad ones.
The tool generates placeholder expected outputs based on column types. You MUST refine them:
Option A: Compute by running SQL (preferred)
altimate-dbt test --model <name>
Option B: Manual computation
Read the model SQL carefully and mentally execute it against the mock inputs.
For each test case:
- Look at the mock input rows
- Trace through the SQL logic (CASE/WHEN branches, JOINs, aggregations)
- Write the correct expected output
Option C: Use the warehouse (most accurate)
altimate-dbt execute --query "WITH mock_stg_orders AS (SELECT 1 AS order_id, 100.00 AS amount) SELECT * FROM (<model_sql>) sub"
Phase 4: Validate
altimate-dbt test --model <name>
Phase 5: Write to File
Place unit tests in one of these locations (match project convention):
models/<layer>/_unit_tests.yml (dedicated file)
models/<layer>/schema.yml (append to existing)
glob models/**/*unit_test*.yml models/**/*schema*.yml
edit <yaml_file>
write <yaml_file>
Test Case Categories
Happy Path (always generate)
Standard inputs that exercise the main logic path. 2 rows minimum.
NULL Handling
Set nullable columns to NULL in the last row. Verify COALESCE/NVL/IFNULL behavior.
Boundary Values
Zero amounts, empty strings, epoch dates, MAX values. Tests robustness.
Edge Cases
- Division by zero (if model divides)
- Non-matching JOINs (LEFT JOIN with no match)
- Single-row aggregation
- Duplicate key handling
Incremental
For incremental models only. Use overrides.macros.is_incremental: true to test the incremental path.
Common Mistakes
| Mistake | Fix |
|---|
| Missing a ref() in given | Parse manifest for ALL depends_on nodes |
| Wrong column names in mock data | Use manifest columns, not guesses |
| Wrong data types | Use schema catalog types |
| Expected output is just mock input | Actually compute the transformation |
| Dict format for ephemeral model | Use format: sql with raw SQL |
| Not testing NULL path in COALESCE | Add null_handling test case |
| Hardcoded dates with current_timestamp | Use overrides.macros to mock timestamps |
| Testing trivial pass-through | Skip models with no logic |
YAML Format Reference
unit_tests:
- name: test_<model>_<scenario>
description: "What this test verifies"
model: <model_name>
overrides:
macros:
is_incremental: true
vars:
run_date: "2024-01-15"
given:
- input: ref('upstream_model')
rows:
- { col1: value1, col2: value2 }
- input: source('source_name', 'table_name')
rows:
- { col1: value1 }
- input: ref('ephemeral_model')
format: sql
rows: |
SELECT 1 AS id, 'test' AS name
UNION ALL
SELECT 2 AS id, 'other' AS name
expect:
rows:
- { output_col1: expected1, output_col2: expected2 }
Reference Guides