원클릭으로 Manus에서 모든 스킬 실행

$pwd:

dbt-workflow

Name: Dbt Workflow
Author: SignalPilot-Labs

// Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.

Manus에서 실행

$ git log --oneline --stat

stars:462

forks:22

updated:2026년 5월 16일 00:04

SKILL.md

readonly

name	dbt-workflow
description	Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
type	skill

dbt Workflow Skill — Explore and Plan

0. Notion Context — Gather Business Context BEFORE Planning

Check if .claude/notion-config.md exists in the working directory.

If config exists: Load the notion-context skill and run it. The skill searches configured Notion pages (meeting notes, product specs, data dictionaries) for context relevant to the current task. It returns a structured NOTION CONTEXT block with definitions, decisions, and constraints.

After gathering context:

Write the full NOTION CONTEXT block to notion_context.md in the working directory (the notion-verify subagent reads this after the build).
Keep the context in your working memory — reference it when making decisions about grain, joins, filters, and column logic in later steps.
If Notion context conflicts with YML, prefer YML for column names but prefer Notion for business logic (grain, filter rules, metric definitions).

If config does not exist: Skip this step — Notion is optional. Proceed to Section 1. If the user asks about Notion context later, point them to /notion-setup.

1. Output Shape — Read YML Description BEFORE Writing SQL

Extract from description: field:

ENTITY: "for each customer/driver/order" → one row per qualifying entity
QUALIFIER: "due to returned items" / "with at least one order" → filter or INNER JOIN
RANK CONSTRAINT: "top N" / "ranks the top N" → exactly N output rows. Filter with ROW_NUMBER() ... <= N using a deterministic tiebreaker (add primary key to ORDER BY). Do NOT use DENSE_RANK for filtering — it can return more than N rows.
TEMPORAL SCOPE: "rolling window", "MoM", "WoW", or "month-over-month" in the description → ONE output date (latest), not all historical dates. Filter with WHERE date_col = (SELECT MAX(date_col) FROM source).
PERIOD-OVER-PERIOD: If the description mentions MoM, WoW, YoY comparisons AND you are writing this model from scratch (stub/missing), the comparison column must be CAST(NULL AS DOUBLE) — see rule below.

How to read YML descriptions: Descriptions tell you what the data MEANS, not what code to write. Use them to:

Identify which source columns to use (e.g. "starting from first position on the grid" → use the grid column, not qualifying position)
Understand the business meaning of each column
Pick the right aggregation logic

But do NOT treat descriptions as literal computation instructions. They may describe steady-state behavior that doesn't apply on first build, or use imprecise language. After reading the description, always verify your logic against the actual source data — query the source tables to confirm which columns and values produce the expected result.

Write at top of SQL: -- EXPECTED SHAPE: <row count or formula> — REASON: <quote>

1b. Snapshot Reference Tables BEFORE Building

The starting database contains pre-computed reference tables with correct output. dbt run will overwrite them. Before your first dbt run, for each target model that already exists as a table in the database:

SELECT COUNT(*) FROM <model_name>

Record the row count in your -- EXPECTED SHAPE comment. If your rebuilt model's row count doesn't match after dbt run, you MUST diff against this reference to find which rows differ.

2. Incremental Models and Period-Over-Period Columns

When a dbt project uses materialized="incremental" models, the project is designed to accumulate state over multiple runs. On a first run (full refresh, no prior state), incremental models build from scratch.

If you are writing a new model that includes period-over-period metrics (MoM, WoW, YoY) and the project has not been run incrementally before:

Output rows for the latest date only: WHERE date_col = (SELECT MAX(date_col) FROM source)
Period-over-period columns must be CAST(NULL AS DOUBLE) — there is no prior aggregated state to compare against. Computing these from raw historical data would produce values that don't match the expected first-run output.

If the model SQL already exists (not a stub):

Read the {% if is_incremental() %} block to understand the filter logic.
The code outside that block runs on full refresh.

3. What to Trust in YML

Trust YML for: column names (exact match required), column descriptions (what each column represents), ref dependencies (what tables to join).

YML not_null tests on key/dimension columns (IDs, names, dates, categories) imply a WHERE col IS NOT NULL filter on input data. Do NOT implement this as an INNER JOIN — use an explicit WHERE clause. not_null on metric/aggregate columns (counts, averages, totals) just asserts the output shouldn't be NULL — don't filter inputs for those, fix the aggregation instead.

Do NOT trust YML for: grain/row count. YML unique and not_null tests are assertions that may be aspirational or wrong. Do NOT use not_null tests to decide join type.

Derive the grain from these signals (in priority order):

Unique key structure: If the YML defines a unique key or surrogate key column, examine what it's composed of. A key like concat(ticker, timestamp) means the grain is (ticker, timestamp) — not (ticker, date). The key tells you exactly what combination of values identifies one row.
Column list: The columns themselves reveal the grain. If a model has both a header-level key AND a detail-level key as separate columns, the grain is at the detail level.
Upstream model grain: Check existing upstream models that feed into yours. If bar_executions produces one row per (ticker, timestamp), your model that depends on it likely has the same or coarser grain — not finer.
Source cardinality: Before writing SQL, query the source tables to check how many rows your model should produce: SELECT COUNT(DISTINCT key_col) FROM source_table If your model produces dramatically fewer rows than upstream, your GROUP BY is too coarse.
Sibling model row counts: Check complete models at the same level.

Do NOT deduplicate with ROW_NUMBER to force a unique test to pass — if the data naturally has multiple rows per key, keep them all.

related-skills.json

같은 저장소

notion-context.md

from "SignalPilot-Labs/SignalPilot"

Gather business context from Notion before dbt builds. Searches pages, extracts definitions/decisions/constraints, writes structured context for the build agent and notion-verify subagent.

2026-05-19462

dbt-debugging.md

from "SignalPilot-Labs/SignalPilot"

Load when dbt run or dbt parse fails. Covers YML duplicate patches, ref errors, passthrough model warnings, current_date fixes, DuckDB error messages, and zero-row diagnosis.

2026-04-20462

dbt-write.md

from "SignalPilot-Labs/SignalPilot"

Load at Step 4 when writing SQL models. Covers column naming, type preservation, JOIN defaults, lookup joins, sibling models, materialization, packages, and filtering rules.

2026-04-20462

duckdb-sql.md

from "SignalPilot-Labs/SignalPilot"

Load when hitting DuckDB syntax errors or writing DuckDB-specific SQL. Covers gotchas that differ from PostgreSQL/MySQL.

2026-04-20462

bigquery-sql.md

from "SignalPilot-Labs/SignalPilot"

BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables.

2026-04-14462

sql-workflow.md

from "SignalPilot-Labs/SignalPilot"

Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps.

2026-04-14462

package.json

"author": "SignalPilot-Labs"

"repository": "SignalPilot-Labs/SignalPilot"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

소프트웨어 개발자컴퓨터 및 수학직15-1252L4

name	dbt-workflow
description	Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
type	skill

dbt Workflow Skill — Explore and Plan

0. Notion Context — Gather Business Context BEFORE Planning

Check if .claude/notion-config.md exists in the working directory.

After gathering context:

Write the full NOTION CONTEXT block to notion_context.md in the working directory (the notion-verify subagent reads this after the build).
Keep the context in your working memory — reference it when making decisions about grain, joins, filters, and column logic in later steps.
If Notion context conflicts with YML, prefer YML for column names but prefer Notion for business logic (grain, filter rules, metric definitions).

If config does not exist: Skip this step — Notion is optional. Proceed to Section 1. If the user asks about Notion context later, point them to /notion-setup.

1. Output Shape — Read YML Description BEFORE Writing SQL

Extract from description: field:

ENTITY: "for each customer/driver/order" → one row per qualifying entity
QUALIFIER: "due to returned items" / "with at least one order" → filter or INNER JOIN
RANK CONSTRAINT: "top N" / "ranks the top N" → exactly N output rows. Filter with ROW_NUMBER() ... <= N using a deterministic tiebreaker (add primary key to ORDER BY). Do NOT use DENSE_RANK for filtering — it can return more than N rows.
TEMPORAL SCOPE: "rolling window", "MoM", "WoW", or "month-over-month" in the description → ONE output date (latest), not all historical dates. Filter with WHERE date_col = (SELECT MAX(date_col) FROM source).
PERIOD-OVER-PERIOD: If the description mentions MoM, WoW, YoY comparisons AND you are writing this model from scratch (stub/missing), the comparison column must be CAST(NULL AS DOUBLE) — see rule below.

How to read YML descriptions: Descriptions tell you what the data MEANS, not what code to write. Use them to:

Identify which source columns to use (e.g. "starting from first position on the grid" → use the grid column, not qualifying position)
Understand the business meaning of each column
Pick the right aggregation logic

Write at top of SQL: -- EXPECTED SHAPE: <row count or formula> — REASON: <quote>

1b. Snapshot Reference Tables BEFORE Building

SELECT COUNT(*) FROM <model_name>

Record the row count in your -- EXPECTED SHAPE comment. If your rebuilt model's row count doesn't match after dbt run, you MUST diff against this reference to find which rows differ.

2. Incremental Models and Period-Over-Period Columns

If you are writing a new model that includes period-over-period metrics (MoM, WoW, YoY) and the project has not been run incrementally before:

Output rows for the latest date only: WHERE date_col = (SELECT MAX(date_col) FROM source)
Period-over-period columns must be CAST(NULL AS DOUBLE) — there is no prior aggregated state to compare against. Computing these from raw historical data would produce values that don't match the expected first-run output.

If the model SQL already exists (not a stub):

Read the {% if is_incremental() %} block to understand the filter logic.
The code outside that block runs on full refresh.

3. What to Trust in YML

Trust YML for: column names (exact match required), column descriptions (what each column represents), ref dependencies (what tables to join).

Do NOT trust YML for: grain/row count. YML unique and not_null tests are assertions that may be aspirational or wrong. Do NOT use not_null tests to decide join type.

Derive the grain from these signals (in priority order):

Unique key structure: If the YML defines a unique key or surrogate key column, examine what it's composed of. A key like concat(ticker, timestamp) means the grain is (ticker, timestamp) — not (ticker, date). The key tells you exactly what combination of values identifies one row.
Column list: The columns themselves reveal the grain. If a model has both a header-level key AND a detail-level key as separate columns, the grain is at the detail level.
Upstream model grain: Check existing upstream models that feed into yours. If bar_executions produces one row per (ticker, timestamp), your model that depends on it likely has the same or coarser grain — not finer.
Source cardinality: Before writing SQL, query the source tables to check how many rows your model should produce: SELECT COUNT(DISTINCT key_col) FROM source_table If your model produces dramatically fewer rows than upstream, your GROUP BY is too coarse.
Sibling model row counts: Check complete models at the same level.

Do NOT deduplicate with ROW_NUMBER to force a unique test to pass — if the data naturally has multiple rows per key, keep them all.

dbt-workflow

dbt Workflow Skill — Explore and Plan

0. Notion Context — Gather Business Context BEFORE Planning

1. Output Shape — Read YML Description BEFORE Writing SQL

1b. Snapshot Reference Tables BEFORE Building

2. Incremental Models and Period-Over-Period Columns

3. What to Trust in YML

이 저장소의 다른 Skills

dbt Workflow Skill — Explore and Plan

0. Notion Context — Gather Business Context BEFORE Planning

1. Output Shape — Read YML Description BEFORE Writing SQL

1b. Snapshot Reference Tables BEFORE Building

2. Incremental Models and Period-Over-Period Columns

3. What to Trust in YML

이 저장소의 다른 Skills