with one click
bigquery-sql
// BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables.
// BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables.
Gather business context from Notion before dbt builds. Searches pages, extracts definitions/decisions/constraints, writes structured context for the build agent and notion-verify subagent.
Load at Step 1 before exploring the project. Covers Notion context gathering, output shape inference, incremental model handling, and what to trust in YML.
Load when dbt run or dbt parse fails. Covers YML duplicate patches, ref errors, passthrough model warnings, current_date fixes, DuckDB error messages, and zero-row diagnosis.
Load at Step 4 when writing SQL models. Covers column naming, type preservation, JOIN defaults, lookup joins, sibling models, materialization, packages, and filtering rules.
Load when hitting DuckDB syntax errors or writing DuckDB-specific SQL. Covers gotchas that differ from PostgreSQL/MySQL.
Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps.
| name | bigquery-sql |
| description | BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables. |
| type | skill |
-- Full table reference
SELECT * FROM `project.dataset.table`;
-- Can omit project if using the default project
SELECT * FROM `dataset.table`;
-- Explode an array column to rows
SELECT id, item
FROM `project.dataset.table`,
UNNEST(array_col) AS item;
-- UNNEST with offset (position)
SELECT id, item, pos
FROM `project.dataset.table`,
UNNEST(array_col) AS item WITH OFFSET AS pos;
-- UNNEST a literal array
SELECT * FROM UNNEST([1, 2, 3]) AS num;
-- Add/subtract time
DATE_ADD(order_date, INTERVAL 7 DAY)
DATE_ADD(CURRENT_DATE(), INTERVAL -1 MONTH)
-- Difference between dates
DATE_DIFF(end_date, start_date, DAY)
DATE_DIFF(end_date, start_date, MONTH)
-- Truncate to period
DATE_TRUNC(event_date, MONTH)
TIMESTAMP_TRUNC(event_ts, HOUR)
-- Current date/time
CURRENT_DATE()
CURRENT_TIMESTAMP()
-- All columns except one
SELECT * EXCEPT (col_to_remove) FROM `dataset.table`;
-- Replace a column value inline
SELECT * REPLACE (UPPER(name) AS name) FROM `dataset.table`;
-- Create a STRUCT
SELECT STRUCT(id, name) AS person FROM `dataset.table`;
-- Aggregate rows into an array
SELECT department, ARRAY_AGG(employee_name) AS employees
FROM `dataset.employees`
GROUP BY department;
-- Aggregate into array of structs
SELECT ARRAY_AGG(STRUCT(id, name)) AS records FROM `dataset.table`;
-- Approximate distinct count (faster for large tables)
APPROX_COUNT_DISTINCT(user_id)
-- Approximate quantiles
APPROX_QUANTILES(value, 100)[OFFSET(50)] -- median
When querying partitioned tables, always filter on the partition column to avoid full-table scans:
-- Partition on _PARTITIONDATE (pseudo-column)
WHERE _PARTITIONDATE >= '2024-01-01'
-- Partition on a date column
WHERE event_date BETWEEN '2024-01-01' AND '2024-12-31'
-- Query all date-sharded tables matching a prefix
SELECT * FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231';
REGEXP_EXTRACT(col, r'pattern') -- extract first match
REGEXP_REPLACE(col, r'pattern', 'repl') -- replace matches
SPLIT(col, ',')[SAFE_OFFSET(0)] -- split, access by index
TRIM(col) / LTRIM(col) / RTRIM(col)
FORMAT('%s-%d', str_col, int_col) -- printf-style formatting
= NULL — use IS NULLCOUNT(DISTINCT ...) on huge tables — use APPROX_COUNT_DISTINCTSTRING_AGG(col, ',' ORDER BY col) for string aggregation (not GROUP_CONCAT).IF(condition, true_val, false_val) — often cleaner than CASE WHEN for simple conditions.SELECT * FROM dataset.INFORMATION_SCHEMA.COLUMNS for metadata queries — useful when schema_overview is insufficient.spider2-public-data. Table references: spider2-public-data.{dataset}.{table}tags column (e.g., |python|python-2.7|).
To filter for Python 2 specific questions (excluding Python 3):
WHERE REGEXP_CONTAINS(tags, r'python-2') AND NOT REGEXP_CONTAINS(tags, r'python-3')