| name | metric-views-patterns |
| description | Standard patterns for creating Databricks Metric Views with semantic metadata for Genie and AI/BI. Use when creating metric views, troubleshooting metric view creation errors, validating schema references before deployment, implementing joins (including snowflake schema patterns), or optimizing metric views for Genie natural language queries. |
| metadata | {"author":"prashanth subrahmanyam","version":"2.0","domain":"semantic-layer","role":"worker","pipeline_stage":6,"pipeline_stage_name":"semantic-layer","called_by":["semantic-layer-setup"],"standalone":true,"last_verified":"2026-04-27","volatility":"high","upstream_sources":[{"name":"ai-dev-kit","repo":"databricks-solutions/ai-dev-kit","paths":["databricks-skills/databricks-metric-views/SKILL.md"],"relationship":"extended","last_synced":"2026-04-27","sync_commit":"latest"},{"name":"databricks-docs-overview","url":"https://docs.databricks.com/aws/en/business-semantics/metric-views/","relationship":"upstream","last_synced":"2026-04-27"},{"name":"databricks-docs-yaml-reference","url":"https://docs.databricks.com/aws/en/business-semantics/metric-views/yaml-reference","relationship":"upstream","last_synced":"2026-04-27"},{"name":"databricks-docs-basic-modeling","url":"https://docs.databricks.com/aws/en/business-semantics/metric-views/basic-modeling","relationship":"upstream","last_synced":"2026-04-27"},{"name":"databricks-docs-advanced-techniques","url":"https://docs.databricks.com/aws/en/business-semantics/metric-views/advanced-techniques","relationship":"upstream","last_synced":"2026-04-27"},{"name":"databricks-docs-create-sql","url":"https://docs.databricks.com/aws/en/metric-views/create/sql","relationship":"upstream","last_synced":"2026-04-27"},{"name":"databricks-docs-manage","url":"https://docs.databricks.com/aws/en/business-semantics/metric-views/manage","relationship":"upstream","last_synced":"2026-04-27"},{"name":"databricks-docs-agent-metadata","url":"https://docs.databricks.com/aws/en/business-semantics/agent-metadata","relationship":"upstream","last_synced":"2026-04-27"}]} |
End-to-end semantic layer? If you are creating Metric Views as part of a larger deployment that also includes TVFs and Genie Spaces, read semantic-layer/00-semantic-layer-setup/SKILL.md first — it orchestrates this skill with the others and mandates Gold schema validation before artifact creation.
Metric Views Patterns for Genie & AI/BI
Overview
Metric Views provide a semantic layer for natural language queries via Genie and AI/BI dashboards. This skill standardizes the YAML structure for comprehensive, LLM-friendly metric definitions following Databricks Metric View Specification v1.1.
Predecessor: Gold tables must exist before creating metric views. Use gold-layer-design + gold-layer-setup skills first.
Key Capabilities:
- Create metric views with proper SQL syntax (`WITH METRICS LANGUAGE YAML`)
- Validate schemas before deployment to prevent the most common errors
- Structure joins (direct and snowflake schema patterns)
- Optimize comments for Genie natural language queries
- Handle SCD2 dimensions with proper `is_current` filtering
When to Use This Skill
Use this skill when:
- Creating new metric views for Genie Spaces
- Troubleshooting metric view creation errors
- Validating schema references before deployment
- Implementing joins (including transitive relationships)
- Optimizing metric views for Genie natural language queries
- Ensuring compliance with v1.1 specification
- Following the requirements gathering template to design metric views
Prerequisites
⚠️ MANDATORY: Complete these before creating metric views:
Layer-aware deployment (workshop mode): The patterns in this skill (YAML, dimensions, measures, joins) are layer-neutral. The orchestrator (semantic-layer/00-semantic-layer-setup) decides which schema to deploy against based on `planning_source.selected_layer`:
- `deployed_gold` / `gold_design` → reference Gold tables (production path).
- `deployed_silver` / `deployed_bronze` (workshop deployments) → reference Silver or Bronze tables directly. The Metric View YAML is identical in shape; the `source` field points at the workshop layer's schema. The orchestrator prints a quality advisory because raw layers typically lack curated COMMENTs and dimensional joins.
- `source_csv` → not reached; the orchestrator stops because no live tables exist.
Production Metric Views always reference Gold; workshop builds may reference Silver/Bronze and should be promoted to Gold for production hardening.
MCP Tools (from upstream databricks-metric-views)
The manage_metric_views MCP tool supports all metric view operations:
| Action | Description |
|---|---|
| `create` | Create a metric view with dimensions and measures |
| `alter` | Update a metric view's YAML definition |
| `describe` | Get the full definition and metadata |
| `query` | Query measures grouped by dimensions |
| `drop` | Drop a metric view |
| `grant` | Grant SELECT privileges to users/groups |
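A hypothetical invocation sketch in Python, to show the shape of a create call. The argument names below (action, view_name, yaml) are assumptions rather than the tool's documented contract; check the MCP server's schema before relying on them:

```python
from pathlib import Path

# Hypothetical payload for the manage_metric_views MCP tool.
# Argument names are assumptions; verify against the tool's schema.
create_request = {
    "action": "create",
    "view_name": "main.gold.sales_metrics",  # placeholder name
    "yaml": Path("metric_views/sales_metrics.yaml").read_text(),
}
```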
Quick Start (2 hours)
What You'll Create:
- `metric_views/{view_name}.yaml` — Semantic definitions (dimensions, measures, joins, formats)
- `create_metric_views.py` — Script reads YAML, creates views with `WITH METRICS LANGUAGE YAML`
- `metric_views_job.yml` — Asset Bundle job for deployment
Deploy: `databricks bundle deploy -t dev && databricks bundle run metric_views_job -t dev`
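The YAML templates below use `${catalog}`/`${gold_schema}` placeholders; the creation script substitutes real values before executing. A minimal sketch, assuming simple string placeholders (the full logic lives in scripts/create_metric_views.py; parameter values here are examples):

```python
from pathlib import Path

# Substitute deployment parameters into the YAML template text.
# Placeholder keys mirror this skill's templates; values are examples.
params = {"catalog": "main", "gold_schema": "gold"}
yaml_str = Path("metric_views/sales_metrics.yaml").read_text()
for key, value in params.items():
    yaml_str = yaml_str.replace("${" + key + "}", value)
```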
Critical Rules
⚠️ CRITICAL: Correct SQL Syntax
Metric views MUST be created using WITH METRICS LANGUAGE YAML syntax:
create_sql = f"""
CREATE OR REPLACE VIEW {fully_qualified_name}
WITH METRICS
LANGUAGE YAML
COMMENT '{view_comment_escaped}'
AS $$
{yaml_str}
$$
"""
Key Requirements:
- `WITH METRICS` — identifies the view as a metric view
- `LANGUAGE YAML` — specifies YAML format
- `AS $$ ... $$` — YAML content wrapped in dollar-quote delimiters
- No SELECT statement — the YAML definition IS the view definition
- `version` field — must be included in each metric view YAML
❌ WRONG: Regular view with TBLPROPERTIES (creates regular VIEW, not METRIC_VIEW)
⚠️ CRITICAL: v1.1 Unsupported Fields
These fields will cause errors and MUST NOT be used:
| Field | Error | Action |
|---|---|---|
| `name` | Unrecognized field "name" | ❌ NEVER include — the name is in the CREATE VIEW statement |
| `time_dimension` | Unrecognized field "time_dimension" | ❌ Remove entirely |
| `window_measures` | Unrecognized field "window_measures" | ❌ Remove the top-level `window_measures:` array. The per-measure `window:` property is Experimental (v0.1 only, DBR 16.4-17.1). See references/composability-patterns.md for details. |
| `join_type` | Unsupported | ❌ Remove — defaults to LEFT OUTER JOIN |
| `table` (in joins) | Missing required creator property 'source' | ✅ Use `source` instead |
⚠️ MANDATORY: Pre-Creation Schema Validation
ALWAYS validate schemas BEFORE creating metric view YAML. In practice, virtually all deployment failures trace back to preventable schema issues.
Schema Validation Checklist:
See references/validation-checklist.md for detailed validation steps.
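As a minimal pre-flight sketch (assuming a Databricks session where `spark` is available, and that column existence against the live source table is the check you need; the reference file covers the rest):

```python
import yaml

# Verify every source.<column> in the YAML exists in the live source table.
# Paths and table names are examples; DESCRIBE output includes extra rows
# (partition info, comments), which is harmless for a superset check.
mv = yaml.safe_load(open("metric_views/sales_metrics.yaml").read())
live_columns = {r.col_name for r in spark.sql(f"DESCRIBE TABLE {mv['source']}").collect()}

referenced = set()
for item in (mv.get("dimensions") or []) + (mv.get("measures") or []):
    for token in item.get("expr", "").replace("(", " ").replace(")", " ").split():
        if token.startswith("source."):
            referenced.add(token.split(".", 1)[1].strip(",;"))

missing = referenced - live_columns
assert not missing, f"Columns referenced but missing from {mv['source']}: {missing}"
```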
⚠️ CRITICAL: Source Table Selection
Rule: Revenue/bookings/transactions → FACT table. Property/host counts → DIMENSION table.
❌ WRONG: Revenue from dimension table (under-reports by 4x)
source: ${catalog}.${schema}.dim_property
✅ CORRECT: Revenue from fact table
source: ${catalog}.${schema}.fact_booking_daily
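A quick grain sanity check helps catch this before deployment. A sketch using the example tables above (names are placeholders; the idea is that a fact table should carry many rows per entity):

```python
# If rows-per-property is ~1, the table is entity-grained (a dimension)
# and additive measures like revenue will under-report. Names are examples.
fact = spark.table("catalog.schema.fact_booking_daily")
rows = fact.count()
entities = fact.select("property_id").distinct().count()
print(f"rows per property: {rows / max(entities, 1):.1f}")
```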
⚠️ CRITICAL: Transitive Join Limitations
Metric Views DO NOT support transitive/chained joins (where join B's on clause references join A instead of source).
How to detect: If ANY join's on clause references a join alias (not source), it is transitive and will fail.
❌ WRONG: Transitive join (join B references join A)
joins:
  - name: dim_property
    source: catalog.schema.dim_property
    'on': source.property_id = dim_property.property_id
  - name: dim_destination
    source: catalog.schema.dim_destination
    'on': dim_property.destination_id = dim_destination.destination_id
This fails at plan-time with UNRESOLVED_COLUMN because dim_property is not visible in dim_destination's on scope.
✅ FIX 1 (Preferred — simplest): Use denormalized columns from existing dimension
If dim_property already has destination_name and destination_country, reference them directly — no second join needed:
dimensions:
  - name: destination_name
    expr: dim_property.destination_name
  - name: destination_country
    expr: dim_property.destination_country
✅ FIX 2: Snowflake schema (nested joins) — requires DBR 17.1+
joins:
  - name: dim_property
    source: catalog.schema.dim_property
    'on': source.property_id = dim_property.property_id
    joins:
      - name: dim_destination
        source: catalog.schema.dim_destination
        'on': dim_property.destination_id = dim_destination.destination_id
Validation gate: Before generating YAML, inspect all join on clauses. If the left side of any on references a join name (not source), restructure as nested joins or use denormalized columns.
Pre-check for Fix 2 (nested joins): Verify the workspace runtime supports nested joins:
# current_version() returns a struct; dbr_version looks like "17.1.x-scala2.12".
dbr = spark.sql("SELECT current_version().dbr_version").first()[0]
assert float('.'.join(dbr.split('.')[:2])) >= 17.1, \
    f"Nested joins require DBR 17.1+, got {dbr}. Use Fix 1 or restructure."
If Fix 1 is not feasible (the intermediate dimension lacks the needed column, e.g., dim_property does not have destination_name), do NOT silently use Fix 2. Flag the constraint to the user and offer:
- (a) Add the column to the intermediate dimension in the Gold layer design
- (b) Confirm DBR 17.1+ and use nested joins
- (c) Omit the dimension from the Metric View and handle it via TVFs instead
See references/advanced-patterns.md for additional snowflake schema examples.
⚠️ CRITICAL: Multi-Hop / Snowflake Joins (subquery-source pattern)
Transitive joins aren't the only multi-hop trap. Two other shapes silently produce wrong numbers instead of a clean planner error — and both are indistinguishable from valid Metric Views on casual inspection. Treat this section as the canonical reference for any join that must traverse more than one dimension hop.
Failure mode A (silent row-drop — DBR < 17.1): Nested joins emitted under joins: are ignored by older runtimes. Aggregations run against the un-joined base, so dimensions from the nested table quietly disappear from the GROUP BY. No error surfaces — the Metric View just returns the wrong grain.
Failure mode B (fan-out cartesian): Nested joins work on DBR 17.1+ but the join key on the intermediate table (dim_property.destination_id) is not unique. Every row from source fans out over every matching dim_destination row, inflating measures. Again: no error — just a wrong total.
❌ Anti-pattern 1 — flat "sibling" join relying on an earlier join's alias
joins:
  - name: dim_property
    source: catalog.schema.dim_property
    'on': source.property_id = dim_property.property_id
  - name: dim_destination
    source: catalog.schema.dim_destination
    'on': dim_property.destination_id = dim_destination.destination_id
Planner behaviour:
- DBR ≥ 17.1: raises UNRESOLVED_COLUMN at creation time.
- DBR < 17.1: may silently accept and produce wrong results. Never rely on the error.
❌ Anti-pattern 2 — nested join without verifying intermediate uniqueness
joins:
  - name: dim_property
    source: catalog.schema.dim_property
    'on': source.property_id = dim_property.property_id
    joins:
      - name: dim_destination
        source: catalog.schema.dim_destination
        'on': dim_property.destination_id = dim_destination.destination_id
Always verify uniqueness before shipping a nested join:
SELECT property_id, COUNT(*) AS c
FROM catalog.schema.dim_property
GROUP BY property_id HAVING c > 1;
SELECT destination_id, COUNT(*) AS c
FROM catalog.schema.dim_destination
WHERE is_current = true
GROUP BY destination_id HAVING c > 1;
✅ Correct pattern — subquery source (pre-join, then treat as one dim)
When nested joins are not available (DBR < 17.1) OR the intermediate table is not uniquely keyed, fold the multi-hop into a subquery source that pre-resolves the join off-Metric-View. The Metric View then joins source to a single, clean, uniquely-keyed dimension.
source: catalog.schema.fact_booking_daily
joins:
  - name: dim_property_enriched
    source: |
      (
        SELECT
          p.property_key,
          p.property_id,
          p.property_name,
          d.destination_id,
          d.destination_name,
          d.country
        FROM catalog.schema.dim_property p
        LEFT JOIN catalog.schema.dim_destination d
          ON p.destination_id = d.destination_id
          AND d.is_current = true  -- SCD2 guard
        WHERE p.is_current = true
      )
    'on': source.property_key = dim_property_enriched.property_key
dimensions:
  - name: property_name
    expr: dim_property_enriched.property_name
  - name: destination_name
    expr: dim_property_enriched.destination_name
  - name: country
    expr: dim_property_enriched.country
Why this is the recommended default:
- Works on every DBR version (no 17.1 dependency).
- The subquery makes uniqueness guarantees explicit — reviewers can see and test them.
- Re-usable: wrap the subquery in a Gold-layer VIEW (`dim_property_enriched_v`, sketched below) and reference it in every Metric View that needs the same enrichment — this also anchors the Gold dependency manifest (see planning/00-project-planning/SKILL.md).
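A sketch of that Gold-layer view, reusing the subquery above (catalog/schema and the `_v` suffix are placeholders following this skill's naming):

```python
# One tested, reusable enrichment view instead of per-MV subqueries.
spark.sql("""
CREATE OR REPLACE VIEW catalog.schema.dim_property_enriched_v AS
SELECT p.property_key, p.property_id, p.property_name,
       d.destination_id, d.destination_name, d.country
FROM catalog.schema.dim_property p
LEFT JOIN catalog.schema.dim_destination d
  ON p.destination_id = d.destination_id AND d.is_current = true  -- SCD2 guard
WHERE p.is_current = true
""")
```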
Decision ladder for any multi-hop requirement:
- Can the intermediate dimension carry the needed attribute directly (denormalize)? → Use Fix 1 (flat join, no second hop).
- Is every intermediate key uniquely 1:1 and is DBR ≥ 17.1? → Use nested joins (Fix 2) and document the uniqueness check in a comment.
- Otherwise → use the subquery-source pattern above. Do NOT ship a transitive join and hope for an error.
Anti-pattern detector — run before deploy:
import re
import yaml
from pathlib import Path

bad = []
for yf in Path("src/semantic/metric_views").rglob("*.yaml"):
    mv = yaml.safe_load(yf.read_text())
    joins = (mv.get("joins") or []) if isinstance(mv, dict) else []
    alias_names = {j["name"] for j in joins}
    for j in joins:
        # Keys must be quoted as 'on' in the YAML (unquoted on parses as a boolean).
        on_clause = j.get("on", "")
        # The head of the on clause must be `source`, never a sibling join alias.
        m = re.match(r"\s*([A-Za-z_][\w.]*)", on_clause)
        left_head = (m.group(1).split(".", 1)[0] if m else "")
        if left_head in alias_names:
            bad.append((str(yf), j["name"], on_clause))
if bad:
    for f, n, o in bad:
        print(f"TRANSITIVE JOIN in {f} :: join '{n}' -> {o}")
    raise RuntimeError(
        "Transitive/flat-sibling joins detected. Restructure as nested "
        "joins (DBR 17.1+) or as subquery-source (preferred)."
    )
Implementation Workflow
Phase 1: Design (30 min)
Read: references/requirements-template.md
Phase 2: YAML Creation (1 hour)
Read: references/yaml-reference.md and references/advanced-patterns.md
Phase 3: Script & Bundle (30 min)
Read: references/implementation-workflow.md
Phase 4: Deploy & Test (30 min)
Read: references/validation-queries.md
Quick Reference
YAML Structure (v1.1)
version: "1.1"
comment: >
  PURPOSE: [One-line description]
  BEST FOR: [Question 1] | [Question 2] | [Question 3]
  NOT FOR: [What to avoid] (use [correct_asset] instead)
  DIMENSIONS: [dim1], [dim2], [dim3]
  MEASURES: [measure1], [measure2], [measure3]
  SOURCE: [fact_table] ([domain] domain)
  JOINS: [dim_table1] ([description])
  NOTE: [Critical caveats]
source: ${catalog}.${gold_schema}.<fact_table>
filter: <sql_boolean_expression>
joins:
  - name: <dim_table_alias>
    source: ${catalog}.${gold_schema}.<dim_table>
    'on': source.<fk> = <dim_table_alias>.<pk> AND <dim_table_alias>.is_current = true
dimensions:
  - name: <dimension_name>
    expr: source.<column>
    comment: <Business description>
    display_name: <User-Friendly Name>
    synonyms: [<alt1>, <alt2>]
measures:
  - name: <measure_name>
    expr: SUM(source.<column>)
    comment: <Business description>
    display_name: <User-Friendly Name>
    format:
      type: currency|number|percentage
      currency_code: USD
      decimal_places:
        type: exact|all
        places: 2
    synonyms: [<alt1>, <alt2>]
Valid format types (exhaustive):
| Type | Use For | Common Mistake |
|---|---|---|
| `byte` | Data sizes (storage, memory) | — |
| `currency` | Monetary values (revenue, cost) | — |
| `date` | Date-only values | — |
| `date_time` | Timestamp values | — |
| `number` | Counts, averages, decimals, integers | ❌ `decimal`, ❌ `integer` |
| `percentage` | Ratios, rates, percentages | ❌ `percent` |
⚠️ `percent` is NOT valid (use `percentage`). `decimal` is NOT valid (use `number`).
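This table is small enough to enforce mechanically. A sketch that can run alongside the pre-deploy anti-pattern detector above (same directory layout assumed):

```python
import yaml
from pathlib import Path

# The six valid types, exactly as in the table above.
VALID_FORMAT_TYPES = {"byte", "currency", "date", "date_time", "number", "percentage"}

for yf in Path("src/semantic/metric_views").rglob("*.yaml"):
    mv = yaml.safe_load(yf.read_text())
    for measure in (mv.get("measures") or []):
        fmt = (measure.get("format") or {}).get("type")
        if fmt and fmt not in VALID_FORMAT_TYPES:
            raise ValueError(
                f"{yf}: measure '{measure['name']}' has invalid format type "
                f"'{fmt}' (percent -> percentage, decimal/integer -> number)"
            )
```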
Column References
- Main table columns: use the `source.` prefix in all `expr` fields
- Joined table columns: use the join `name` as the prefix (e.g., `dim_store.column_name`)
- Never reference table names directly: use `source.` or `{join_name}.`
Join Requirements
- Each join MUST have `name`, `source`, and either `'on'` or `using`
- ON clause: boolean expression using `source.` for the main table and the join name for the joined table (quote the key: `'on'`)
- USING clause: array of column names shared between the source and the join table
- Each first-level join must reference `source` (NOT another join alias — that's transitive)
- For transitive relationships, use nested `joins:` (snowflake schema, DBR 17.1+) or denormalized columns
- SCD2 joins must include `AND {dim_table}.is_current = true`
- MAP type columns are NOT supported in joined tables
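The SCD2 rule is also easy to lint, assuming you maintain a list of which dimensions are slowly changing (the names below are illustrative, taken from this skill's examples):

```python
import yaml
from pathlib import Path

# Dimensions known to be SCD2 in your Gold design; maintain this set.
SCD2_DIMS = {"dim_property", "dim_destination"}  # illustrative names

for yf in Path("src/semantic/metric_views").rglob("*.yaml"):
    mv = yaml.safe_load(yf.read_text())
    for j in (mv.get("joins") or []):
        table = str(j.get("source", "")).rsplit(".", 1)[-1]
        if table in SCD2_DIMS and "is_current" not in j.get("on", ""):
            print(f"WARNING {yf}: join '{j['name']}' lacks is_current = true")
```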
Core Patterns
Composability (MEASURE Function)
Metric views support composability — building complex metrics by referencing simpler measures via the MEASURE() function. Define atomic measures first, then compose derived KPIs:
measures:
  - name: total_revenue
    expr: SUM(source.net_revenue)
  - name: order_count
    expr: COUNT(source.order_id)
  - name: avg_order_value
    expr: MEASURE(total_revenue) / MEASURE(order_count)
Measure-level filtering with FILTER clause:
  - name: fulfilled_orders
    expr: COUNT(1) FILTER (WHERE source.order_status = 'F')
  - name: fulfillment_rate
    expr: MEASURE(fulfilled_orders) / MEASURE(order_count)
    format:
      type: percentage
Best practices: Define atomic measures (SUM, COUNT, AVG) first; always use MEASURE() to reference other measures (never repeat the aggregation logic).
See references/composability-patterns.md for full guide including conditional logic, window measures (Experimental), and complete examples.
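The same rules apply when querying a deployed metric view: list dimensions explicitly and wrap every measure in MEASURE() (see Common Issues below). A sketch against a hypothetical deployed view; the dimension name is a placeholder:

```python
# View and dimension names are placeholders; the measures come from the
# composability example above and must be wrapped in MEASURE().
spark.sql("""
SELECT order_date,
       MEASURE(total_revenue)    AS total_revenue,
       MEASURE(fulfillment_rate) AS fulfillment_rate
FROM catalog.schema.sales_metrics
GROUP BY order_date
""").show()
```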
Standardized Comment Format (v3.0)
Use structured format for Genie optimization:
comment: >
PURPOSE: Comprehensive cost analytics for Databricks billing and usage analysis.
BEST FOR: Total spend by workspace | Cost trend over time | SKU cost breakdown
NOT FOR: Commit/contract tracking (use commit_tracking) | Real-time cost alerts
DIMENSIONS: usage_date, workspace_name, sku_name, owner, tag_team
MEASURES: total_cost, total_dbus, cost_7d, cost_30d
SOURCE: fact_usage (billing domain)
JOINS: dim_workspace (workspace details), dim_sku (SKU details)
NOTE: Cost values are list prices. Actual billed amounts may differ.
Dimension & Measure Patterns
See references/advanced-patterns.md for complete dimension patterns (geographic, product, time), measure patterns (revenue, count, percentage), and a full worked retail example.
Common Mistakes to Avoid
Top 5 mistakes (with paired wrong/correct examples): wrong syntax (TBLPROPERTIES), unsupported fields (time_dimension, window_measures), wrong column references, including name in YAML, transitive joins.
See references/advanced-patterns.md for detailed wrong/correct code examples for each mistake.
Python Script Error Handling
Key rules: strip name before yaml.dump(), drop existing VIEW/TABLE before CREATE, track failures and raise RuntimeError, verify METRIC_VIEW type via DESCRIBE EXTENDED.
See scripts/create_metric_views.py for the full working script and references/implementation-workflow.md for the detailed error handling patterns.
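A condensed sketch of those four rules (this assumes definitions, catalog, and schema are already loaded; the full version with logging lives in scripts/create_metric_views.py):

```python
import yaml

failures = []
for view_name, definition in definitions.items():  # parsed from metric_views/*.yaml
    definition.pop("name", None)          # rule 1: name belongs in CREATE VIEW only
    fq_name = f"{catalog}.{schema}.{view_name}"
    comment = definition.get("comment", "").replace("'", "''")  # escape for COMMENT
    try:
        spark.sql(f"DROP VIEW IF EXISTS {fq_name}")    # rule 2: drop existing object
        create_sql = (
            f"CREATE VIEW {fq_name}\n"
            f"WITH METRICS LANGUAGE YAML\n"
            f"COMMENT '{comment}'\n"
            f"AS $$\n{yaml.dump(definition, sort_keys=False)}$$"
        )
        spark.sql(create_sql)
        # Rule 4: confirm the result is a METRIC_VIEW, not a plain VIEW.
        described = spark.sql(f"DESCRIBE EXTENDED {fq_name}").collect()
        assert any("METRIC_VIEW" in str(r) for r in described), f"{fq_name} is not a METRIC_VIEW"
    except Exception as e:
        failures.append((fq_name, str(e)))  # rule 3: track failures, raise at end
if failures:
    raise RuntimeError(f"Metric view creation failed: {failures}")
```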
Time Estimates
| Metric Views | Design | YAML Creation | Deploy & Test | Total |
|---|---|---|---|---|
| 1 view | 20 min | 30 min | 20 min | ~1 hour |
| 2-3 views | 30 min | 1 hour | 30 min | ~2 hours |
| 5+ views | 1 hour | 2 hours | 30 min | ~3.5 hours |
Reference Files
references/yaml-reference.md — Complete YAML fields, syntax, format options
references/advanced-patterns.md — Dimension/measure patterns, joins, snowflake schema, worked examples
references/composability-patterns.md — MEASURE() function, FILTER clause, window measures (Experimental)
references/validation-checklist.md — Pre-creation validation steps
references/requirements-template.md — Design template for dimensions, measures, joins
references/implementation-workflow.md — Step-by-step creation workflow
references/validation-queries.md — SQL queries for deployment verification
Scripts & Assets
scripts/validate_metric_view.py — Pre-deployment column reference validation
scripts/create_metric_views.py — YAML loading, parameter substitution, METRIC_VIEW verification
assets/templates/metric-view-template.yaml — Starter YAML template
assets/templates/metric-views-job-template.yml — Asset Bundle job template
Materialization (Experimental)
Metric views support optional materialization for pre-computed aggregations. Lakeflow Spark Declarative Pipelines orchestrates materialized views, and the query optimizer automatically routes queries to the best materialized view using aggregate-aware query rewriting.
materialization:
  schedule: every 6 hours
  mode: relaxed
  materialized_views:
    - name: baseline
      type: unaggregated
    - name: revenue_breakdown
      type: aggregated
      dimensions: [category, color]
      measures: [total_revenue]
Materialized view types: unaggregated (full data) and aggregated (pre-computed for specific dimension/measure combinations). Check refresh status with DESCRIBE TABLE EXTENDED.
Requires serverless compute enabled. Currently experimental — use for high-query-volume metric views where pre-computation reduces latency.
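To check refresh status from a notebook, inspect the view's extended metadata (the exact row labels vary by release, so filter loosely; the view name is a placeholder):

```python
# Surface materialization/refresh rows from the extended description.
for row in spark.sql("DESCRIBE TABLE EXTENDED catalog.schema.sales_metrics").collect():
    text = str(row).lower()
    if "materializ" in text or "refresh" in text:
        print(row)
```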
Common Issues
| Issue | Solution |
|---|---|
| SELECT * not supported | Must explicitly list dimensions and use MEASURE() for measures |
| "Cannot resolve column" | Dimension/measure names with spaces need backtick quoting |
| JOIN at query time fails | Joins must be in the YAML definition, not in the SELECT query |
| MEASURE() required | All measure references must be wrapped: MEASURE(`name`) |
| DBR version error | Compute must be on DBR 17.3+ to create or edit metric views (current docs requirement); YAML v1.1 features need 17.2+; legacy v0.1 needs 16.4+ |
| Materialization not working | Requires serverless compute enabled; currently experimental |
External References
Official Documentation
- Metric views overview: https://docs.databricks.com/aws/en/business-semantics/metric-views/
- YAML reference: https://docs.databricks.com/aws/en/business-semantics/metric-views/yaml-reference
- Basic modeling: https://docs.databricks.com/aws/en/business-semantics/metric-views/basic-modeling
- Advanced techniques: https://docs.databricks.com/aws/en/business-semantics/metric-views/advanced-techniques
- Create with SQL: https://docs.databricks.com/aws/en/metric-views/create/sql
- Manage metric views: https://docs.databricks.com/aws/en/business-semantics/metric-views/manage
- Agent metadata: https://docs.databricks.com/aws/en/business-semantics/agent-metadata
Related Skills
databricks-table-valued-functions — TVF patterns for Genie
genie-space-patterns — Genie Space setup
databricks-aibi-dashboards — AI/BI dashboard patterns
Version History
- v5.1 (Apr 27, 2026) — Refreshed prerequisites and Common Issues for the current docs requirement (DBR 17.3+ for metric view creation/edit). Migrated External Documentation links and upstream_sources URLs to the current /business-semantics/metric-views/... paths (overview, yaml-reference, basic-modeling, advanced-techniques, manage, agent-metadata)
- v5.0 (Feb 2026) — Expanded transitive joins with inline fixes; exhaustive format type table (6 types); composability (MEASURE function) patterns; FILTER clause; USING join clause; filter top-level field; window measures clarification (Experimental v0.1); materialization expansion; progressive disclosure restructure (Notes to Carry Forward + Next Step); 7 upstream_sources from official docs; new composability-patterns.md reference
- v4.0 (Feb 2026) — Merged prompt content: Quick Start, implementation workflow, requirements template, creation script, validation queries, worked examples, common mistakes with paired examples
- v3.0 (Dec 19, 2025) — Standardized structured comment format
- v2.0 (Dec 16, 2025) — Genie optimization patterns from production post-mortem
- v1.0 (Oct 2025) — Initial rule based on metric view deployment learnings
Metric Views Notes to Carry Forward
After completing metric view creation, carry these notes to the next worker:
- Metric View names and paths: List of all created MVs with YAML file paths
- Grain per view: Which fact table sources each MV
- Measure counts: Number of dimensions and measures per MV
- Validation status: Which MVs passed schema validation, any unresolved issues
- Composability notes: Any composed measures using MEASURE() that downstream workers should know about
Next Step
After metric views are deployed and validated, proceed to:
semantic-layer/02-databricks-table-valued-functions/SKILL.md — Create TVFs for Genie Spaces using the Gold tables referenced by your metric views.