| name | oxy-semantic-layer |
| description | Build and maintain Oxy semantic layer files (views and topics) for analytics. Use when the user asks to create, update, or validate Oxy semantic layers, view files, topic files, or needs help understanding database schemas for semantic layer creation. |
Oxy Semantic Layer Builder
You are an expert at building Oxy semantic layer files. Your role is to analyze database schemas and create well-structured view and topic files that enable business users to query data using natural language.
Core Workflow
When building semantic layers, follow this process:
- Analyze schemas: Read
semantics.yml (produced by oxy sync) to discover tables and columns. If .databases/ exists, read that instead. Do not consult git history to recover previous view or topic files — always build fresh from the schema source.
- Create views: Build
semantics/views/*.view.yml files with entities, dimensions, and measures
- Create topics: Build
semantics/topics/*.topic.yml files to organize views
- Validate: Run
oxy build to check syntax and compilation
- Test: Use
oxy semantic-engine --dev-mode to verify queries work
Essential Commands
oxy sync
oxy build
oxy semantic-engine --dev-mode
View File Structure
Views are the core of the semantic layer. Each view file (semantics/views/*.view.yml) defines:
name: view_name
description: "Clear business explanation of what this view represents"
datasource: "database_name"
table: "schema_name.table_name"
entities:
- name: entity_name
type: primary|foreign
description: "What this entity represents"
key: dimension_name
dimensions:
- name: dimension_name
type: string|number|date|datetime|boolean
description: "Business-friendly explanation"
expr: column_name_or_sql_expression
samples: ["example1", "example2"]
synonyms: ["alias1", "alias2"]
measures:
- name: measure_name
type: sum|average|count|count_distinct|min|max|median|custom
description: "What this metric represents"
expr: column_name_or_sql_expression
synonyms: ["alternative name"]
filters: []
Entity Design (Critical for Joins)
Entities enable automatic joins between views. When multiple views define entities with the same name, Oxy automatically creates join relationships.
Entity Types
- Primary entity: The main subject of the view (one per view, required)
- Foreign entity: References to objects in other views (creates relationships)
Key Rule
Entity keys MUST reference dimension names, not database columns:
entities:
- name: order
type: primary
key: order_id
dimensions:
- name: order_id
type: string
expr: order_id
Example Multi-Entity View
entities:
- name: order
type: primary
description: "Individual customer order"
key: order_id
- name: customer
type: foreign
description: "Customer who placed the order"
key: customer_id
- name: restaurant
type: foreign
description: "Restaurant fulfilling the order"
key: restaurant_id
dimensions:
- name: order_id
type: string
description: "Unique order identifier"
expr: order_id
- name: customer_id
type: string
description: "Customer identifier"
expr: customer_id
- name: restaurant_id
type: string
description: "Restaurant identifier"
expr: restaurant_id
Always declare a foreign entity for every UUID/FK column
Whenever a fact-style view has a UUID/FK column pointing at another view's
primary key (e.g. orders.restaurant_id → restaurants.guid), declare a
type: foreign entity on the fact view with the same name: as the
lookup view's primary entity, and key: set to the FK dimension. This is
what makes downstream semantic_query calls able to dimension on the
human-readable name from the lookup view (e.g. restaurants.location_name)
instead of the raw UUID.
Without the foreign-entity declaration, dashboards built on top of the view
can only reference the FK column directly and end up rendering opaque UUIDs
on chart axes and table rows.
Dimension Patterns
Time Dimensions
Extract date parts for temporal analysis:
dimensions:
- name: order_date
type: date
description: "Date when order was placed"
expr: order_date
- name: order_year
type: number
description: "Year of order"
expr: "EXTRACT(YEAR FROM order_date)"
synonyms: ["year"]
- name: order_month
type: number
description: "Month of order (1-12)"
expr: "EXTRACT(MONTH FROM order_date)"
synonyms: ["month"]
- name: order_quarter
type: number
description: "Quarter of order (1-4)"
expr: "EXTRACT(QUARTER FROM order_date)"
Categorical Ranges
Bin continuous values:
dimensions:
- name: price_range
type: string
description: "Price category"
expr: |
CASE
WHEN price < 100 THEN 'Budget'
WHEN price BETWEEN 100 AND 500 THEN 'Mid-Range'
ELSE 'Premium'
END
samples: ["Budget", "Mid-Range", "Premium"]
synonyms: ["price category", "price tier"]
Boolean Flags
dimensions:
- name: is_holiday
type: boolean
description: "Whether date is a holiday"
expr: is_holiday
samples: ["true", "false"]
synonyms: ["holiday", "holiday flag"]
samples is always a list of strings, even when the dimension type is
boolean or number. Write samples: ["true", "false"] and
samples: ["129.99", "89.50"], never bare literals like [true, false] or
[129.99, 89.50]. Bare booleans/numbers fail YAML deserialization
(invalid type: boolean \true`, expected a string`) and break the entire
semantic layer load.
Measure Patterns
Note: Measures can have filters applied directly in their definition using the filters property. Dimensions cannot have filters; filtering on dimensions is done at the topic level using default_filters.
Basic Aggregations
measures:
- name: total_orders
type: count
description: "Total number of orders"
synonyms: ["order count", "number of orders"]
- name: total_revenue
type: sum
description: "Total revenue from all orders"
expr: order_amount
synonyms: ["total sales", "gross revenue", "sales"]
- name: average_order_value
type: average
description: "Average order value"
expr: order_amount
synonyms: ["avg order value", "AOV"]
Filtered Measures (conditional aggregation)
To aggregate a field over a subset of rows ("sum / average / count X
where Y"), write a typed measure (sum, average, count, …) with expr
plus a filters: list. Each filter expr is a raw SQL boolean condition over
the table's columns — exactly what you'd put in a WHERE clause. This is the
correct, re-aggregatable pattern for conditional aggregation; do not express
it as a type: custom CASE WHEN (see the warning below).
measures:
- name: holiday_revenue
type: sum
description: "Revenue during holiday periods only"
expr: order_amount
filters:
- expr: "is_holiday = true"
- name: completed_orders
type: count
description: "Number of completed orders"
filters:
- expr: "status = 'completed'"
- name: high_value_orders
type: count
description: "Orders over $1000"
filters:
- expr: "order_amount >= 1000"
Never wrap a column in {{ }} inside a measure or filter expr. A
{{column}} placeholder is not substituted there — it leaks into the
compiled SQL literally (e.g. {{"view"."col"}}), and the warehouse rejects the
query with syntax error at or near "{". This applies to filter expr and
custom measure expr alike. Always write bare column SQL — Fuel_Price > 3.5,
never {{fuel_price}} > 3.5. A filter expr references the underlying
columns (same as a SQL WHERE), not dimension names. (Brace references like
{{entity.field}} exist only for advanced cross-view measures and have no
place in ordinary same-view filters or aggregations.)
Custom Calculations
Reserve type: custom for metrics a typed measure + filters cannot
express — ratios, correlations, cross-period math. For plain "aggregate X where
Y", use a filtered measure (above), not a custom CASE WHEN: the filtered
form is clearer and re-aggregates through pre-aggregations, whereas custom
(like median and bare number) always falls back to the warehouse. Inside a
custom expr, write bare column names — the same no-{{ }} rule as
filters.
measures:
- name: revenue_growth_rate
type: custom
description: "Year-over-year revenue growth percentage"
expr: |
(SUM(CASE WHEN year = 2024 THEN revenue END) -
SUM(CASE WHEN year = 2023 THEN revenue END)) /
NULLIF(SUM(CASE WHEN year = 2023 THEN revenue END), 0) * 100
- name: temp_sales_correlation
type: custom
description: "Correlation between temperature and sales"
expr: "CORR(temperature, sales)"
Date Columns: Detect Format, Then Cast
A date dimension MUST be type: date (or type: datetime) so the semantic
layer compiles filters as date literals. The expression must produce a real
Date / DateTime, not the raw underlying column. A mismatch produces a
TYPE_MISMATCH at filter time — silent at view-creation, trips the first
analytics query that filters on the dimension.
Recognition heuristic
Any column whose business meaning is a date or timestamp (*_date, *_at,
business_date, order_date, created, updated, event_time, …) where
the underlying SQL type is not already Date / DateTime / TIMESTAMP
needs a wrapping cast. Match by the column's business meaning, not its
physical type.
Sample first, then cast
Stored formats vary across warehouses and across tables within the same
warehouse — ISO strings ("2024-01-31"), compact strings ("20240131"),
date integers (20240131), Unix epoch seconds, milliseconds, and so on.
Run SELECT <col> FROM <table> LIMIT 1 to see what's actually there before
deciding the cast. Guessing is cheap to get wrong (silent failure at query
time) and cheap to verify (one tool call).
Cast functions per warehouse
Combine these with the format you sampled. The dimension is type: date
(or datetime); the expr: wraps the column.
| Warehouse | Functions |
|---|
| ClickHouse | toDate(<col>), toDateTime(<col>), parseDateTimeBestEffort(<col>), toDate(toString(<col>)) for integer-encoded dates |
| BigQuery | CAST(<col> AS DATE), PARSE_DATE('<format>', <col>), PARSE_TIMESTAMP('<format>', <col>), TIMESTAMP_SECONDS(<col>) |
| Snowflake | TO_DATE(<col>[, '<format>']), TO_TIMESTAMP(<col>), TRY_TO_DATE(<col>) |
| Postgres | (<col>)::date, (<col>)::timestamp, to_date(<col>, '<format>'), to_timestamp(<col>, '<format>') |
| DuckDB | CAST(<col> AS DATE), strptime(<col>, '<format>'), epoch_ms(<col>) |
For columns already typed Date / DateTime / TIMESTAMP, no cast is
needed — expr: <col> is enough.
Examples
- name: business_date
type: date
expr: "toDate(toString(business_date))"
- name: order_date
type: date
expr: "PARSE_DATE('%Y-%m-%d', order_date)"
- name: created_at
type: datetime
expr: created_at
Pre-Aggregations (Rollups)
Pre-aggregations ("rollups") are materialized partial aggregates of a view. The
semantic engine builds them into local Parquet files and serves matching
queries from the rollup instead of scanning the warehouse — turning a scan of
millions of rows into a read of a few hundred. A background worker keeps them
fresh. Rollups are optional; the layer works fine without them. Add them to
views that back dashboards or frequently-asked KPI questions.
View-level syntax
Add a pre_aggregations: list to a .view.yml. Each rollup names a subset of
the view's dimensions and measures, optionally a time_dimension rolled up
to a granularity:
name: orders
datasource: local
table: "orders.csv"
refresh_key:
every: "1h"
pre_aggregations:
- name: orders_by_month
dimensions:
- order_status
measures:
- total_orders
- total_order_value
time_dimension: order_date
granularity: month
- name: orders_summary
measures:
- total_orders
- total_order_value
refresh_key:
every: "6h"
dimensions / measures reference names defined in the same view, never
raw columns. time_dimension references a date/datetime dimension;
granularity truncates it. A rollup with no dimensions and no
time_dimension materializes a single summary row.
Refresh keys
refresh_key decides when a rollup is rebuilt. Set exactly one key:
every: "1h" — rebuild on a fixed interval (30m, 6h, 1d, …).
sql: "SELECT MAX(updated_at) FROM orders" — rebuild when the SQL result
changes. The worker re-runs this against the warehouse every cycle, so keep
it cheap (a MAX on an indexed column).
A view-level refresh_key is the default for all of that view's rollups; a
per-rollup refresh_key overrides it for that rollup only.
When a rollup serves a query
The engine uses a rollup for a query only if the rollup covers it:
- every requested dimension is in the rollup's
dimensions
- every requested measure is in the rollup's
measures
- every filtered dimension is in
dimensions or is the time_dimension
- the time dimension matches, and the requested granularity is the same or
coarser than the rollup's (a
month rollup serves month/quarter/year
queries, not day)
If nothing covers the query, it falls back to warehouse SQL — correct, just not
accelerated. Design rollups to match the dimension/measure sets your dashboards
actually request.
Measures that cannot be rolled up: custom, median, and bare number
measures are not re-aggregatable — a query touching one falls back to the
warehouse even if it is listed in a rollup. count, count_distinct, sum,
average, min, and max re-aggregate correctly.
If a view has no pre_aggregations: block, the engine generates one
default rollup covering all dimensions × all measures at day granularity.
Define explicit rollups to control the grain and avoid an oversized default.
Config: enabling the build/refresh worker
Pre-aggregations need a top-level pre_aggregations: block in config.yml so
the background refresh worker runs:
pre_aggregations:
schema: AIRLAYER
database: local
refresh_worker:
enabled: true
heartbeat: "30s"
renewal_threshold: "120s"
All fields are optional; the defaults shown above apply when omitted.
Validate
oxy build compiles and validates rollup definitions along with the rest of
the layer — rollup names and dimension/measure/time references are checked
there. After adding pre-aggregations, run oxy build and confirm it passes.
Topic File Structure
Topics organize views by business domain. Best practice: create one topic per view.
name: topic_name
description: "Business domain this topic covers"
base_view: primary_view_name
views:
- view_name
default_filters:
- field: "status"
not_in:
values: ["cancelled", "test"]
- field: "is_deleted"
eq:
value: false
Filter Types
Filters use a nested structure with the operator as a key:
- Scalar filters (
eq, neq, gt, gte, lt, lte): Use value field
- field: "amount"
gt:
value: 0
- Array filters (
in, not_in): Use values field (array)
- field: "status"
not_in:
values: ["cancelled", "test"]
- Date range filters (
in_date_range, not_in_date_range): Use from and to fields
- field: "order_date"
in_date_range:
from: "2023-01-01"
to: "2023-12-31"
Building Process
Step 1: Generate Schemas
oxy sync
This reads database connections from config.yml and writes schema/dimension data to
semantics.yml in the project root. Read this file to discover table names and columns.
Note: Older oxy versions wrote to .databases/ instead. If .databases/ exists,
read it. Otherwise read semantics.yml.
Step 2: Analyze Schemas
Read the schema files to understand:
- Table structures and column types
- Primary/foreign key relationships
- Column naming patterns
- Business context from table/column names
Step 3: Create View Files
For each table or logical data model:
- Define primary entity (the main subject)
- Add foreign entities (relationships to other tables)
- Create dimensions (filterable/groupable attributes)
- Direct column mappings
- Calculated fields (date parts, ranges)
- Add synonyms for natural language
- Create measures (aggregatable metrics)
- Basic aggregations (sum, count, average)
- Filtered measures (segment-specific)
- Custom calculations (ratios, correlations)
- Document thoroughly (descriptions, samples, synonyms)
Step 4: Create Topic Files
Create one topic per view to organize by business domain.
Step 5: Validate
DB_URL=$(grep OXY_DATABASE_URL .env 2>/dev/null | cut -d= -f2- || echo "postgresql://postgres:postgres@localhost:15432/oxy")
DB_HOST=$(echo $DB_URL | sed 's|.*@\([^:]*\):\([0-9]*\).*|\1|')
DB_PORT=$(echo $DB_URL | sed 's|.*@[^:]*:\([0-9]*\).*|\1|')
nc -z $DB_HOST $DB_PORT 2>/dev/null && echo "PostgreSQL reachable" || echo "ERROR: PostgreSQL not reachable at $DB_HOST:$DB_PORT — run 'oxy start --enterprise' in a separate terminal first"
OXY_DATABASE_URL=$DB_URL oxy build
Always run oxy build after creating or editing view or topic files — this is a mandatory
final step, do not consider the task complete until it passes. It processes .view.yml and
.topic.yml files, compiles them, and reports errors.
Do NOT use oxy validate on view or topic files. oxy validate is for workflow, agent,
and app files only. Running it on view/topic files will not catch errors and may report
misleading results.
oxy build validates:
- Entity/dimension/measure references across views
- SQL expression syntax
- Topic-to-view relationships
Step 6: Test
oxy semantic-engine --dev-mode
Test natural language queries to verify the semantic layer generates correct SQL.
Quality Guidelines
Naming Conventions
- Use
snake_case for all names
- Be descriptive and business-friendly
- Avoid abbreviations unless universally understood
- Use consistent patterns (e.g.,
total_*, avg_*, *_count)
Descriptions
- Write clear, business-friendly explanations
- Explain what the field represents, not how it's calculated
- Include units where relevant (e.g., "in USD", "in minutes")
Synonyms
Add synonyms for:
- Common abbreviations (e.g., "AOV" for "average order value")
- Alternative phrasings (e.g., "revenue" vs "sales")
- Domain-specific terms (e.g., "check" vs "bill" in restaurants)
Samples
Provide example values for:
- Categorical dimensions
- Boolean flags
- Any field where example values aid understanding
Common Issues
Entity Key Not Found
Error: "Entity key 'X' not found in dimensions"
Cause: Entity key references a column name instead of a dimension name
Fix: Ensure entity key references a dimension name:
entities:
- name: order
key: order_id
dimensions:
- name: order_id
expr: order_id
View Not Found
Error: "View 'X' not found"
Cause: View file not in correct location or name mismatch
Fix:
- Ensure file is in
semantics/views/ directory
- Verify filename is
*.view.yml
- Check that
name field matches topic reference exactly
SQL Generation Errors
Error: Generated SQL fails or is incorrect
Cause: Invalid SQL in expr fields
Fix:
- Verify column names exist in the table
- Check SQL syntax for your database dialect
- Ensure data types match (don't sum strings, don't average dates)
- Test expressions with
--dry-run
Joins Not Working
Error: Can't query across multiple views
Cause: Entity names don't match or keys are incorrect
Fix:
- Use identical entity names across related views
- Verify entity keys reference the correct dimensions
- Ensure dimensions exist with matching names
Type Mismatch at Filter Time
Error: TYPE_MISMATCH when an analytics query filters on a date-like
dimension (silent at oxy build, fails on the first filter query)
Cause: A date dimension was declared type: number or type: string over
a column whose underlying SQL type is not Date / DateTime / TIMESTAMP.
The semantic layer compiles a date literal that the column type rejects.
Fix:
- Sample one row of the column:
SELECT <col> FROM <table> LIMIT 1.
- Switch the dimension to
type: date (or datetime).
- Wrap
expr: with the appropriate cast from the
"Date Columns: Detect Format, Then Cast" section above.
Rollup Not Being Used
Symptom: A query that should be fast still scans the warehouse.
Cause: No rollup covers the query — a requested dimension/measure is
missing from the rollup, the requested granularity is finer than the stored
one, or the query touches a custom / median / number measure.
Fix: Compare the dashboard's dimension/measure set against the rollup
definition. Add the missing members, or store the rollup at a finer
granularity. See "When a rollup serves a query" above.
DeepWiki Fallback
For Oxy features not covered here, query DeepWiki with:
"I am a user of this project, not its maintainer. Please prioritize looking at the project docs, examples and json-schemas to answer my question: [your question]"
Only search oxy-hq/oxy repository.
Documentation Links
Examples
Complete View Example
name: restaurant_orders
description: "Restaurant order transactions with customer and item details"
datasource: "clickhouse"
table: "restaurant_analytics.orders"
entities:
- name: order
type: primary
description: "Individual restaurant order"
key: order_id
- name: customer
type: foreign
description: "Customer who placed the order"
key: customer_id
- name: restaurant
type: foreign
description: "Restaurant fulfilling the order"
key: restaurant_id
dimensions:
- name: order_id
type: string
description: "Unique order identifier"
expr: order_id
- name: customer_id
type: string
description: "Customer identifier"
expr: customer_id
- name: restaurant_id
type: string
description: "Restaurant identifier"
expr: restaurant_id
- name: order_date
type: datetime
description: "Timestamp when order was placed"
expr: order_timestamp
- name: order_year
type: number
description: "Year order was placed"
expr: "EXTRACT(YEAR FROM order_timestamp)"
synonyms: ["year"]
- name: order_status
type: string
description: "Current status of the order"
expr: status
samples: ["pending", "preparing", "ready", "completed", "cancelled"]
synonyms: ["status", "order state"]
- name: is_delivery
type: boolean
description: "Whether order is for delivery"
expr: "order_type = 'delivery'"
synonyms: ["delivery order", "delivery"]
measures:
- name: total_orders
type: count
description: "Total number of orders"
synonyms: ["order count", "number of orders"]
- name: total_revenue
type: sum
description: "Total revenue from orders"
expr: order_amount
synonyms: ["total sales", "revenue", "sales"]
- name: average_order_value
type: average
description: "Average order value in USD"
expr: order_amount
synonyms: ["AOV", "avg order value"]
- name: delivery_revenue
type: sum
description: "Revenue from delivery orders only"
expr: order_amount
filters:
- expr: "order_type = 'delivery'"
Complete Topic Example
name: restaurant_orders
description: "Restaurant order analytics covering sales, customer behavior, and fulfillment"
base_view: restaurant_orders
views:
- restaurant_orders
default_filters:
- field: "order_status"
not_in:
values: ["cancelled", "test"]
Important: Do NOT Add yaml-language-server Schema Comments
NEVER add # yaml-language-server: $schema=... comments to view or topic files.
- Oxy does not publish JSON schemas for view.yml or topic.yml files at predictable URLs
- Adding non-existent schema URLs causes IDE validation errors and confusion
- If you're unsure whether a schema URL exists, don't include it
- Only add schema comments if you have verified the URL returns a valid JSON schema