| name | bigquery-etl-core |
| description | The core skill for working within the bigquery-etl repository. Use this skill when understanding project structure, conventions, and common patterns. Works with model-requirements, query-writer, metadata-manager, sql-test-generator, and bigconfig-generator skills. |
Composable: Foundation skill that works with model-requirements, query-writer, metadata-manager, sql-test-generator, and bigconfig-generator skills
When to use: Understanding project structure, conventions, common patterns, and finding schema descriptions for construction
The bigquery-etl project manages BigQuery table definitions, queries, and associated metadata for Mozilla. Similar to dbt, the repository maintains query definitions with associated metadata and schemas.
Each table/query typically consists of three files:
query.sql OR query.py - The query definition (SQL or Python)
metadata.yaml - Metadata about scheduling, ownership, and dependencies (see metadata-manager skill)
schema.yaml - BigQuery schema definition with field types and descriptions (see metadata-manager skill)
Note: Most tables use query.sql (~95%). Use query.py for API calls, multi-project queries, or complex Python operations. See query-writer skill for details.
When starting work in bigquery-etl, READ these foundational references:
Naming Conventions: READ references/naming_conventions.md
Dataset Organization: READ references/dataset_naming_conventions.md
Schema Resources: READ references/discovery_resources.md
Privacy Guidelines: READ references/privacy_guidelines.md
sql/{project}/{dataset}/{table_name}/
├── query.sql OR query.py
├── metadata.yaml
└── schema.yaml
See assets/directory_structure_example.txt for detailed examples.
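The layout above can be checked mechanically. The following is a minimal sketch, assuming a table directory normally holds one query file plus metadata.yaml and schema.yaml; the helper names `table_dir` and `missing_files` are hypothetical and are not part of the bqetl tooling.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: a table directory holds one of these query files...
QUERY_FILES = ("query.sql", "query.py")
# ...plus both of these YAML files.
REQUIRED_FILES = ("metadata.yaml", "schema.yaml")


def table_dir(project: str, dataset: str, table_name: str, root: str = "sql") -> Path:
    """Build the sql/{project}/{dataset}/{table_name} path."""
    return Path(root) / project / dataset / table_name


def missing_files(directory: Path) -> list[str]:
    """Return the expected files that are absent from a table directory."""
    missing = [name for name in REQUIRED_FILES if not (directory / name).is_file()]
    # Either query.sql or query.py satisfies the query-file requirement.
    if not any((directory / name).is_file() for name in QUERY_FILES):
        missing.append("query.sql OR query.py")
    return missing
```

For example, `table_dir("moz-fx-data-shared-prod", "telemetry_derived", "clients_daily_event_v1")` yields the directory that should contain the three files, and `missing_files` reports which of them are not there yet.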
Key principles:
Tables live under sql/{project}/{dataset}/{table_name}/
Table names carry a version suffix (_v1, _v2, etc.)
Priority order for schema lookup during construction:
Local files first: Check sql/*/schema.yaml and metadata.yaml files
Glean Dictionary: For _live and _stable tables
ProbeInfo API: For Glean metric metadata
https://probeinfo.telemetry.mozilla.org/glean/{product}/metrics
DataHub MCP: Only as last resort
READ references/datahub_best_practices.md BEFORE any DataHub queries
See references/discovery_resources.md for schema description sources, the priority order for construction, and documentation links.
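The priority order above can be sketched as a small lookup helper. This is an illustration only: the source labels, the function names, and the example product name `fenix` are assumptions, and the sketch only builds the ProbeInfo URL from the pattern quoted above without making any network call.

```python
from __future__ import annotations

from urllib.parse import quote

# URL pattern taken from the text; {product} is the Glean product name.
PROBEINFO_METRICS_URL = "https://probeinfo.telemetry.mozilla.org/glean/{product}/metrics"

# Priority order from the list above; earlier entries win.
LOOKUP_ORDER = ("local files", "Glean Dictionary", "ProbeInfo API", "DataHub MCP")


def probeinfo_metrics_url(product: str) -> str:
    """Fill the product name into the ProbeInfo metrics URL pattern."""
    return PROBEINFO_METRICS_URL.format(product=quote(product, safe=""))


def first_available(available: set[str]) -> str | None:
    """Pick the highest-priority source among those that have the schema."""
    for source in LOOKUP_ORDER:
        if source in available:
            return source
    return None
```

With this ordering, `first_available({"DataHub MCP", "ProbeInfo API"})` selects the ProbeInfo API, so DataHub is consulted only when nothing earlier in the list applies.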
Table Names:
Example: clients_daily_event_v1
Common suffixes: _daily, _hourly, _aggregates, _summary
Field Names:
submission_date, client_id, n_total_events
n_ prefix: n_events, n_sessions
submission_date, client_id, sample_id, normalized_channel, normalized_country_code, app_version
See references/naming_conventions.md for complete naming patterns for tables, fields, and projects.
See references/dataset_naming_conventions.md for dataset organization and versioning patterns, including dataset suffixes (_derived, _external, etc.).
Mozilla follows strict data privacy policies:
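Use client-level identifiers (client_id), not individual identifiers
Label client-level tables with table_type: client_level in metadata.yaml
See references/privacy_guidelines.md for Mozilla data privacy policies and best practices.
submission_date
Browse available functions: https://mozilla.github.io/bigquery-etl/mozfun/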
Common functions:
mozfun.map.get_key() - Extract values from key-value maps
mozfun.norm.truncate_version() - Normalize version strings
mozfun.stats.mode_last() - Statistical mode calculation
UDF source code in sql/mozfun/ directory.
Glean is Mozilla's product analytics & telemetry solution, providing consistent measurement across all Mozilla products.
Key concepts:
Pings (baseline, events, metrics)
Common Glean datasets in BigQuery:
{app_id}.{ping_name} (e.g., org_mozilla_fenix.baseline)
See references/glean_overview.md for Glean SDK concepts and BigQuery dataset structures.
See references/bqetl_cli_commands.md for key CLI commands and DAG discovery.
General principles:
See assets/query_structure_example.sql for standard query structure.
Version migration:
Create a new _v2 table when making breaking schema changes
Keep _v1 running during migration period
For detailed best practices, see:
bigquery-etl-core serves as the foundation skill that other skills build upon:
This skill is always available and does not need to be explicitly invoked - it provides foundational knowledge that other skills reference.
Real query examples in the repository:
sql/moz-fx-data-shared-prod/mozilla_vpn_derived/users_v1/query.sql
sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_event_v1/query.sql
sql/moz-fx-data-shared-prod/telemetry_derived/event_events_v1/query.sql
sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_table_storage_v1/query.py
sql/moz-fx-data-shared-prod/bigeye_derived/user_service_v1/query.py
For more examples, explore the sql/moz-fx-data-shared-prod/ directory.
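The example paths above all follow the sql/{project}/{dataset}/{table_name}/ pattern with a `_vN` version suffix. As a rough sketch of how those parts fit together, the following splits such a path into its components and derives the next version name for a breaking schema change. The `TableRef` type and both function names are hypothetical, and the regular expression assumes every table name ends in `_v` followed by digits.

```python
import re
from typing import NamedTuple

# Assumption: paths look like sql/{project}/{dataset}/{table}_v{N}/query.sql|py
TABLE_PATH = re.compile(
    r"^sql/(?P<project>[^/]+)/(?P<dataset>[^/]+)/"
    r"(?P<table>[^/]+)_v(?P<version>\d+)/"
    r"(?P<query_file>query\.(?:sql|py))$"
)


class TableRef(NamedTuple):
    project: str
    dataset: str
    table: str       # table name without the version suffix
    version: int
    query_file: str  # "query.sql" or "query.py"


def parse_query_path(path: str) -> TableRef:
    """Split a repository query path into project, dataset, table, and version."""
    match = TABLE_PATH.match(path)
    if match is None:
        raise ValueError(f"not a versioned table query path: {path!r}")
    return TableRef(
        match["project"],
        match["dataset"],
        match["table"],
        int(match["version"]),
        match["query_file"],
    )


def next_version_name(ref: TableRef) -> str:
    """Name of the table that would replace this one after a breaking change."""
    return f"{ref.table}_v{ref.version + 1}"
```

Parsing `sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_event_v1/query.sql` gives the dataset `telemetry_derived`, the table `clients_daily_event`, and version 1, and `next_version_name` then returns `clients_daily_event_v2`.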
references/discovery_resources.md - Schema description sources (Glean Dictionary, ProbeInfo API, DataHub MCP), priority order for construction, documentation links
references/naming_conventions.md - Complete naming patterns for tables, fields, and projects
references/dataset_naming_conventions.md - Dataset organization and versioning patterns
references/privacy_guidelines.md - Mozilla data privacy policies and best practices
references/glean_overview.md - Glean SDK concepts and BigQuery dataset structures
references/bqetl_cli_commands.md - Key CLI commands and DAG discovery
BEFORE using any DataHub MCP tools (mcp__datahub-cloud__*), you MUST:
references/datahub_best_practices.md - Comprehensive token optimization strategies
assets/query_structure_example.sql - Standard query.sql structure with common patterns
assets/directory_structure_example.txt - File organization examples