ワンクリックでManusで任意のスキルを実行

$pwd:

schema-enricher

Name: Schema Enricher
Author: mozilla

// Use this skill to enrich schema.yaml files for BigQuery tables in the bigquery-etl repository. Handles creating schema.yaml when it doesn't exist, finding and filling missing column descriptions (from base schemas, upstream source schema, query context, or application context), validating columns against the query, and generating a summary with recommendations for where to add new descriptions (global.yaml, <dataset_name>.yaml, or app_<name>.yaml). Works with column-description-finder skill.

Manusで実行

$ git log --oneline --stat

stars:9

forks:1

updated:2026年4月23日 19:26

ファイルエクスプローラー

3 ファイル

SKILL.md

readonly

related-skills.json

同じリポジトリ

base-schema-audit.md

from "mozilla/bigquery-etl-skills"

Use this skill to audit tables for missing column descriptions and classify each missing column into the correct base schema promotion target (global.yaml, app_<product>.yaml, or <dataset_name>.yaml). Accepts a dataset name and an optional table filter — omit the filter to audit all tables in the dataset. Outputs a per-column recommended_target report for use in _missing_metadata.yaml. Composable with schema-enricher (Step 6).

2026-04-239

column-description-finder.md

from "mozilla/bigquery-etl-skills"

Use this skill when looking up, auditing, or managing column descriptions from global, application-specific, and dataset-specific column definition YAML files (bigquery_etl/schema/global.yaml, bigquery_etl/schema/app_<name>.yaml, and bigquery_etl/schema/<dataset>.yaml). Use it to find a description for a specific column, list all columns in a base schema, audit which columns in a table's schema.yaml are covered by base schemas, or identify columns missing descriptions. Works with schema-enricher skill.

2026-04-239

create-pr.md

from "mozilla/bigquery-etl-skills"

Use this skill when the prompt asks to create, open, or submit a pull request in the bigquery-etl repository. Handles branch creation, staging, committing, pushing, and opening a draft PR with a structured description. Triggered by phrases like "create a PR", "open a PR", "submit a PR", "push and open a PR".

2026-04-239

glean-description-lookup.md

from "mozilla/bigquery-etl-skills"

Use this skill when looking up field descriptions for Mozilla Glean telemetry tables (tables ending in _live or _stable, e.g. <app>_stable.<ping>_v1). Fetches descriptions from the Glean Dictionary (dictionary.telemetry.mozilla.org) using WebFetch with targeted field extraction — only the fields referenced in query.sql, never the full table schema.

2026-04-239

metadata-manager.md

from "mozilla/bigquery-etl-skills"

Use this skill when creating or updating DAG configurations (dags.yaml), schema.yaml, and metadata.yaml files for BigQuery tables. Handles creating new DAGs when needed and coordinates test updates when queries are modified (invokes sql-test-generator as needed). Works with bigquery-etl-core, query-writer, and sql-test-generator skills.

2026-04-239

schema-readme-generator.md

from "mozilla/bigquery-etl-skills"

Use this skill to create or update README.md files for BigQuery ETL tables in the mozilla bigquery-etl repository. Follows layout conventions derived from comparing README files across the repo — rich style with emoji headings, Mermaid data flow diagram, graduated example queries, and concise metadata overview table. Requires schema.yaml with complete descriptions (run schema-enricher first if needed) and a complete metadata.yaml.

2026-04-239

package.json

"author": "mozilla"

"repository": "mozilla/bigquery-etl-skills"

GitHub リポジトリを開く Creator のリポジトリを見る

$ install --global

$ download --local

Manusで実行

$ useful --forSOC

データベースアーキテクトコンピュータ・数学職15-1243L4

name

schema-enricher

description

Use this skill to enrich schema.yaml files for BigQuery tables in the bigquery-etl repository. Handles creating schema.yaml when it doesn't exist, finding and filling missing column descriptions (from base schemas, upstream source schema, query context, or application context), validating columns against the query, and generating a summary with recommendations for where to add new descriptions (global.yaml, <dataset_name>.yaml, or app_<name>.yaml). Works with column-description-finder skill.

Schema Enricher

Composable: Orchestrates column-description-finder (for finding descriptions) When to use: Creating a new schema.yaml, filling missing column descriptions, validating schema columns against the query output, or auditing an existing schema for completeness

🚨 REQUIRED READING - Start Here

BEFORE performing any schema work, review the following:

Column description sources — note: Base schema files (global.yaml, app_<name>.yaml, dataset-specific YAML) are fetched live from GitHub by the column-description-finder skill in Step 0c. Do not read these files inline — wait for the skill invocation in Step 0c.
Schema YAML conventions: READ references/schema_yaml_conventions.md
- Field types, modes, description quality standards
- Canonical field names and how aliases work
Result file format: READ assets/missing_descriptions_template.yaml
- Template for capturing columns whose descriptions were derived from upstream source schema, query context, or application context (i.e., not found in a base schema)
- Used to recommend additions to base schema files

Workflow

Step 0: Audit base schema coverage using `column-description-finder` ⚠️ MANDATORY — do not skip

BEFORE looking up any column descriptions, invoke the column-description-finder skill to audit base schema coverage. The skill fetches live base schema files from GitHub and uses deterministic scripts — avoiding the hallucination risk of reading files inline.

0a — Identify the applicable schemas:

If metadata.yaml exists, check for an app_schema: field (e.g., app_newtab). If absent, proceed without an app schema flag.
Note the dataset name for dataset-specific schema lookup.

0b — Confirm schema.yaml and query file:

Check for query.sql and read it to understand SELECT output columns:
- If query.sql exists: read it ✓
- If only query.py exists: note it — Step 3 column validation will be skipped since column output cannot be reliably parsed from Python
Check whether schema.yaml exists:
- If it exists: proceed to Step 0c
- If it does not exist: generate it first:
```
./bqetl query schema update <dataset>.<table>
```
  Then read the generated schema.yaml before proceeding to Step 0c.

0c — Invoke column-description-finder skill:

Run the audit script with the appropriate flags:

# With app schema:
python scripts/audit_base_schema_coverage.py <dataset>.<table> --app-schema <app_schema> --dataset-schema

# Without app schema:
python scripts/audit_base_schema_coverage.py <dataset>.<table> --dataset-schema

Capture the full output — it identifies:

Columns covered by base schemas (and which file covers each)
Columns with existing descriptions not in base schemas
Columns with no description at all

⚠️ Wait for the skill to complete and capture its full output before proceeding to Step 1.

Step 1: Categorize columns from audit output

Using the audit output from Step 0c, categorize every column:

Column status	Source	Action
Covered by live base schema	audit output (column → source file)	Use the base schema description
Has own description	audit output	Retain existing description
No description (not in live schemas)	audit output	Check local base schema files first; if found → `local_only_base_schema_columns`; if not → flag for Step 2

Checking local base schema files: The column-description-finder audit fetches schemas live from GitHub. If a base schema file exists locally but not on GitHub main (returns 404 when WebFetched), the audit will not find descriptions from it. For columns marked "no description" by the audit:

Check local bigquery_etl/schema/*.yaml files for a match
If found — the description is valid but sourced from a local-only file. Tag the column as local-only and capture it in local_only_base_schema_columns in Step 6.
If not found in any local file either — flag for description fill in Step 2.

Base schema priority order applied by the audit script is defined in skills/column-description-finder/references/column_definition_yaml_guide.md (app-specific → dataset-specific → global).

Proceed to Step 2 for all columns with no description in any base schema (live or local).

Step 2: Fill missing descriptions

For each column without a current description in schema.yaml, use this priority order:

1–3. Base schemas — descriptions are already identified in the Step 0c audit output. Apply them directly from the audit results (app-specific → dataset-specific → global).

Upstream source schema.yaml — for columns that are pass-through dimensions from a source table:
- Identify the source table(s) from the FROM clause of query.sql
- If the source table ends in _live or _stable (a Glean ingestion table): invoke the glean-description-lookup skill to fetch descriptions from the Glean Dictionary — no local schema.yaml exists for those tables
- Otherwise: locate the source table's directory under sql/ and read its schema.yaml
- If the column exists and has a description, copy it directly
- If the upstream schema.yaml is absent or the column has no description there, fall through to priority 5
Query context — examine query.sql to understand what the column computes:
- Aggregation of clicks → "Total number of clicks recorded"
- Boolean flag → "Whether [feature] is enabled for this user"
- SAFE_DIVIDE expression → rate or ratio description
Application context — if no query context is clear, derive description from:
- Column name semantics
- Dataset/product area (e.g., newtab, pocket, topsites, search)
- Related columns nearby in the schema

⚠️ For any column whose description came from upstream source schema.yaml (priority 4), query context (priority 5), or application context (priority 6) — i.e., not from a base schema — capture it in <table_name>_missing_metadata.yaml (Step 6).

Nested RECORD fields

Base schema matching covers top-level fields only (as noted in the Step 0c audit output). For nested fields within RECORD types, apply the same priority order as above — upstream source schema.yaml first, then query context, then application context. Descriptions derived from priorities 4, 5, or 6 for nested fields should also be captured in <table_name>_missing_metadata.yaml. Quality check (Step 4) applies to nested field descriptions as well.

Step 3: Validate columns

Compare columns in schema.yaml against the query's SELECT output. Skip this step if only query.py exists (noted in Step 0b).

⚠️ This command may update schema.yaml on disk. Do NOT re-read schema.yaml after running — continue working from the field list and descriptions established in Steps 1–2. Use the command output (diff) only to identify structural additions and removals; apply those changes to your in-memory enriched schema, then write everything in Step 5.

./bqetl query schema update <dataset>.<table>

Review the diff to identify:

Columns in query but missing from schema.yaml → add them with descriptions (check Step 0c audit output first; fall back to Step 2 priority order if not found)
Columns in schema.yaml but not in query → flag for removal (confirm with user before deleting)
Type mismatches → correct to match query output

Step 4: Description quality check

Every description — whether sourced from a base schema, retained from the existing schema.yaml, or derived from context — is verified against:

Not empty or null
Not a restatement of the column name
States what the value represents (count, flag, dimension, rate)
Consistent with data type
Contextually accurate for the product domain

Failing descriptions are corrected automatically.

Step 5: Write schema.yaml

Write the enriched schema.yaml preserving field order, names, types, and modes. Only description: entries are added or updated.

On write failure: attempt write to schema_enriched_draft.yaml in the same directory and report the alternate path.

Then read back the written file and confirm:

All original fields are present
Every field has a non-empty description
No fields renamed or removed
Field count matches pre-write count (plus any additions from Step 3)

Then validate the schema against BigQuery:

./bqetl query schema validate <dataset>.<table>

On verification or validation failure: report which fields are missing/incorrect and attempt automatic remediation.

Step 6: Write `<table_name>_missing_metadata.yaml` (if non-live-base-schema descriptions were needed)

Create this file if ANY of the following is true:

One or more columns required description from upstream source schema.yaml, query context, or application context (Step 2)
One or more columns were matched from a local-only base schema file (Step 1)

Write to:

bigquery_etl/schema/missing_metadata/<table_name>_missing_metadata.yaml

Structure — READ assets/missing_descriptions_template.yaml and COPY its format. The block below is a structural preview only; if it differs from the template file, the template is authoritative:

table: "<project>.<dataset>.<table>"
generated_date: "<YYYY-MM-DD>"

# Section 1: Columns whose descriptions came from upstream source schema, query context, or application context
# (not found in any base schema — live or local)
missing_columns:
  - name: "<column_name>"
    type: "<BigQuery type>"
    mode: "<NULLABLE|REQUIRED|REPEATED>"
    inferred_description: >-
      <Description derived from upstream source schema, query context, or application/product context.>
    inference_basis: "<column name semantics | related column | product domain | data type signal | query context | upstream source schema>"
    recommended_target: "<global.yaml | app_<name>.yaml | <dataset_name>.yaml>"
    recommendation_reason: >-
      <Why this column belongs in the recommended target.>

# Section 2: Columns matched in LOCAL-ONLY base schema files
# (descriptions are valid but sourced from files not yet on GitHub main)
local_only_base_schema_columns:
  - name: "<column_name>"
    type: "<BigQuery type>"
    mode: "<NULLABLE|REQUIRED|REPEATED>"
    description: >-
      <Description sourced from the local-only base schema file.>
    source_file: "<app_<name>.yaml | <dataset_name>.yaml>"
    source_status: "local-only — not on GitHub main"
    recommended_action: >-
      Merge <source_file> to GitHub main to make this description authoritative.

Omit a section entirely if it has no entries.

recommended_target rules for missing_columns — this is also the decision logic for choosing which base schema file to recommend:

Condition	Target
Column useful across many datasets/products	`global.yaml`
Column specific to one product (newtab, pocket, ads)	`app_<product_name>.yaml`
Column specific to one dataset	`<dataset_name>.yaml`

⚠️ Apply product-specificity check first. A column that appears only in one dataset is not automatically a <dataset_name>.yaml candidate — if the column represents a product feature or product-scoped concept, it belongs in app_<product_name>.yaml regardless of which dataset it lives in.

For auditing recommended_target across multiple tables at once, use the base-schema-audit skill (decision tree: skills/base-schema-audit/references/base_schema_classification_guide.md).

If neither condition is met, skip this step and note in the summary: <table_name>_missing_metadata.yaml not created — all columns matched in live base schemas

Step 7: Write summary markdown file ⚠️ MANDATORY — do not skip

Write a summary markdown file to:

bigquery_etl/schema/missing_metadata/<table_name>-metadata-summary.md

Use the Write tool to create this file with the following structure:

# Schema Enricher — Metadata Summary

**Table:** `<project>.<dataset>.<table>`
**Run date:** <YYYY-MM-DD>

---

## Phase Results

| Phase (Step) | Status | Notes |
|---|---|---|
| Discovery (Step 0) | ✅/❌ | Base schemas found |
| Categorization (Step 1) | ✅/❌ | X covered, Y retained, Z flagged |
| Description Fill (Step 2) | ✅/❌ | X from base schemas, Y from upstream schema/query/application context |
| Column Validation (Step 3) | ✅/❌/⏭️ | X/Y match query output (or: skipped — only query.py exists) |
| Quality Check (Step 4) | ✅/❌ | X descriptions pass |
| Write (Step 5) | ✅/❌ | schema.yaml written |
| Verification (Step 5) | ✅/❌ | All fields confirmed |

---

## Base Schema Coverage

| Column | Matched Field | Source File | Source Status | Alias Used |
|---|---|---|---|---|
| <column_name> | <matched_field> | <source_file> | live | no |
| <column_name> | <canonical_name> | <source_file> | live | yes (alias: <column_name>) |
| <column_name> | <matched_field> | <source_file> | local-only | no |
| <column_name> | — | (none) | non-base-schema | — |

---

## Local-Only Base Schema Columns

| Column | Source File | Recommended Action |
|---|---|---|
| <column_name> | <source_file> | Merge <source_file> to GitHub main |

*(Omit this section if all matched base schemas are live)*

---

## Non-Base-Schema Descriptions

| Column | Description | Inference Basis | Recommended Base Schema |
|---|---|---|---|
| <column_name> | <description> | <upstream source schema \| query context \| application context> | <global.yaml \| app_<name>.yaml \| <dataset_name>.yaml> |

*(Omit this section if no non-base-schema descriptions were needed)*

---

## Column Validation Results

_(If Step 3 was skipped because only query.py exists, write: "Column validation skipped — only query.py exists")_

- **Columns added (missing from schema):** [list or "none"]
- **Columns removed (not in query):** [list or "none"]
- **Type corrections:** [list or "none"]

---

## Output Files

- `sql/<project>/<dataset>/<table>/schema.yaml` — enriched with all X descriptions
- `bigquery_etl/schema/missing_metadata/<table_name>_missing_metadata.yaml` — Y non-base-schema descriptions (priorities 4–6) + Z local-only base schema columns
  (or: `<table_name>_missing_metadata.yaml` not created — all columns matched in live base schemas)
- `bigquery_etl/schema/missing_metadata/<table_name>-metadata-summary.md` — this summary file

Step 8: Completion checklist ⚠️ VERIFY before declaring task complete

Before reporting done, confirm all required output files have been written using the Read tool or ls:

sql/<project>/<dataset>/<table>/schema.yaml — every field has a non-empty description
bigquery_etl/schema/missing_metadata/<table_name>-metadata-summary.md — summary markdown written
bigquery_etl/schema/missing_metadata/<table_name>_missing_metadata.yaml — written if any columns required description from priorities 4, 5, or 6 OR came from local-only base schemas; explicitly noted as not created if all columns matched in live base schemas

If any file is missing, create it before proceeding.

Integration with Other Skills

Skill	When to invoke
`column-description-finder`	Always — invoked in Step 0c to audit base schema coverage using live GitHub schema files
`glean-description-lookup`	Step 2 priority 4 — when the upstream source table ends in `_live` or `_stable` (no local schema.yaml)
`base-schema-audit`	Before enriching many tables — classifies missing columns into correct base schema targets across a full dataset
`create-pr`	After enrichment is complete — stages, commits, and opens a draft PR

Key Commands

# Generate schema structure from query output (Step 0b)
./bqetl query schema update <dataset>.<table>

# Validate schema against BigQuery (Step 5)
./bqetl query schema validate <dataset>.<table>

schema-enricher

このリポジトリの他の Skills

このリポジトリの他の Skills

Schema Enricher

🚨 REQUIRED READING - Start Here

Workflow

Step 0: Audit base schema coverage using column-description-finder ⚠️ MANDATORY — do not skip

Step 1: Categorize columns from audit output

Step 2: Fill missing descriptions

Nested RECORD fields

Step 3: Validate columns

Step 4: Description quality check

Step 5: Write schema.yaml

Step 6: Write <table_name>_missing_metadata.yaml (if non-live-base-schema descriptions were needed)

Step 7: Write summary markdown file ⚠️ MANDATORY — do not skip

Step 8: Completion checklist ⚠️ VERIFY before declaring task complete

Integration with Other Skills

Key Commands

Schema Enricher

🚨 REQUIRED READING - Start Here

Workflow

Step 0: Audit base schema coverage using column-description-finder ⚠️ MANDATORY — do not skip

Step 1: Categorize columns from audit output

Step 2: Fill missing descriptions

Nested RECORD fields

Step 3: Validate columns

Step 4: Description quality check

Step 5: Write schema.yaml

Step 6: Write <table_name>_missing_metadata.yaml (if non-live-base-schema descriptions were needed)

Step 7: Write summary markdown file ⚠️ MANDATORY — do not skip

Step 8: Completion checklist ⚠️ VERIFY before declaring task complete

Integration with Other Skills

Key Commands

Step 0: Audit base schema coverage using `column-description-finder` ⚠️ MANDATORY — do not skip

Step 6: Write `<table_name>_missing_metadata.yaml` (if non-live-base-schema descriptions were needed)

Step 0: Audit base schema coverage using `column-description-finder` ⚠️ MANDATORY — do not skip

Step 6: Write `<table_name>_missing_metadata.yaml` (if non-live-base-schema descriptions were needed)