| name | project-planning |
| description | Create multi-phase project plans for Databricks data platform solutions with Agent Domain Framework and Agent Layer Architecture. Includes interactive Quick Start with key decisions, industry-specific domain patterns, complete phase document templates (Use Cases, Agents, Frontend), Genie Space integration patterns, deployment order requirements, and worked examples. Default acceleration mode plans on top of a completed Gold layer. Workshop mode can also plan from the best available layer (deployed Gold, Gold design YAML, deployed Silver, deployed Bronze, or source schema CSV) and produces a workshop-draft contract for downstream stages. Use when planning any Databricks solution after Gold layer is complete, or in workshop mode after Bronze, Silver, or Gold-design is available. |
| metadata | {"author":"prashanth subrahmanyam","version":"2.0","domain":"planning","role":"orchestrator","pipeline_stage":5,"pipeline_stage_name":"planning","next_stages":["semantic-layer-setup"],"workers":[],"common_dependencies":["databricks-expert-agent","naming-tagging-standards"],"emits":["plans/use-case-catalog.md","plans/manifests/semantic-layer-manifest.yaml","plans/manifests/observability-manifest.yaml","plans/manifests/ml-manifest.yaml","plans/manifests/genai-agents-manifest.yaml","plans/manifests/gold-dependency-manifest.yaml","plans/manifests/source-dependency-manifest.yaml","plans/gold-gap-remediation.md","plans/source-gap-remediation.md"],"reads":["gold_layer_design/yaml/","gold_layer_design/erd_master.md","gold_layer_design/docs/BUSINESS_ONBOARDING_GUIDE.md","data_product_accelerator/context/*.csv"],"supported_modes":["acceleration","workshop"],"default_mode":"acceleration","last_verified":"2026-02-07","volatility":"low","upstream_sources":[]} |
Project Plan Methodology for Databricks Solutions
Planning Mode
Default: Data Product Acceleration — full breadth, all domains, all artifacts, Gold layer required as planning basis. This is the standard behavior described in this entire skill document below.
Workshop mode is available for Learning & Enablement scenarios with hard artifact caps and layer flexibility: it can plan from the best available source layer (Gold, Gold design YAML, Silver, Bronze, or source CSV). Workshop mode is NEVER activated unless the user explicitly opts in with one of the exact phrases listed under Mode Detection Rules, such as planning_mode: workshop.
Mode vs source layer: planning_mode (acceleration | workshop) controls artifact caps and validation strictness. planning_source.selected_layer (gold | gold_design | silver | bronze | source_csv) records which input the plan was derived from and is set automatically by Phase 0 below. Acceleration mode FORCES selected_layer = gold (or gold_design only if explicitly allowed). Workshop mode picks the best available source via the Phase 0 priority order and stamps it onto every manifest.
Mode Detection Rules
- Default is ALWAYS acceleration. If the user does not explicitly declare workshop mode, use acceleration.
- Workshop mode requires EXPLICIT opt-in. The user must include one of these EXACT phrases:
  - planning_mode: workshop
  - "workshop mode"
  - "use workshop mode"
- Do NOT infer workshop mode from words like "small", "simple", "demo", "limited", "quick", "basic", "training", or "few". These are NOT triggers. A user may want a narrow-scope acceleration plan — that's still acceleration mode with fewer use cases.
- When in doubt, ask. If the user's intent is ambiguous (e.g., "Create a plan for a workshop"), ask: "Would you like full Data Product Acceleration mode (default) or Workshop mode with limited artifacts? To use workshop mode, include planning_mode: workshop in your request."
- Confirm mode at the start. The first line of any plan output should state the active mode:
**Planning Mode:** Data Product Acceleration (default)
**Planning Mode:** Workshop (explicit opt-in — artifact caps active)
- When workshop mode is activated, read references/workshop-mode-profile.md for artifact caps, phase scope, and selection criteria. Do NOT read that reference otherwise.
- Propagate mode to manifests. Add planning_mode: workshop or planning_mode: acceleration to all generated manifest YAML files. Downstream orchestrators seeing workshop MUST NOT expand beyond the listed artifacts via self-discovery.
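The detection rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the function name `detect_planning_mode` is hypothetical, and case-insensitive substring matching is an assumption, since the rules only say the phrases must be exact.

```python
# Opt-in phrases copied from the Mode Detection Rules.
WORKSHOP_PHRASES = (
    "planning_mode: workshop",
    "workshop mode",
    "use workshop mode",
)


def detect_planning_mode(prompt: str) -> str:
    """Return "workshop" only on an explicit opt-in phrase, else "acceleration".

    Assumption: matching is a case-insensitive substring test.
    Words such as "small", "demo", or "training" are deliberately ignored.
    """
    text = prompt.lower()
    if any(phrase in text for phrase in WORKSHOP_PHRASES):
        return "workshop"
    return "acceleration"


print(detect_planning_mode("Create a plan for a workshop"))
print(detect_planning_mode("Plan the booking domain. planning_mode: workshop"))
```

A prompt such as "Create a plan for a workshop" stays in acceleration mode under this sketch; per the rules, that is the ambiguous case where the orchestrator should ask the user rather than guess.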
Overview
Comprehensive methodology for creating multi-phase project plans for Databricks data platform solutions. This skill combines interactive project planning with architectural methodology, including templates, worked examples, and quality standards.
Key Assumption (mode-aware):
- Acceleration mode (default): Planning starts AFTER Bronze ingestion AND Gold layer design are complete. Gold is the required planning basis. These are prerequisites, not phases. Phase 0 will stop with a remediation message if Gold is missing.
- Workshop mode (planning_mode: workshop): Planning AND deployment are layer-agnostic. Phase 0 selects the highest-fidelity input present from: deployed Gold, Gold design YAML, deployed Silver, deployed Bronze, or a source schema CSV. The selected layer is stamped onto every manifest as planning_source.selected_layer. Workshop manifests built from Silver or Bronze are marked implementation_readiness: workshop_deployable, and downstream stages (semantic-layer, observability, ml, genai-agents) deploy directly against the selected layer. Workshop manifests built from a source CSV are marked implementation_readiness: workshop_draft (planning contract only, with no live tables to deploy against). requires_gold_promotion is an advisory field: Gold promotion is recommended for production, but the field never blocks deployment.
When to Use This Skill
Use this skill when:
- Creating architectural plans for Databricks data platform projects
- Building observability, analytics, or monitoring solutions
- Planning multi-artifact solutions (TVFs, Metric Views, Dashboards, Genie Spaces, Alerts, ML Models)
- Developing agent-based frameworks for platform management
- Creating frontend applications for data platform interaction
- Starting a new project after Gold layer is complete
Idempotency Guard (Run FIRST)
Before regenerating plans, detect existing artifacts to avoid clobbering work-in-progress. A common failure mode is regenerating plans/ wholesale on a re-run and overwriting user edits to manifests, addendums, or the Use Case Catalog.
from pathlib import Path

PLANS_DIR = Path("plans")

if PLANS_DIR.exists() and any(PLANS_DIR.iterdir()):
    existing = sorted(p.relative_to(".") for p in PLANS_DIR.rglob("*") if p.is_file())
    print("Existing plan artifacts detected:")
    for p in existing:
        print(f"  {p}  (mtime={Path(p).stat().st_mtime})")
    print(
        "\nHow would you like to proceed?\n"
        "  - regenerate  (DELETE and rebuild all plan files — destructive)\n"
        "  - incremental (keep existing files, only emit MISSING artifacts)\n"
        "  - skip        (exit this orchestrator — recommended default)\n"
    )
Rules:
- Default is skip. If the user is silent or ambiguous, assume skip and exit with a summary of existing files.
- regenerate must be explicit. Confirm the action ("I will delete N files under plans/. Proceed?") before doing anything destructive.
- incremental is the right choice when downstream orchestrators (semantic-layer, observability, ml, genai-agents) reported a missing manifest: only emit the missing manifest, not the whole tree.
Escape flag: Users can set planning_allow_overwrite: true in their prompt to skip the idempotency check (equivalent to choosing regenerate without interactive confirmation).
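The rules and the escape flag can be combined into one decision helper. The sketch below is illustrative only: `resolve_rerun_action` and `missing_artifacts` are hypothetical names, and the `allow_overwrite` argument stands in for a parsed planning_allow_overwrite: true flag.

```python
from typing import Iterable, List, Optional, Set, Tuple

VALID_ACTIONS = {"regenerate", "incremental", "skip"}


def resolve_rerun_action(
    user_choice: Optional[str], allow_overwrite: bool = False
) -> Tuple[str, bool]:
    """Return (action, needs_confirmation) following the idempotency rules."""
    if allow_overwrite:
        # Escape flag: regenerate without the interactive confirmation.
        return ("regenerate", False)
    choice = (user_choice or "").strip().lower()
    if choice not in VALID_ACTIONS:
        # Silent or ambiguous input falls back to the safe default.
        return ("skip", False)
    # Only the destructive option needs an explicit confirmation.
    return (choice, choice == "regenerate")


def missing_artifacts(expected: Iterable[str], existing: Set[str]) -> List[str]:
    """For incremental runs: the expected files that are not on disk yet."""
    return [path for path in expected if path not in existing]


print(resolve_rerun_action(None))
print(resolve_rerun_action("regenerate"))
```

Returning the confirmation requirement alongside the action keeps the destructive path visible to the caller instead of hiding it inside the helper.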
Quick Start (5 Minutes)
Fast Track: Create Your Project Plan
"Create a phased project plan for {project_name} with:
- Planning assets: {n} tables (Gold/Silver/Bronze depending on what is available)
- Use cases: {use_case_1, use_case_2, use_case_3, etc.}
- Target audience: {executives, analysts, data scientists}
- Agent domains: {domain1, domain2, domain3, domain4, domain5}"
Key Decisions (Answer These First)
| Decision | Options | Your Choice |
|---|---|---|
| Agent Domains | Derive from business questions (typically 2-5) | __________ |
| Phase 1 Addendums | TVFs, Metric Views, Dashboards, Monitoring, Genie, Alerts, ML | __________ |
| Phase 2 Scope | AI Agents (optional) or skip | __________ |
| Phase 3 Scope | Frontend App (optional) or skip | __________ |
| Genie Space Count | Based on asset count vs 25-asset limit (see Rationalization) | __________ |
| Agent Architecture | Agents use Genie Spaces (recommended) or Direct SQL | __________ |
| Agent-Genie Mapping | 1:1, consolidated, or unified (based on asset volume) | __________ |
Working Memory Management
This orchestrator spans 3 phases. To maintain coherence without context pollution:
After each phase, persist a brief summary note capturing:
- Phase 1: Domain list with Gold table mappings, addendum selections, business questions per domain, artifact count estimates
- Phase 2: Plan document file paths, cross-references verified, total artifact counts by type
- Phase 3: Manifest file paths (semantic-layer, observability, ml, genai-agents), validation results, summary counts
What to keep in working memory: Current phase's template, domain list + artifact inventory, and previous phase's summary. Discard intermediate outputs — they are on disk. Read templates from assets/templates/ and references just-in-time, not upfront.
Step-by-Step Workflow
Phase 0: Planning Source Discovery (MANDATORY, runs before Phase 1)
This phase decides WHICH layer the plan will be derived from and stamps the answer onto every emitted manifest as planning_source. It runs in both modes; the only difference is which selections are allowed.
Step 0.1 — Inventory available planning inputs
Detect each potential planning source. Record presence/absence in working memory.
from pathlib import Path
from databricks.sdk import WorkspaceClient


def detect_planning_sources(catalog: str, user_schema_prefix: str) -> dict:
    """Return a dict describing every potential planning source that exists.

    Priority order (highest fidelity first):
      1. deployed_gold   — live tables in <catalog>.<prefix>_gold
      2. gold_design     — gold_layer_design/yaml/*.yaml authored, deployment may or may not be done
      3. deployed_silver — live tables in <catalog>.<prefix>_silver
      4. deployed_bronze — live tables in <catalog>.<prefix>_bronze
      5. source_csv      — data_product_accelerator/context/*.csv (last resort)
    """
    w = WorkspaceClient()
    sources = {}

    for layer, schema in (
        ("deployed_gold", f"{user_schema_prefix}_gold"),
        ("deployed_silver", f"{user_schema_prefix}_silver"),
        ("deployed_bronze", f"{user_schema_prefix}_bronze"),
    ):
        try:
            tables = list(w.tables.list(catalog_name=catalog, schema_name=schema))
            sources[layer] = {"schema": f"{catalog}.{schema}", "table_count": len(tables)} if tables else None
        except Exception:
            sources[layer] = None

    yaml_dir = Path("gold_layer_design/yaml")
    if yaml_dir.exists() and any(yaml_dir.glob("*.yaml")):
        sources["gold_design"] = {"path": str(yaml_dir), "yaml_count": len(list(yaml_dir.glob("*.yaml")))}
    else:
        sources["gold_design"] = None

    csvs = list(Path("data_product_accelerator/context").glob("*.csv"))
    sources["source_csv"] = {"paths": [str(c) for c in csvs]} if csvs else None

    return sources
Step 0.2 — Select the planning source by mode
| Mode | Allowed selected_layer values | Selection rule |
|---|---|---|
| acceleration (default) | deployed_gold, gold_design | Pick deployed_gold if present; else gold_design ONLY when explicitly accepted; else STOP with a Gold-required remediation message. |
| workshop | deployed_gold, gold_design, deployed_silver, deployed_bronze, source_csv | Pick the highest-priority source present. Never silently fall through to a lower layer when a higher one exists. |
Acceleration STOP message:
Planning in acceleration mode requires the Gold layer. Run the Gold Layer Design and Setup skills first, or re-run with planning_mode: workshop to plan from a lower layer.
Workshop selection log (must be printed):
Phase 0 — Planning source selected: <selected_layer>
Available: deployed_gold=<bool>, gold_design=<bool>, deployed_silver=<bool>, deployed_bronze=<bool>, source_csv=<bool>
Reason: highest-fidelity available input under workshop mode
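The selection rule for both modes can be sketched on top of the dict returned by detect_planning_sources. This is an assumption-laden illustration: `select_planning_source`, the `accept_gold_design` parameter, and the `PlanningSourceError` exception are hypothetical names introduced here, not part of the skill.

```python
from typing import Dict, Optional

# Priority order from Step 0.1, highest fidelity first.
PRIORITY = [
    "deployed_gold",
    "gold_design",
    "deployed_silver",
    "deployed_bronze",
    "source_csv",
]

GOLD_REQUIRED_MESSAGE = (
    "Planning in acceleration mode requires the Gold layer. Run the Gold Layer "
    "Design and Setup skills first, or re-run with planning_mode: workshop to "
    "plan from a lower layer."
)


class PlanningSourceError(RuntimeError):
    """Hypothetical error type raised when no allowed source is available."""


def select_planning_source(
    sources: Dict[str, Optional[dict]],
    mode: str = "acceleration",
    accept_gold_design: bool = False,
) -> str:
    """Pick selected_layer according to the Step 0.2 table."""
    present = [layer for layer in PRIORITY if sources.get(layer)]
    if mode == "workshop":
        if not present:
            raise PlanningSourceError("No planning source is available.")
        # Never fall through to a lower layer when a higher one exists.
        return present[0]
    if "deployed_gold" in present:
        return "deployed_gold"
    if "gold_design" in present and accept_gold_design:
        return "gold_design"
    raise PlanningSourceError(GOLD_REQUIRED_MESSAGE)
```

In workshop mode the first entry of the filtered priority list is always the answer, which is what prevents a silent fall-through to a lower layer.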
Step 0.3 — Derive readiness markers
Compute the readiness fields that every emitted manifest must include. requires_gold_promotion is advisory only — it is a hint for production hardening, never a deployment gate.
def readiness_for(selected_layer: str, mode: str) -> dict:
    if selected_layer == "deployed_gold":
        return {"implementation_readiness": "gold_ready",
                "requires_gold_promotion": False}
    if selected_layer == "gold_design":
        return {"implementation_readiness": "gold_design_only",
                "requires_gold_promotion": False}
    if mode != "workshop":
        raise SystemExit("Non-Gold planning sources are only allowed in workshop mode.")
    if selected_layer in {"deployed_silver", "deployed_bronze"}:
        return {"implementation_readiness": "workshop_deployable",
                "requires_gold_promotion": False}
    if selected_layer == "source_csv":
        return {"implementation_readiness": "workshop_draft",
                "requires_gold_promotion": False}
    raise SystemExit(f"Unknown selected_layer={selected_layer!r}")
Readiness state semantics:
| implementation_readiness | When | Downstream behavior |
|---|---|---|
| gold_ready | Acceleration or workshop on deployed_gold | Full production deploy |
| gold_design_only | Acceleration or workshop on gold_design (Gold YAML, no live tables yet) | Deploy after Gold provisioning; live-catalog checks advisory |
| workshop_deployable | Workshop on deployed_silver or deployed_bronze | Deploy semantic layer / Genie Spaces directly against the Silver or Bronze schema; Gold promotion is an advisory next step |
| workshop_draft | Workshop on source_csv only | Planning contract only; downstream stages stop and ask for at least one live layer |
Step 0.4 — Stamp planning_source onto every manifest
Every manifest emitted by Phases 1–3 (semantic-layer, observability, ml, genai-agents, gold-dependency, source-dependency) MUST carry a top-level block:
planning_source:
  selected_layer: deployed_gold | gold_design | deployed_silver | deployed_bronze | source_csv
  schema: "<catalog>.<schema>"
  source_yaml_dir: "gold_layer_design/yaml"
  source_csv_paths: ["data_product_accelerator/context/<file>.csv"]
  selected_at: "<ISO-8601 UTC>"
implementation_readiness: gold_ready | gold_design_only | workshop_deployable | workshop_draft
requires_gold_promotion: true | false
Downstream orchestrators (semantic-layer, observability, ml, genai-agents) read these fields:
- gold_ready / gold_design_only / workshop_deployable: proceed with deployment against the layer the manifest declares (gold_schema for Gold sources; silver_schema / bronze_schema for workshop deployments on Silver/Bronze).
- workshop_draft (only emitted when selected_layer = source_csv): stop before deployment; the plan is a contract only.
- requires_gold_promotion is advisory; it influences messaging, not gating.
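How a downstream orchestrator might act on these fields can be sketched as follows. The function name `downstream_action` and the advisory message text are hypothetical; only the readiness values and the advisory-versus-gating distinction come from the text above.

```python
from typing import Tuple

DEPLOYABLE_STATES = {"gold_ready", "gold_design_only", "workshop_deployable"}


def downstream_action(
    implementation_readiness: str, requires_gold_promotion: bool = False
) -> Tuple[str, str]:
    """Return (action, advisory_note) for a downstream orchestrator.

    The action depends only on implementation_readiness. The
    requires_gold_promotion flag changes the note, never the action.
    """
    note = (
        "Gold promotion is recommended before production use."
        if requires_gold_promotion
        else ""
    )
    if implementation_readiness in DEPLOYABLE_STATES:
        return ("deploy", note)
    if implementation_readiness == "workshop_draft":
        return ("stop", note)
    raise ValueError(f"Unknown implementation_readiness={implementation_readiness!r}")


print(downstream_action("workshop_deployable", requires_gold_promotion=True))
print(downstream_action("workshop_draft"))
```

Keeping the advisory flag out of the branching logic makes it hard to turn it into a deployment gate by accident.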
Step 0.5 — Mode-aware Phase 1 prerequisites table
The Prerequisites Status table in Phase 1 must reflect the selected source. Layers above the selected one are still valid; layers below it (or absent) are marked N/A or Planned only. See the updated assets/templates/prerequisites-template.md for the dynamic format.
Phase 1: Requirements Gathering
Project Information
| Field | Your Value |
|---|---|
| Project Name | {project_name} |
| Business Domain | {hospitality, retail, healthcare, finance, etc.} |
| Primary Use Cases | {use_case_1, use_case_2, use_case_3, etc.} |
| Target Stakeholders | {executives, analysts, data scientists, operations} |
Prerequisites Status (filled by Phase 0)
The status of each layer must reflect what Phase 0 detected. Layers above the selected planning source are "✅ Complete"; the selected layer itself is the source the plan was derived from; layers below or absent are N/A or Planned only. Use the dynamic format from assets/templates/prerequisites-template.md.
| Layer | Count | Status (mode-aware) |
|---|---|---|
| Bronze Tables | {n} | ✅ Complete / N/A / Planned only |
| Silver Tables | {m} | ✅ Complete / N/A / Planned only |
| Gold Dimensions | {d} | ✅ Complete / Designed only / N/A |
| Gold Facts | {f} | ✅ Complete / Designed only / N/A |
| Selected planning source | — | {planning_source.selected_layer} (from Phase 0) |
Define Agent Domains
Derive domains from your business questions and planning-source table groupings (see Artifact Rationalization Framework). Use Gold table groupings when planning_source.selected_layer is deployed_gold or gold_design; otherwise group on the selected source-layer tables (Silver / Bronze / source CSV entities). Do not force a fixed number — let the data model and use cases determine natural boundaries.
Required Reads (Before Proceeding)
Before defining domains and use cases, you MUST have read these references. Check each off in your Skill Usage Summary:
If a worked example matches your project, treat it as the primary format reference for use case cards and artifact designs — adapt, don't reinvent. "MANDATORY" means read it; note in the Skill Usage Summary if the reference did not change your decisions, but still read it.
Workshop mode: references/workshop-mode-profile.md should already be loaded when planning_mode: workshop was detected — see its Document Scope section, since workshop mode changes artifact counts but NOT which documents to produce.
| Domain | Icon | Focus Area | Key Planning Assets | Est. Business Questions |
|---|---|---|---|---|
| {Domain 1} | {emoji} | {focus} | {tables from selected layer} | {count} |
| {Domain 2} | {emoji} | {focus} | {tables from selected layer} | {count} |
| ... | ... | ... | ... | ... |
Sizing check: If a domain has < 3 business questions, consider merging it. If two domains share > 70% of their planning assets (Gold/Silver/Bronze tables, depending on planning_source.selected_layer), consolidate.
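The sizing check can be automated with a short helper. This is a sketch under stated assumptions: the text does not define how the 70% share is measured, so overlap is computed here relative to the smaller of the two domains. The function name `sizing_suggestions` and the table names dim_date, dim_host, and fact_review are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def sizing_suggestions(
    domain_assets: Dict[str, Set[str]],
    domain_questions: Dict[str, int],
    min_questions: int = 3,
    overlap_threshold: float = 0.70,
) -> List[str]:
    """Flag domains that are too small or that overlap heavily."""
    suggestions = []
    for name, count in domain_questions.items():
        if count < min_questions:
            suggestions.append(f"merge candidate: {name} has {count} business questions")
    for a, b in combinations(sorted(domain_assets), 2):
        smaller = min(len(domain_assets[a]), len(domain_assets[b]))
        if smaller == 0:
            continue
        # Assumption: share is measured against the smaller domain.
        overlap = len(domain_assets[a] & domain_assets[b]) / smaller
        if overlap > overlap_threshold:
            suggestions.append(f"consolidate: {a} and {b} share {overlap:.0%} of assets")
    return suggestions


assets = {
    "Revenue": {"fact_booking_daily", "dim_property", "dim_date"},
    "Occupancy": {"fact_booking_daily", "dim_property", "dim_date", "dim_host"},
    "Reviews": {"fact_review"},
}
questions = {"Revenue": 6, "Occupancy": 5, "Reviews": 2}
for line in sizing_suggestions(assets, questions):
    print(line)
```

The output is advisory: the text says to consider merging, so the helper reports candidates and leaves the decision to the planner.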
See Industry Domain Patterns for examples by industry.
Phase 1 Addendum Selection
| # | Addendum | Include? | Artifact Count |
|---|---|---|---|
| 1.1 | ML Models | {Yes/No} | {count} |
| 1.2 | Table-Valued Functions | {Yes/No} | {count} |
| 1.3 | Metric Views | {Yes/No} | {count} |
| 1.4 | Lakehouse Monitoring | {Yes/No} | {count} |
| 1.5 | AI/BI Dashboards | {Yes/No} | {count} |
| 1.6 | Genie Spaces | {Yes/No} | {count} |
| 1.7 | Alerting Framework | {Yes/No} | {count} |
Key Business Questions by Domain
List 5-10 key questions per domain that the solution must answer:
{Domain 1}:
- {Question 1}
- {Question 2}
- {Question 3}
- {Question 4}
- {Question 5}
Use Case Catalog
After defining business questions and selecting addendums, consolidate into a Use Case Catalog — one entry per distinct analytical or operational problem the solution will address. Each use case ties business questions to the planning assets (Gold tables in acceleration; selected-layer tables in workshop) and artifacts that solve them. Use assets/templates/use-case-catalog-template.md for the full format.
| UC# | Use Case Name | Domain | Planning Assets | Artifact Types | Example Question |
|---|---|---|---|---|---|
| UC-001 | {Descriptive Name} | {Domain} | fact_*, dim_* (or silver_* / bronze_* in workshop drafts) | TVF, MV, Dashboard | "{Natural language question}?" |
| UC-002 | ... | ... | ... | ... | ... |
Use Case Catalog Rules:
- Every use case MUST include 3-5 business questions phrased in natural language
- Every business question from the domain sections above MUST map to at least one use case
- Every artifact in the addendum summaries MUST trace back to at least one use case question
- Questions should be phrased as stakeholders would ask them (these become Genie benchmark candidates)
- Group related questions into a single use case when they share the same planning assets (Gold tables in acceleration; selected-layer tables in workshop) and grain
See Worked Example: Wanderbricks for 3 fully worked-out use case cards.
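Two of the catalog rules (3-5 questions per use case, and every domain question mapped to a use case) lend themselves to a mechanical check. The sketch below is independent of scripts/validate_use_case_coverage.py, whose internals are not shown here; the function name and the in-memory data shapes are hypothetical.

```python
from typing import Dict, List


def validate_use_case_catalog(
    use_cases: Dict[str, List[str]], domain_questions: List[str]
) -> List[str]:
    """Return a list of rule violations; an empty list means the catalog passes.

    use_cases maps a UC# to its business questions.
    domain_questions is the flat list of questions from the domain sections.
    """
    problems = []
    covered = set()
    for uc_id, uc_questions in use_cases.items():
        if not 3 <= len(uc_questions) <= 5:
            problems.append(f"{uc_id}: has {len(uc_questions)} questions, expected 3-5")
        covered.update(uc_questions)
    for question in domain_questions:
        if question not in covered:
            problems.append(f"unmapped question: {question}")
    return problems


catalog = {
    "UC-001": ["q1", "q2", "q3"],
    "UC-002": ["q4", "q5"],
}
for problem in validate_use_case_catalog(catalog, ["q1", "q2", "q3", "q4", "q5", "q6"]):
    print(problem)
```

Exact string matching between domain questions and use case questions is a simplification; a real catalog may need normalization before comparison.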
Stakeholder Checkpoint: After generating the use case catalog, present the Use Case Summary table to the user for review. Whether to block depends on prompt specificity:
- If the user's prompt listed explicit output steps (e.g., "analyze Gold layer, generate use-case plans, produce manifests"): treat as pre-approval. Include the summary table at the top of your response and proceed without blocking. Note: "Use cases derived from Gold layer analysis — let me know if adjustments are needed before the next pipeline stage."
- If the user's prompt was open-ended (e.g., "create a plan"): pause and ask for confirmation before proceeding to addendum generation.
If the user requests changes after seeing the summary, update the catalog and domain questions before continuing.
Phase 2: Plan Document Generation
Create plan documents using templates in the following order:
- README: assets/templates/plans-readme-template.md (plan index)
- Prerequisites: assets/templates/prerequisites-template.md (data layer summary)
- Use Case Catalog: assets/templates/use-case-catalog-template.md (consolidated use case definitions)
- Phase 1 Master: assets/templates/phase1-use-cases-template.md (analytics artifacts)
- Addendums (selected in Phase 1):
  - TVFs: assets/templates/phase1-tvfs-template.md
  - Alerting: assets/templates/phase1-alerting-template.md
  - Genie Spaces: assets/templates/phase1-genie-spaces-template.md
- Phase 2: assets/templates/phase2-agent-framework-template.md (AI agents)
- Phase 3: assets/templates/phase3-frontend-template.md (user interface)
Phase 2 Completion Gate
Before proceeding to Phase 3 (Manifests), verify that ALL selected plan documents exist on disk.
CANONICAL NUMBERING REFERENCE. Every filename in the table below matches assets/addendum-numbering.md — the single source of truth for Phase 1 addendum numbers. If you are adding a new addendum, extend addendum-numbering.md first, then use the new name here. Never invent a number (e.g. the stale phase1-addendum-1.1-dashboards.md is forbidden — dashboards are 1.5-aibi-dashboards.md).
| Document | Template | Required? |
|---|---|---|
| plans/README.md | plans-readme-template.md | ALWAYS |
| plans/prerequisites.md | prerequisites-template.md | ALWAYS |
| plans/use-case-catalog.md | use-case-catalog-template.md | ALWAYS |
| plans/phase1-use-cases.md | phase1-use-cases-template.md | ALWAYS |
| plans/phase1-addendum-1.1-ml-models.md | (inline) | If ML selected |
| plans/phase1-addendum-1.2-tvfs.md | phase1-tvfs-template.md | If TVFs selected |
| plans/phase1-addendum-1.3-metric-views.md | (inline) | If Metric Views selected |
| plans/phase1-addendum-1.4-lakehouse-monitoring.md | (inline) | If Monitoring selected |
| plans/phase1-addendum-1.5-aibi-dashboards.md | (inline) | If Dashboards selected |
| plans/phase1-addendum-1.6-genie-spaces.md | phase1-genie-spaces-template.md | If Genie Spaces selected |
| plans/phase1-addendum-1.7-alerting.md | phase1-alerting-template.md | If Alerting selected |
If any required document is missing, create it from its template before generating manifests. Manifests reference these files in generated_from.plan_addendums — they must exist on disk. Workshop mode does not waive this gate: artifact counts inside each document are capped, but the document set is unchanged.
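The completion gate can be checked on disk before manifest generation. The following is a minimal sketch: the filenames are taken from the table above, while the function name `missing_plan_documents` and the short addendum keys (`tvfs`, `genie_spaces`, and so on) are hypothetical labels chosen for this example.

```python
from pathlib import Path
from typing import Iterable, List

ALWAYS_REQUIRED = [
    "plans/README.md",
    "plans/prerequisites.md",
    "plans/use-case-catalog.md",
    "plans/phase1-use-cases.md",
]

# Hypothetical short keys mapped to the canonical addendum filenames.
ADDENDUM_FILES = {
    "ml_models": "plans/phase1-addendum-1.1-ml-models.md",
    "tvfs": "plans/phase1-addendum-1.2-tvfs.md",
    "metric_views": "plans/phase1-addendum-1.3-metric-views.md",
    "lakehouse_monitoring": "plans/phase1-addendum-1.4-lakehouse-monitoring.md",
    "aibi_dashboards": "plans/phase1-addendum-1.5-aibi-dashboards.md",
    "genie_spaces": "plans/phase1-addendum-1.6-genie-spaces.md",
    "alerting": "plans/phase1-addendum-1.7-alerting.md",
}


def missing_plan_documents(
    selected_addendums: Iterable[str], root: Path = Path(".")
) -> List[str]:
    """Return required plan documents that do not exist under root."""
    required = ALWAYS_REQUIRED + [ADDENDUM_FILES[key] for key in selected_addendums]
    return [doc for doc in required if not (root / doc).is_file()]
```

An empty result means the gate passes. Any returned path should be created from its template before Phase 3 starts.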
Phase 2 Step 5 — Emit Source Dependency Manifest (MANDATORY, layer-aware)
Before Phase 3 manifest generation, extract every source-layer table/column referenced across all plan addendums into a single machine-readable manifest. This becomes the contract validated against the live catalog in the next step.
The manifest filename and shape depend on the Phase 0 selected layer:
| planning_source.selected_layer | File path | Top-level key |
|---|---|---|
| deployed_gold or gold_design | plans/manifests/gold-dependency-manifest.yaml | gold_dependencies: |
| deployed_silver, deployed_bronze, source_csv (workshop) | plans/manifests/source-dependency-manifest.yaml | source_dependencies: |
Acceleration mode emits ONLY gold-dependency-manifest.yaml (existing behavior). Workshop mode emits whichever file matches its selected layer; when the selected layer is Gold, it emits gold-dependency-manifest.yaml for backward compatibility. Both shapes share the same referenced_by semantics.
# plans/manifests/gold-dependency-manifest.yaml (Gold source)
planning_mode: acceleration
planning_source:
  selected_layer: deployed_gold
  schema: "<catalog>.<gold_schema>"
implementation_readiness: gold_ready
requires_gold_promotion: false
generated_from:
  plan_addendums:
    - plans/phase1-use-cases.md
    - plans/phase1-addendum-1.2-tvfs.md
    - plans/phase1-addendum-1.3-metric-views.md
    - plans/phase1-addendum-1.5-aibi-dashboards.md
    - plans/phase1-addendum-1.6-genie-spaces.md
gold_dependencies:
  - table: fact_booking_daily
    columns: [booking_key, property_key, booking_date, net_revenue, nights]
    referenced_by:
      - semantic-layer/metric_views/revenue_analytics_metrics.yaml
      - semantic-layer/tvfs/get_revenue_by_property
      - observability/dashboards/revenue_overview.lvdash.json
  - table: dim_property
    columns: [property_key, property_name, destination_id, is_current]
    referenced_by:
      - semantic-layer/metric_views/revenue_analytics_metrics.yaml
summary:
  total_tables: 12
  total_columns: 84
  total_referenced_by: 37

# plans/manifests/source-dependency-manifest.yaml (workshop, Silver source)
planning_mode: workshop
planning_source:
  selected_layer: deployed_silver
  schema: "<catalog>.<silver_schema>"
  source_csv_paths: []
implementation_readiness: workshop_deployable
requires_gold_promotion: false
generated_from:
  plan_addendums:
    - plans/phase1-use-cases.md
source_dependencies:
  - table: silver_bookings
    columns: [booking_id, property_id, booking_date, gross_amount]
    referenced_by:
      - planning/use_case_cards/revenue_overview.md
summary:
  total_tables: 4
  total_columns: 22
  total_referenced_by: 6
Rules:
- One entry per distinct source table; union all column references from all plan addendums.
- referenced_by uses relative artifact paths so downstream fixes can trace artifacts back to the missing column.
- Emit this manifest even when planning_mode: workshop; the workshop cap applies to artifact counts, not to manifest accuracy.
- The shape is identical to Gold's: only the filename, top-level key (source_dependencies vs gold_dependencies), and planning_source block change.
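The "one entry per table, union all columns" rule can be sketched as a merge over raw references. This is an illustration only: `build_dependencies` and its input shape of (table, columns, artifact path) tuples are hypothetical, and the summary semantics (total_referenced_by as the sum of per-table reference lists) are an assumption, since the manifest examples do not define them.

```python
from typing import Dict, Iterable, List, Tuple

Reference = Tuple[str, List[str], str]  # (table, columns, artifact path)


def build_dependencies(references: Iterable[Reference]) -> Tuple[List[dict], Dict[str, int]]:
    """Merge raw references into one manifest entry per distinct table."""
    merged: Dict[str, dict] = {}
    for table, columns, artifact in references:
        entry = merged.setdefault(table, {"columns": [], "referenced_by": []})
        for column in columns:
            if column not in entry["columns"]:
                entry["columns"].append(column)  # union, first-seen order
        if artifact not in entry["referenced_by"]:
            entry["referenced_by"].append(artifact)
    dependencies = [{"table": table, **entry} for table, entry in merged.items()]
    # Assumption: totals are plain sums over the merged entries.
    summary = {
        "total_tables": len(dependencies),
        "total_columns": sum(len(d["columns"]) for d in dependencies),
        "total_referenced_by": sum(len(d["referenced_by"]) for d in dependencies),
    }
    return dependencies, summary


refs = [
    ("fact_booking_daily", ["booking_key", "net_revenue"],
     "semantic-layer/metric_views/revenue_analytics_metrics.yaml"),
    ("fact_booking_daily", ["booking_key", "nights"],
     "semantic-layer/tvfs/get_revenue_by_property"),
    ("dim_property", ["property_key"],
     "semantic-layer/metric_views/revenue_analytics_metrics.yaml"),
]
deps, summary = build_dependencies(refs)
print(summary)
```

The resulting list can be written under either gold_dependencies or source_dependencies, since both shapes share the same entry structure.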
Phase 2 Step 6 — Live-Catalog Intersection (STOP / WARN Rule, MANDATORY, mode-aware)
Immediately after emitting the dependency manifest, query the live catalog and cross-reference every table/column reference. Downstream stages all assume the planning source is consistent with the live catalog — catching gaps HERE saves 5+ deploy cycles later.
The validation behavior depends on planning_mode and planning_source.selected_layer:
| Mode + selected layer | Behavior | Artifacts |
|---|---|---|
| acceleration + deployed_gold | Fail-loud STOP if any gap | Emit plans/gold-gap-remediation.md; raise |
| acceleration + gold_design (only when explicitly accepted) | Warn (Gold may not be deployed yet) | Emit plans/gold-gap-remediation.md; do NOT raise |
| workshop + deployed_gold | Fail-loud STOP if any gap (same as acceleration) | Emit plans/gold-gap-remediation.md; raise |
| workshop + gold_design | Warn | Emit plans/gold-gap-remediation.md; continue |
| workshop + deployed_silver / deployed_bronze | Warn | Emit plans/source-gap-remediation.md; continue |
| workshop + source_csv | Skip live intersection (no live schema to compare against) | None |
import yaml
from pathlib import Path
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
catalog = "<lakehouse_default_catalog>"

# Prefer the Gold manifest; fall back to the workshop source manifest.
gold_path = Path("plans/manifests/gold-dependency-manifest.yaml")
source_path = Path("plans/manifests/source-dependency-manifest.yaml")
manifest_path = gold_path if gold_path.exists() else source_path
manifest = yaml.safe_load(manifest_path.read_text())

planning_mode = manifest.get("planning_mode", "acceleration")
selected_layer = manifest.get("planning_source", {}).get("selected_layer", "deployed_gold")
target_schema = manifest.get("planning_source", {}).get("schema")
deps_key = "gold_dependencies" if "gold_dependencies" in manifest else "source_dependencies"

if selected_layer == "source_csv":
    print("ℹ Skipping live-catalog intersection — selected_layer=source_csv has no live schema.")
else:
    schema_only = target_schema.split(".")[-1]
    live_cols = (
        spark.sql(f"""
            SELECT table_name, column_name, full_data_type
            FROM {catalog}.information_schema.columns
            WHERE table_schema = '{schema_only}'
        """).collect()
    )

    # Index the live catalog as {table_name: {column_name: data_type}}.
    live_index = {}
    for row in live_cols:
        live_index.setdefault(row.table_name, {})[row.column_name] = row.full_data_type

    # Cross-reference every planned table/column against the live index.
    missing_tables, missing_columns = [], []
    for dep in manifest[deps_key]:
        tbl = dep["table"]
        if tbl not in live_index:
            missing_tables.append({"table": tbl, "referenced_by": dep["referenced_by"]})
            continue
        for col in dep["columns"]:
            if col not in live_index[tbl]:
                missing_columns.append({"table": tbl, "column": col, "referenced_by": dep["referenced_by"]})

    if missing_tables or missing_columns:
        remediation_path = Path(
            "plans/gold-gap-remediation.md" if deps_key == "gold_dependencies"
            else "plans/source-gap-remediation.md"
        )
        remediation_path.write_text(
            f"# {('Gold' if deps_key == 'gold_dependencies' else 'Source')} Dependency Gap Remediation\n\n"
            f"The following references in plan addendums do not exist in `{target_schema}`.\n\n"
            "## Missing tables\n\n"
            + "\n".join(f"- **{m['table']}** — referenced by {m['referenced_by']}" for m in missing_tables)
            + "\n\n## Missing columns\n\n"
            + "\n".join(
                f"- **{m['table']}.{m['column']}** — referenced by {m['referenced_by']}"
                for m in missing_columns
            )
            + "\n\n## Next steps\n\n"
            "1. Acceleration / Gold source: add tables to `gold_layer_design/yaml/` and re-run "
            "`gold/01-gold-layer-setup`.\n"
            "2. Workshop / Silver or Bronze: extend the corresponding setup skill (silver/bronze) "
            "and re-run.\n"
            "3. Re-run this Planning skill to regenerate manifests.\n"
        )

        # Strict (fail-loud) only when planning from deployed Gold, in either mode.
        is_strict = (
            (planning_mode == "acceleration" and selected_layer == "deployed_gold")
            or (planning_mode == "workshop" and selected_layer == "deployed_gold")
        )
        msg = (f"{deps_key} gap detected: {len(missing_tables)} missing tables, "
               f"{len(missing_columns)} missing columns. See {remediation_path}.")
        if is_strict:
            raise RuntimeError(msg + " STOP — downstream orchestrators cannot proceed.")
        else:
            print(f"⚠ {msg} Continuing under non-strict mode/layer; downstream manifests "
                  f"will carry implementation_readiness=" + manifest.get("implementation_readiness", ""))
    else:
        print(f"✅ {deps_key} intersected cleanly with live catalog `{target_schema}`.")
Escape flag: If the user has an out-of-band reason to bypass the gap (e.g., Gold is intentionally incomplete for a phased rollout), they can pass planning_allow_gold_gap: true in their prompt. In that case, still emit the remediation file as a warning, but proceed to Phase 3 with a prominent gold_gap_acknowledged: true marker in every downstream manifest. This flag does NOT relax mode-specific behavior; non-Gold workshop manifests already carry implementation_readiness: workshop_deployable (Silver/Bronze) or workshop_draft (source CSV).
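The mode-and-layer behavior table for this step can be expressed as a pure lookup, which keeps the STOP / WARN decision testable without a Spark session. The names `GapPolicy` and `gap_policy` are hypothetical; the actions and remediation file paths follow the table. The escape flag is deliberately left out of this sketch.

```python
from typing import NamedTuple, Optional


class GapPolicy(NamedTuple):
    action: str  # "stop", "warn", or "skip"
    remediation_file: Optional[str]


def gap_policy(planning_mode: str, selected_layer: str) -> GapPolicy:
    """Map (mode, selected layer) to the live-catalog gap behavior."""
    non_gold = {"deployed_silver", "deployed_bronze", "source_csv"}
    if selected_layer in non_gold and planning_mode != "workshop":
        raise ValueError("Non-Gold planning sources are only allowed in workshop mode.")
    if selected_layer == "deployed_gold":
        # Fail loud in both modes.
        return GapPolicy("stop", "plans/gold-gap-remediation.md")
    if selected_layer == "gold_design":
        return GapPolicy("warn", "plans/gold-gap-remediation.md")
    if selected_layer in {"deployed_silver", "deployed_bronze"}:
        return GapPolicy("warn", "plans/source-gap-remediation.md")
    if selected_layer == "source_csv":
        # No live schema exists, so the intersection is skipped entirely.
        return GapPolicy("skip", None)
    raise ValueError(f"Unknown selected_layer={selected_layer!r}")


print(gap_policy("workshop", "deployed_silver"))
```

Separating the policy from the Spark query means the strictness rules can be unit-tested on their own.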
Phase 3: Manifest Generation (Plan-as-Contract)
After creating plan documents, generate machine-readable YAML manifests that downstream orchestrators consume as implementation contracts.
Why manifests? The "Extract, Don't Generate" principle applies to the planning-to-implementation handoff. Manifests ensure downstream orchestrators implement exactly what was planned — no missed artifacts, no naming inconsistencies.
MANDATORY: Read the manifest generation guide:
| # | Reference Path | What It Provides |
|---|---|---|
| 1 | references/manifest-generation-guide.md | Full manifest workflow, validation, consumption pattern |
Steps:
- Review Gold layer YAML schemas in
gold_layer_design/yaml/
- For each plan addendum, extract the concrete artifact definitions
- Generate 4 YAML manifests using templates from
assets/templates/manifests/:
plans/manifests/semantic-layer-manifest.yaml — TVFs, Metric Views, Genie Spaces
plans/manifests/observability-manifest.yaml — Monitors, Dashboards, Alerts
plans/manifests/ml-manifest.yaml — Feature Tables, Models, Experiments
plans/manifests/genai-agents-manifest.yaml — Agents, Tools, Eval Datasets
- For each artifact in a manifest, add
use_case_refs listing the UC# it implements (from plans/use-case-catalog.md)
- Validate all table/column references exist in Gold YAML
- Verify summary counts match actual artifact counts
- Run `python scripts/validate_use_case_coverage.py plans/use-case-catalog.md` to verify coverage
- Commit manifests alongside plan documents
Key principle: Every artifact in a manifest MUST trace back to (a) a Gold layer table and (b) a business question from the plan addendum.
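The validation steps above (every artifact carries `use_case_refs`, every table reference exists in Gold YAML, summary counts match) can be sketched as a small check. This is only an illustration and is not the logic of `scripts/validate_use_case_coverage.py`. The manifest keys used here (`artifacts`, `source_tables`, `summary.total_artifacts`) and the artifact names are hypothetical stand-ins for whatever the real templates in `assets/templates/manifests/` define.

```python
def validate_manifest(manifest, gold_tables, known_use_cases):
    """Return a list of problems; an empty list means the manifest passed.

    NOTE: the key names below are assumptions, not the real template schema.
    """
    problems = []
    artifacts = manifest.get("artifacts", [])
    for artifact in artifacts:
        name = artifact.get("name", "<unnamed>")
        refs = artifact.get("use_case_refs", [])
        # Traceability (b): each artifact must point at a business question.
        if not refs:
            problems.append(f"{name}: missing use_case_refs")
        for ref in refs:
            if ref not in known_use_cases:
                problems.append(f"{name}: unknown use case {ref}")
        # Traceability (a): each referenced table must exist in the Gold design.
        for table in artifact.get("source_tables", []):
            if table not in gold_tables:
                problems.append(f"{name}: table {table} not found in Gold YAML")
    # The declared summary count must match the actual artifact count.
    declared = manifest.get("summary", {}).get("total_artifacts")
    if declared != len(artifacts):
        problems.append(f"summary count {declared} != actual {len(artifacts)}")
    return problems


# Hypothetical manifest fragment with three deliberate defects.
manifest = {
    "artifacts": [
        {"name": "get_sales_revenue", "use_case_refs": ["UC1"],
         "source_tables": ["fact_sales"]},
        {"name": "sales_analytics_metrics", "use_case_refs": [],
         "source_tables": ["fact_returns"]},
    ],
    "summary": {"total_artifacts": 3},
}
problems = validate_manifest(
    manifest,
    gold_tables={"fact_sales", "dim_region"},
    known_use_cases={"UC1", "UC2"},
)
for problem in problems:
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a planner fix every gap in a single pass before committing the manifests.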
Output Structure:
plans/
├── use-case-catalog.md # Consolidated use case definitions
├── manifests/
│ ├── semantic-layer-manifest.yaml # → consumed by semantic-layer/00-*
│ ├── observability-manifest.yaml # → consumed by monitoring/00-*
│ ├── ml-manifest.yaml # → consumed by ml/00-*
│ └── genai-agents-manifest.yaml # → consumed by genai-agents/00-*
Downstream consumption: Each downstream orchestrator (stages 6-9) has a Phase 0: Read Plan step that reads its manifest. If the manifest doesn't exist (e.g., user skipped Planning), the orchestrator falls back to self-discovery from Gold tables.
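The fallback described above might look roughly like the following on the consumer side. This is a sketch under assumptions: `resolve_plan_source` and its return shape are hypothetical, and a real orchestrator would go on to parse the manifest rather than only report where it came from.

```python
import tempfile
from pathlib import Path


def resolve_plan_source(manifest_path):
    """Decide whether a downstream stage reads its manifest or self-discovers.

    Hypothetical helper: returns a small dict describing the chosen mode.
    """
    path = Path(manifest_path)
    if path.is_file():
        return {"mode": "manifest", "path": str(path)}
    # Planning was skipped: fall back to discovery from Gold tables.
    return {"mode": "self_discovery", "path": None}


# Demonstration in a temporary directory, before and after the manifest exists.
with tempfile.TemporaryDirectory() as tmp:
    manifest_file = Path(tmp) / "semantic-layer-manifest.yaml"
    before = resolve_plan_source(manifest_file)
    manifest_file.write_text("artifacts: []\n")
    after = resolve_plan_source(manifest_file)

print(before["mode"], after["mode"])
```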
Plan Structure Framework
Standard Project Phases
plans/
├── README.md # Index and overview
├── use-case-catalog.md # Consolidated use case definitions
├── prerequisites.md # Bronze/Silver/Gold summary (optional)
├── phase1-use-cases.md # Analytics artifacts (master)
│ ├── phase1-addendum-1.1-ml-models.md
│ ├── phase1-addendum-1.2-tvfs.md
│ ├── phase1-addendum-1.3-metric-views.md
│ ├── phase1-addendum-1.4-lakehouse-monitoring.md
│ ├── phase1-addendum-1.5-aibi-dashboards.md
│ ├── phase1-addendum-1.6-genie-spaces.md
│ └── phase1-addendum-1.7-alerting.md
├── phase2-agent-framework.md # AI Agents
├── phase3-frontend-app.md # User Interface
└── manifests/ # Machine-readable contracts
├── semantic-layer-manifest.yaml # → semantic-layer/00-*
├── observability-manifest.yaml # → monitoring/00-*
├── ml-manifest.yaml # → ml/00-*
└── genai-agents-manifest.yaml # → genai-agents/00-*
Phase Dependencies
Prerequisites (Bronze → Silver → Gold) → Phase 1 (Use Cases) → Phase 2 (Agents) → Phase 3 (Frontend)
[COMPLETE] ↓
All Addendums
Agent Domain Framework
Core Principle
ALL artifacts across ALL phases MUST be organized by Agent Domain. This ensures:
- Consistent categorization across 100+ artifacts
- Clear ownership by future AI agents
- Easy discoverability for users
- Aligned tooling for each domain
Agent Domain Application
Every artifact (TVF, Metric View, Dashboard, Alert, ML Model, Monitor, Genie Space) must:
- Be tagged with its Agent Domain
- Use the domain's planning assets (Gold tables in acceleration; selected-layer tables in workshop)
- Answer domain-specific questions
- Be grouped with related domain artifacts in documentation
Example Pattern:
## {Domain}: get_{metric}_by_{dimension}
**Agent Domain:** {Domain}
**Planning Assets:** `fact_{entity}`, `dim_{entity}` # or `silver_{entity}` / `bronze_{entity}` in workshop drafts
**Business Questions:** "What are the top {metric} by {dimension}?"
See Industry Domain Patterns for domain templates by industry.
Agent Layer Architecture Pattern
Core Principle: Agents Use Genie Spaces as Query Interface
AI Agents DO NOT query data assets directly. Instead, they use Genie Spaces as their natural language query interface. Genie Spaces translate natural language to SQL and route to appropriate tools.
USERS (Natural Language)
↓
PHASE 2: AI AGENT LAYER (LangChain/LangGraph)
├── Orchestrator Agent (intent classification)
└── Specialized Agents (1 per domain)
↓
PHASE 1.6: GENIE SPACES (NL Query Execution)
├── {Domain 1} Intelligence Genie Space
├── {Domain 2} Intelligence Genie Space
└── Unified {Project} Monitor
↓
PHASE 1: DATA ASSETS (Agent Tools)
├── Metric Views (pre-aggregated - use FIRST)
├── TVFs (parameterized queries)
├── ML Predictions (ML-powered insights)
└── Lakehouse Monitors (drift detection)
↓
PREREQUISITES: GOLD LAYER (Foundation)
Deployment Order (Critical!)
Genie Spaces MUST be deployed BEFORE agents can use them.
Phase 1.1-1.5 (Data Assets) → Phase 1.6 (Genie Spaces) → Phase 2 (Agents)
↓ ↓ ↓
Build foundation Create NL interface Consume interface
For detailed architecture, design patterns, "Why Genie Spaces" comparison, and testing strategy, see Agent Layer Architecture.
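The ordering rule in this section (data assets, then Genie Spaces, then agents) can be checked mechanically before a deployment run. The sketch below is illustrative only; the stage names are hypothetical labels, not identifiers from the deployment tooling.

```python
# Required order, taken from the deployment diagram above (labels are assumed).
DEPLOY_ORDER = ["data_assets", "genie_spaces", "agents"]


def order_violations(sequence):
    """Return (earlier, later) stage pairs that appear out of order in `sequence`.

    Stages missing from `sequence` are ignored; this checks ordering only,
    not completeness.
    """
    position = {stage: index for index, stage in enumerate(sequence)}
    violations = []
    for i, earlier in enumerate(DEPLOY_ORDER):
        for later in DEPLOY_ORDER[i + 1:]:
            if (earlier in position and later in position
                    and position[earlier] > position[later]):
                violations.append((earlier, later))
    return violations


good = order_violations(["data_assets", "genie_spaces", "agents"])
bad = order_violations(["data_assets", "agents", "genie_spaces"])
print(good, bad)
```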
Artifact Rationalization Framework
MANDATORY: Read references/rationalization-framework.md for complete sizing guides, decision matrices, and naming conventions.
Core Principle: Every artifact must trace to a specific business question. Do not create artifacts to fill quotas.
Critical constraints (always enforce, even without reading the reference):
- Genie Spaces: max 25 assets per space; 10-25 per space is optimal; <10 = merge spaces
- TVFs: only when Metric Views cannot answer the question (requires parameterized multi-table logic)
- Metric Views: one per distinct analytical grain, not per domain
- Domains: emerge from business questions (min 3 questions per domain); merge if >70% planning-asset overlap
- Naming: `get_{domain}_{metric}` for TVFs, `{domain}_analytics_metrics` for Metric Views
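Two of the constraints above are simple arithmetic and can be sketched in a few lines: sizing Genie Spaces from total asset volume, and testing whether two candidate domains should merge. The overlap definition used here (shared assets divided by the smaller domain's asset count) is an assumption, since the text does not define how the 70% is measured; the table names are hypothetical.

```python
import math

MAX_ASSETS_PER_SPACE = 25  # hard limit stated above
MIN_ASSETS_PER_SPACE = 10  # below this, the guidance is to merge spaces


def plan_space_count(total_assets):
    """Smallest number of spaces that keeps every space at or under the limit."""
    if total_assets <= 0:
        return 0
    return math.ceil(total_assets / MAX_ASSETS_PER_SPACE)


def is_thin(space_assets):
    """True when a space has fewer assets than the merge threshold."""
    return len(space_assets) < MIN_ASSETS_PER_SPACE


def asset_overlap(domain_a, domain_b):
    """Shared planning assets as a fraction of the smaller domain (assumed metric)."""
    a, b = set(domain_a), set(domain_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def should_merge(domain_a, domain_b, threshold=0.70):
    """Apply the 'merge if >70% planning-asset overlap' rule."""
    return asset_overlap(domain_a, domain_b) > threshold


# Hypothetical domains and their planning assets.
bookings = {"fact_booking", "dim_property", "dim_host", "dim_date"}
hosts = {"fact_booking", "dim_property", "dim_host"}
payments = {"fact_payment", "dim_date"}

print(plan_space_count(101))
print(should_merge(bookings, hosts), should_merge(bookings, payments))
```

A different overlap metric (for example Jaccard similarity) would give lower values for domains of unequal size, so the chosen definition should be agreed before applying the threshold.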
SQL Query Standards
ALWAYS use Gold layer tables for production deployable artifacts, NEVER system.* tables directly. Reference pattern: ${catalog}.${gold_schema}.table_name. In workshop deployments built from Silver or Bronze (implementation_readiness: workshop_deployable), SQL does reference ${catalog}.${silver_schema}.* or ${catalog}.${bronze_schema}.* directly — the workshop semantic layer is built on top of those tables. requires_gold_promotion is an advisory flag recommending Gold promotion for production hardening; it does not block workshop deployment.
- Date parameters: `STRING` type (Genie compatible), cast at query time: `CAST(start_date AS DATE)`
- SCD Type 2 joins: `LEFT JOIN dim_{entity} d ON f.{entity}_id = d.{entity}_id AND d.is_current = TRUE`
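To show how the three standards above combine, here is a small sketch that assembles SQL fragments as strings: the `${catalog}.${gold_schema}` reference pattern, STRING date parameters cast at query time, and the SCD Type 2 current-row join. The helper names, the `fact_booking` table, and the `booking_date` column are hypothetical; the output is a template with unresolved `${...}` placeholders, not a deployable statement.

```python
def qualified(table, schema_var="gold_schema"):
    """Build the ${catalog}.${schema}.table reference pattern as literal text."""
    return f"${{catalog}}.${{{schema_var}}}.{table}"


def scd2_join(fact_alias, entity):
    """Join a fact alias to the current row of an SCD Type 2 dimension."""
    return (f"LEFT JOIN {qualified('dim_' + entity)} d "
            f"ON {fact_alias}.{entity}_id = d.{entity}_id AND d.is_current = TRUE")


def date_filter(column, start_param="start_date", end_param="end_date"):
    """Filter on STRING parameters that are cast to DATE at query time."""
    return (f"{column} BETWEEN CAST({start_param} AS DATE) "
            f"AND CAST({end_param} AS DATE)")


# Hypothetical fact table and column, for illustration only.
query = "\n".join([
    "SELECT d.property_id, COUNT(*) AS bookings",
    f"FROM {qualified('fact_booking')} f",
    scd2_join("f", "property"),
    f"WHERE {date_filter('f.booking_date')}",
    "GROUP BY d.property_id",
])
print(query)
```

For workshop deployments, passing `schema_var="silver_schema"` or `schema_var="bronze_schema"` to `qualified` yields the corresponding reference pattern.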
Documentation Quality Standards
LLM-Friendly Comments — All artifacts must include: what it does, when to use it, example questions it answers. Pattern: COMMENT 'LLM: Returns top N {metric}... Example questions: "What are the top 10...?"'
Summary Tables — Every addendum must include: overview table (all artifacts with domain, dependencies, status), by-domain sections, count summary, and success criteria.
Common Mistakes to Avoid
| Mistake | Correct Approach |
|---|---|
| Querying system.* tables directly | Always use Gold layer: ${catalog}.${gold_schema}.fact_* |
| Omitting Agent Domain on artifacts | Every artifact must be tagged: ## {Domain}: get_{metric} |
| Adding a TVF without cross-addendum check | Also consider: Metric View counterpart? Alert? Dashboard? |
| Using DATE type in TVF parameters | Use STRING COMMENT 'Format: YYYY-MM-DD' (Genie compatible) |
| Deploying agents before Genie Spaces | Genie Spaces MUST be deployed first — agents consume them |
| Genie Space with more than 25 assets | Split by domain cohesion; each space 10-25 assets |
| One Genie Space per domain when assets are thin | Consolidate thin domains (<10 assets) into fewer spaces |
| TVF that duplicates a Metric View | TVFs only when multi-period/multi-table parameterized logic is needed |
| Forcing a fixed domain count | Let business questions determine domains — 2-3 focused > 5-6 thin |
| Counting a Genie Space as artifact coverage for a use-case question | Genie Space is an interface layer, not an implementing artifact. Every question must be answerable by at least one TVF, Metric View, or listed Gold table in the Genie Space's asset list. |
| Inventing new YAML keys when the manifest template doesn't fit | Adapt within the template schema first. If the template is truly insufficient (e.g., unified cross-domain Genie Space), extend the template with a documented key (e.g., unified_genie_space) — never ship ad-hoc schema a downstream consumer doesn't know to look for. |
Reference Files
- Phase Details — Full phase and addendum descriptions with deliverables
- Estimation Guide — Effort estimation, dependency management, risks
- Agent Layer Architecture — Detailed architecture, "Why Genie Spaces" comparison, design patterns, testing strategy, multi-agent query example
- Industry Domain Patterns — Domain templates for Hospitality, Retail, Healthcare, Finance, SaaS, and Databricks System Tables
- Worked Example: Wanderbricks — Complete 101-artifact project example with TVF SQL, Metric View YAML, Alert YAML
- Manifest Generation Guide — Plan-as-contract pattern: how to generate YAML manifests for downstream orchestrators
Assets
Plan Templates
Manifest Templates (Plan-as-Contract)
Validation Checklist
Structure
Content Quality
Cross-References
Use Case Traceability
Completeness
Rationalization (Prevent Bloat)
Agent Layer Architecture (If Phase 2 Included)
Key Learnings
- Agent Domain framework provides consistent organization across all artifacts — every artifact gets a domain tag
- Planning-source layer references only: never query `system.*` tables directly. Acceleration uses `${catalog}.${gold_schema}.*`. Workshop deployments use `${gold_schema}` / `${silver_schema}` / `${bronze_schema}` based on `planning_source.selected_layer`
- Cross-addendum updates — user requirements span multiple addendums; update all affected documents
- LLM-friendly comments are critical for Genie/AI/BI integration — include example questions
- Agents use Genie Spaces as abstraction — agents don't write SQL; Genie handles NL-to-SQL translation, optimization, and guardrails
- 1:1 Agent-to-Genie mapping recommended; Orchestrator agent uses Unified Genie Space for intent classification
- Deploy Genie Spaces before agents — three-level testing: assets → Genie → Agents
- Genie Space 25-asset hard limit — plan space count from total asset volume, not domain count; fewer focused spaces > many thin ones
- Rationalize before creating — every artifact must trace to a business question; TVFs only when Metric Views can't answer
- Domains emerge from data — business questions and planning-asset groupings (Gold by default; Silver/Bronze in workshop deployments) determine natural domain boundaries
References
Official Documentation
Related Skills
Agent Framework Technologies
Pipeline Progression
Previous stage (acceleration): gold/01-gold-layer-setup → Gold layer tables and merge scripts should be complete.
Previous stage (workshop): ANY of bronze/00-bronze-layer-setup, silver/00-silver-layer-setup, gold/00-gold-layer-design, or gold/01-gold-layer-setup. Phase 0 picks the highest-fidelity input automatically.
Next stage: After completing the project plan for remaining phases, proceed to:
semantic-layer/00-semantic-layer-setup — Build Metric Views, TVFs, and Genie Spaces on top of the planning source. For Gold sources, deployment runs against gold_schema (production path). For workshop manifests with implementation_readiness: workshop_deployable (Silver/Bronze), deployment runs directly against the selected layer with a quality advisory; Gold promotion is recommended for production. For implementation_readiness: workshop_draft (source CSV), the orchestrator stops because there are no live tables to deploy against.
Post-Completion: Skill Usage Summary (MANDATORY)
After completing all phases of this orchestrator, output a Skill Usage Summary reflecting what you ACTUALLY did — not a pre-written summary.
What to Include
- Every skill `SKILL.md` or `references/` file you read (via the Read tool), in the order you read them
- Which phase you were in when you read it
- Whether it was a Common, Reference, or Template file
- A one-line description of what you specifically used it for in this session
Format
| # | Phase | Skill / Reference Read | Type | What It Was Used For |
|---|---|---|---|---|
| 1 | Phase N | path/to/SKILL.md | Common / Reference / Template | One-line description |
Summary Footer
End with:
- Totals: X common skills, Y reference files, Z templates read across N phases
- Manifests emitted: List each manifest file generated and its artifact count
- Skipped: List any expected references or templates that you did NOT need to read, and why
- Unplanned: List any skills you read that were NOT listed in the dependency table (e.g., for troubleshooting, edge cases, or user-requested detours)
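As an illustration of the format above, the table and the totals line could be produced from a simple read log. This is a sketch only: the log structure (`phase`, `path`, `type`, `used_for`) is an assumption, and the example rows are placeholders rather than a record of any real session.

```python
from collections import Counter


def render_usage_table(rows):
    """Render the Skill Usage Summary table in the column order given above."""
    lines = [
        "| # | Phase | Skill / Reference Read | Type | What It Was Used For |",
        "|---|---|---|---|---|",
    ]
    for number, row in enumerate(rows, start=1):
        lines.append(
            f"| {number} | {row['phase']} | {row['path']} | "
            f"{row['type']} | {row['used_for']} |"
        )
    return "\n".join(lines)


def render_totals(rows):
    """Render the 'Totals' footer line from the same read log."""
    counts = Counter(row["type"] for row in rows)
    phases = {row["phase"] for row in rows}
    return (f"Totals: {counts['Common']} common skills, "
            f"{counts['Reference']} reference files, "
            f"{counts['Template']} templates read across {len(phases)} phases")


# Placeholder read log, in the order the files were read.
read_log = [
    {"phase": "Phase 1", "path": "path/to/SKILL.md", "type": "Common",
     "used_for": "Naming rules for artifacts"},
    {"phase": "Phase 3", "path": "references/manifest-generation-guide.md",
     "type": "Reference", "used_for": "Manifest workflow and validation"},
]
table = render_usage_table(read_log)
totals = render_totals(read_log)
print(table)
print(totals)
```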