ai-ready-data
// Assess and optimize data for AI workloads across platforms. Scan estates for prioritization, assess assets against profiles, and guide remediation.
| name | ai-ready-data |
| description | Assess and optimize data for AI workloads across platforms. Scan estates for prioritization, assess assets against profiles, and guide remediation. |
Assess data products for AI-readiness and remediate gaps. Each requirement is a self-contained directory with three markdown files per platform: check.md (context + SQL returning a 0–1 score), diagnostic.md (context + detail SQL), and fix.md (context + remediation SQL and/or organizational guidance). Each file co-locates all relevant context — constraints, gotchas, variant selection guidance, and platform-specific notes — directly above the SQL it applies to. The manifest (requirements/requirements.yaml) provides lightweight metadata for profile-load time. Every assessment has exactly six stages named after the six factors of AI-ready data — use these exact names everywhere (reports, plans, tasks): Clean, Contextual, Consumable, Current, Correlated, Compliant.
Three phases, light to deep: scan, assess, and remediate. The scan phase uses the scan profile. The full workflow:
1. Platform → user selects platform
2. Discovery → guided, explore, or scan (estate-level)
3. Profile → user picks a profile or selects individual requirements
4. Adjustments → apply overrides (skip/set/add)
5. Coverage → show what's runnable vs N/A before executing
6. Assess → execute checks, score, report
7. Report → present results, offer to save standardized report to filesystem
8. Remediate → platform-specific fixes for failures
Ask the user what platform their data is on. Supported platforms:
- snowflake
Load the platform reference from platforms/: either platforms/{PLATFORM}.md or platforms/{platform}/{PLATFORM}.md. This is your reference for all platform-specific behavior during this session.
Discovery has three modes: guided (user already knows what to assess), explore (user wants to understand the landscape first), and scan (estate-level sweep for prioritization). Ask the user which fits:
How would you like to scope this assessment?
1. I know which database/schema/tables to assess
2. Help me explore what's available first
3. Scan my data estate for AI readiness
Ask the user for the database, schema, and tables to assess.
This establishes the scope for the assessment. No SQL is executed during guided discovery.
Run read-only reconnaissance queries against the platform to help the user understand their data landscape before choosing targets. Present findings progressively — don't dump everything at once.
Step 2a: Database landscape. List accessible databases with table and view counts. On Snowflake, query INFORMATION_SCHEMA.DATABASES or SHOW DATABASES. Present a summary:
Available databases:
ANALYTICS 42 schemas 1,204 tables
RAW_INGESTION 12 schemas 387 tables
ML_FEATURES 5 schemas 89 tables
Ask the user to pick a database (or multiple).
Step 2b: Schema inventory. For the selected database(s), enumerate schemas with structural signals:
ANALYTICS schemas:
Schema            Tables   Views   Dynamic Tables   Streams   Has Tags
PRODUCT_METRICS   34       12      3                2         yes
USER_BEHAVIOR     28       4       0                0         no
RAW_EVENTS        112      0       0                0         no
Structural signals to surface (platform-dependent):
Step 2c: Narrow scope. Based on what the user sees, help them select the database, schema, and tables for assessment. Users may choose all tables in a schema or pick specific ones.
All reconnaissance queries are read-only. No mutating operations during discovery.
Scan produces a comparative readiness view across many schemas within a database (or across databases). It uses the scan profile — a lightweight subset of requirements that are all schema-scoped and fast to execute.
Step 2s-a: Scope. Ask the user for the database (or account-level). No schema or table selection — the scan covers all schemas.
Step 2s-b: Execute scan profile. Load profiles/scan.yaml. For each schema in the database, run the scan profile's requirements. All checks in the scan profile must be schema-scoped to allow batch execution across many schemas.
Step 2s-c: Present portfolio view. Score each schema and present a comparative ranking:
Estate Scan — {platform} — {DATABASE}
Schema            Readiness   Contextual   Consumable   Current   Compliant
───────────────   ─────────   ──────────   ──────────   ───────   ─────────
PRODUCT_METRICS   4.2         4.5          3.0          5.0       4.5
USER_BEHAVIOR     3.1         2.0          3.5          4.0       3.0
RAW_EVENTS        1.4         1.0          1.0          2.5       1.0
{N} schemas scanned. {H} above readiness threshold (≥ 3.5).
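The ranking and the summary line of the portfolio view can be sketched as follows. This is only an illustration: `SchemaScore` and `rank_schemas` are hypothetical names, the overall readiness value is taken as given (how factor scores combine into it is not specified here), and the 3.5 threshold is the one quoted in the summary line.

```python
from dataclasses import dataclass

READINESS_THRESHOLD = 3.5  # threshold quoted in the scan summary line


@dataclass
class SchemaScore:
    """Hypothetical container for one row of the estate scan."""
    schema: str
    readiness: float


def rank_schemas(scores):
    """Return schemas sorted by readiness (highest first) and the count at or above the threshold."""
    ranked = sorted(scores, key=lambda s: s.readiness, reverse=True)
    above = sum(1 for s in ranked if s.readiness >= READINESS_THRESHOLD)
    return ranked, above


rows = [
    SchemaScore("USER_BEHAVIOR", 3.1),
    SchemaScore("PRODUCT_METRICS", 4.2),
    SchemaScore("RAW_EVENTS", 1.4),
]
ranked, above = rank_schemas(rows)
print(f"{len(ranked)} schemas scanned. {above} above readiness threshold (>= {READINESS_THRESHOLD}).")
```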
Step 2s-d: Drill down. After presenting the scan, offer the user a choice:
Options:
1. Assess a specific schema in depth (pick from the list)
2. Export scan results
3. Done
If the user picks a schema, transition to the full Assess flow (Step 3 onward) with that schema pre-scoped. The scan phase flows naturally into the assess phase.
Ask the user what they want to assess:
Load each profile YAML from profiles/ and count its requirements to present accurate numbers:
What would you like to assess?
1. RAG readiness ({N} requirements)
2. Feature serving readiness ({N} requirements)
3. Training readiness ({N} requirements)
4. Agent readiness ({N} requirements)
5. Full assessment (all {N} requirements)
6. Let me pick specific requirements
If the user picks a built-in profile, load profiles/{name}.yaml.
If the user picks "full assessment," include all requirements from requirements/requirements.yaml with default thresholds of 0.80.
If the user wants to pick specific requirements, dynamically generate the catalog from requirements/requirements.yaml at runtime. Read every entry, group by factor, number sequentially, and present in this format:
Select the requirements you want to assess (comma-separated numbers, or "all" within a factor):
─── {Factor Name} ───
{N}. {requirement_key} {description}
...
─── {Next Factor} ───
...
Use the factor order: Clean, Contextual, Consumable, Current, Correlated, Compliant. Number requirements sequentially across all factors starting at 1.
Users can respond with:
- 1, 2, 13, 34, 52 (picks those)
- all Clean, all Compliant (picks every requirement in those factors)
- 1-12, 52-62
- all Clean, 13, 34-36, 52
For each selected requirement, apply a default threshold of 0.80. The user can adjust thresholds in Step 4 (Adjustments).
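One way to resolve such replies is sketched below. It assumes a hypothetical `catalog` mapping from requirement number to factor name, built from the manifest at runtime; the function name and error handling are illustrative rather than part of the skill.

```python
import re

FACTOR_ORDER = ["Clean", "Contextual", "Consumable", "Current", "Correlated", "Compliant"]


def parse_selection(text, catalog):
    """Resolve a user reply into a sorted list of requirement numbers.

    `catalog` maps each requirement number to its factor name (hypothetical shape).
    """
    selected = set()
    for token in (t.strip() for t in text.split(",")):
        if not token:
            continue
        all_match = re.fullmatch(r"all\s+(\w+)", token, flags=re.IGNORECASE)
        range_match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", token)
        if all_match:
            factor = all_match.group(1).capitalize()
            if factor not in FACTOR_ORDER:
                raise ValueError(f"unknown factor: {token!r}")
            selected.update(n for n, f in catalog.items() if f == factor)
        elif range_match:
            lo, hi = int(range_match.group(1)), int(range_match.group(2))
            selected.update(n for n in range(lo, hi + 1) if n in catalog)
        elif token.isdigit() and int(token) in catalog:
            selected.add(int(token))
        else:
            raise ValueError(f"unrecognized selection: {token!r}")
    return sorted(selected)


# Toy catalog: requirements 1-3 are Clean, 4-5 are Compliant (made-up numbering).
catalog = {1: "Clean", 2: "Clean", 3: "Clean", 4: "Compliant", 5: "Compliant"}
print(parse_selection("all Clean, 4-5", catalog))
```

Rejecting unknown tokens instead of ignoring them keeps a typo from silently shrinking the assessment.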
After loading the profile, offer three adjustment verbs:
- skip <requirement>: Exclude a requirement entirely.
- set <requirement> <threshold>: Override a threshold (e.g., set chunk_readiness 0.70).
- add <requirement> <threshold>: Include a requirement not in the profile.
Before executing, intersect the selected requirements with what the platform can actually run. For each requirement, check if requirements/{key}/{platform}/check.md exists.
Present the coverage summary:
{Profile} Assessment — {platform} — {DATABASE}.{SCHEMA}
Selected: {N} requirements
Runnable: {R}
Not available: {N-R} (no implementation for this platform)
- {requirement_key}: no {platform} implementation
- ...
Proceed?
Checkpoint: User confirms before execution begins.
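The coverage check described above amounts to a file-existence test per requirement. A minimal sketch, assuming the skill's directory layout and a hypothetical `split_coverage` helper; the demo builds a throwaway tree so that it runs anywhere:

```python
from pathlib import Path
import tempfile


def split_coverage(root, platform, selected):
    """Partition selected requirement keys into runnable and not-available lists.

    A requirement counts as runnable when requirements/{key}/{platform}/check.md exists.
    """
    runnable, unavailable = [], []
    for key in selected:
        check = Path(root) / "requirements" / key / platform / "check.md"
        (runnable if check.is_file() else unavailable).append(key)
    return runnable, unavailable


# Demo against a temporary directory; only chunk_readiness gets a check file.
with tempfile.TemporaryDirectory() as root:
    check_dir = Path(root) / "requirements" / "chunk_readiness" / "snowflake"
    check_dir.mkdir(parents=True)
    (check_dir / "check.md").write_text("# Check: chunk_readiness\n")
    runnable, unavailable = split_coverage(
        root, "snowflake", ["chunk_readiness", "embedding_coverage"]
    )

print(f"Selected: {len(runnable) + len(unavailable)} requirements")
print(f"Runnable: {len(runnable)}")
for key in unavailable:
    print(f"  - {key}: no snowflake implementation")
```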
Load requirements/requirements.yaml once at session start. This manifest provides lightweight metadata: description, factor, scope, and implementations.
For each stage in order (Clean, Contextual, Consumable, Current, Correlated, Compliant), for each requirement:
1. Consult requirements/requirements.yaml for metadata (scope).
2. Read requirements/{requirement_name}/{platform}/check.md. This file contains all context (constraints, gotchas, variant guidance) and one or more SQL blocks.
3. Substitute {{ placeholder }} values from the user's scope context and the SQL block itself (database, schema, asset, column, plus any requirement-specific values documented in the check file's context section).
4. Execute the SQL and read the value result (float 0.0–1.0, where 1.0 is perfect).
5. Compare value >= threshold to determine pass/fail.
6. If the check cannot be run on this platform, report it as N/A.
SQL blocks within markdown files use {{ placeholder }} syntax for variable substitution. The scope field in the manifest tells you whether the check is schema-scoped, table-scoped, or column-scoped:
- Schema-scoped (database, schema): run once per schema.
- Table-scoped (asset): run per table, aggregate results.
- Column-scoped (column): run per column, aggregate results.
Conventions for check SQL:
- value is a float in [0.0, 1.0] where 1.0 is perfect. All requirements are gte-direction: pass when value >= threshold.
- When a denominator can be zero, return NULL via NULLIF(denominator, 0). The orchestrator renders NULL as N/A in reports and treats it as neither pass nor fail. Never emit a hard-coded 1.0 or 0.0 fallback: it silently inflates or deflates dashboards.
- information_schema filters must wrap both sides in UPPER(...) (e.g. UPPER(table_schema) = UPPER('{{ schema }}')) so callers can pass identifiers in any case.
- LIMIT on account_usage scans (access_history, query_history, …) must be paired with a stable ORDER BY, typically ORDER BY query_start_time DESC or ORDER BY start_time DESC, so repeated runs return the same score.
- Use REGEXP_LIKE(col, pattern) for name pattern matching. LIKE '%_X' is unsafe because _ is a single-character wildcard that matches unintended names (e.g. USERAID matches %_ID).
Present results in conversation first, then offer to save:
{Profile} Assessment — {platform} — {DATABASE}.{SCHEMA}
{Stage Name} {PASS/FAIL}
"{why}"
{requirement} {value} (need >= {threshold}) {PASS/FAIL}
Summary: {N} of {total} stages passing ({M} of {R} requirements passing)
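The pass/fail rule and the summary line can be sketched as below. Two points are assumptions rather than documented behavior: `None` stands in for a SQL NULL result, and a stage is treated as failing when any of its requirements fails, with N/A results ignored.

```python
from typing import Optional


def evaluate(value: Optional[float], threshold: float) -> str:
    """Return PASS, FAIL, or N/A for one requirement (None stands in for SQL NULL)."""
    if value is None:
        return "N/A"
    return "PASS" if value >= threshold else "FAIL"


def stage_status(statuses):
    """Assumed rollup: a stage fails if any requirement fails; N/A results are ignored."""
    return "FAIL" if "FAIL" in statuses else "PASS"


results = {
    "Clean": [evaluate(0.92, 0.80), evaluate(None, 0.80)],
    "Contextual": [evaluate(0.65, 0.80)],
}
stages = {name: stage_status(s) for name, s in results.items()}
passing = sum(1 for s in stages.values() if s == "PASS")
print(f"Summary: {passing} of {len(stages)} stages passing")
```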
After presenting results in conversation, offer to write a standardized report file:
Would you like me to save this report to your filesystem?
Default path: ./ai-ready-report-{DATABASE}-{SCHEMA}-{YYYY-MM-DD}.md
(You can specify a different path)
If the user confirms, write the report in the following standardized format. The report must be self-contained — a reader who wasn't present for the conversation should understand the full context, results, and next steps.
Report format:
# AI-Ready Data Assessment
| Field | Value |
|---------------|------------------------------------|
| Date | {YYYY-MM-DD} |
| Platform | {platform} |
| Database | {DATABASE} |
| Schema | {SCHEMA} |
| Tables | {table count} ({list or "all"}) |
| Profile | {profile name or "custom"} |
| Requirements | {selected count} of {manifest count} |
| Runnable | {runnable count} |
## Summary
{N} of {total} stages passing. {M} of {R} requirements passing.
{1-3 sentence narrative: what's strong, what's the biggest gap, what's the highest-ROI fix.}
## Results by Stage
### {Stage Name} — {PASS/FAIL}
> {why}
| Requirement | Score | Threshold | Status |
|-------------|-------|-----------|--------|
| {key} | {val} | {thresh} | {P/F} |
{Repeat for each stage}
## Failing Requirements
{For each failing requirement, one subsection:}
### {requirement_key} — {value} (need >= {threshold})
- **Factor:** {factor}
- **What it measures:** {description from manifest}
- **Scope:** {schema/table/column}
- **Constraints:** {any constraints from manifest, or "None"}
## Recommended Next Steps
{Prioritized list of remediation actions, ordered by impact.
Group by effort level: quick wins first, then medium effort, then larger investments.
Reference specific requirements and their current scores.}
---
*Generated by ai-ready-data assessment. Re-run to track progress.*
If the assessment was run against multiple tables, include per-table breakdowns under each stage where relevant (table-scoped and column-scoped checks).
If the user ran a custom selection (option 6), note which requirements were included and which were excluded under the metadata table.
Checkpoint: Options: remediate (fix gaps), tell-me-more (run diagnostics), done (stop).
When the user wants detail on a failing requirement, read requirements/{requirement_name}/{platform}/diagnostic.md. The file contains context explaining what the diagnostic measures and one or more SQL blocks. Use the context to select the appropriate SQL, substitute placeholders, execute, and present the results. If the file doesn't exist, explain that diagnostics aren't available for this requirement on this platform.
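Placeholder substitution, used for check, diagnostic, and fix SQL alike, could look like the sketch below. The `substitute` helper is hypothetical, and it does plain text replacement only: validating or quoting identifiers is left to the caller.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute(sql, context):
    """Replace every {{ name }} with context[name]; fail loudly on a missing value."""
    def replace(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(context[name])
    return PLACEHOLDER.sub(replace, sql)


# Illustrative template; real templates come from the requirement's markdown files.
template = (
    "SELECT COUNT(*) FROM {{ database }}.information_schema.tables "
    "WHERE UPPER(table_schema) = UPPER('{{ schema }}')"
)
print(substitute(template, {"database": "ANALYTICS", "schema": "product_metrics"}))
```

Raising on a missing value, instead of leaving the placeholder in place, avoids sending half-substituted SQL to the platform.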
Process failing stages in order. For each stage:
Stage: {Stage Name}
Why: {why}
Failing requirements:
{requirement}: {value} (need >= {threshold})
For each failing requirement:
1. Read requirements/{requirement_name}/{platform}/fix.md. This file contains all context (constraints, preconditions, delegation notes) and one or more remediation options, each with its own SQL block and/or organizational guidance.
2. Substitute {{ placeholder }} values in the SQL blocks.
Show the substituted implementation, affected objects, and any constraints.
Checkpoint: Options: approve (execute), skip (next stage), modify (edit SQL), tell-me-more (diagnostics), abort (stop).
Before executing non-idempotent operations, check the platform reference for idempotency guards. Run the guard first; skip the operation if the desired state already exists.
An operation skipped by its guard is not a failure: the desired state already exists. Never use CREATE OR REPLACE unless the platform documentation explicitly says it's safe for that operation.
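The guard-then-operate pattern can be sketched like this. The callables and the in-memory set are stand-ins invented for illustration; in a real session both the guard and the operation would be SQL taken from the platform reference and the fix file.

```python
def apply_with_guard(guard, operation):
    """Run `operation` only when `guard` reports the desired state is missing.

    `guard` and `operation` are hypothetical callables standing in for SQL.
    """
    if guard():
        return "skipped"  # desired state already exists; not a failure
    operation()
    return "applied"


# In-memory stand-in for a platform: a set of tags that already exist.
existing_tags = {"PII"}


def ensure_tag(name):
    return apply_with_guard(
        guard=lambda: name in existing_tags,
        operation=lambda: existing_tags.add(name),
    )


print(ensure_tag("PII"))        # tag already present
print(ensure_tag("RETENTION"))  # tag missing, so it is created
print(ensure_tag("RETENTION"))  # second run is a no-op
```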
Re-run the platform check implementation for each requirement in the stage. Show before/after:
{Stage Name} — remediation complete
{requirement}:
Before: {old_value}
After: {new_value}
Status: {PASS/FAIL}
Move to the next failing stage. After all stages:
Remediation Complete
Stage Before After
───── ────── ─────
{Stage Name} FAIL PASS
{Stage Name} FAIL PASS
What changed:
{Stage}: {one-line summary}
Overrides are applied in memory for the current run. For repeatability, overrides can be saved as a custom profile using extends:
name: my-rag-profile
extends: rag
overrides:
  skip:
    - embedding_coverage
  set:
    chunk_readiness: { min: 0.70 }
  add:
    row_access_policy: { min: 0.50 }
When loading a profile with extends, first load the base profile, then apply overrides.
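The load order can be sketched as a small resolver. Profiles are modelled here as plain dictionaries of the form `{"requirements": {key: {"min": threshold}}}`; that in-memory shape and the base thresholds in the demo are assumptions made for illustration, not the on-disk schema.

```python
import copy


def resolve_profile(profile, base_profiles):
    """Load the base named by `extends`, then apply skip/set/add overrides.

    The dictionary layout used here is an assumed in-memory shape.
    """
    base_name = profile.get("extends")
    if base_name is None:
        return copy.deepcopy(profile.get("requirements", {}))
    requirements = resolve_profile(base_profiles[base_name], base_profiles)
    overrides = profile.get("overrides", {})
    for key in overrides.get("skip", []):
        requirements.pop(key, None)
    for key, spec in overrides.get("set", {}).items():
        if key not in requirements:
            raise KeyError(f"cannot set {key!r}: not in base profile")
        requirements[key] = dict(spec)
    for key, spec in overrides.get("add", {}).items():
        requirements[key] = dict(spec)
    return requirements


# Made-up base profile contents, used only to exercise the resolver.
bases = {"rag": {"requirements": {
    "embedding_coverage": {"min": 0.80},
    "chunk_readiness": {"min": 0.80},
}}}
custom = {
    "name": "my-rag-profile",
    "extends": "rag",
    "overrides": {
        "skip": ["embedding_coverage"],
        "set": {"chunk_readiness": {"min": 0.70}},
        "add": {"row_access_policy": {"min": 0.50}},
    },
}
print(resolve_profile(custom, bases))
```

The base is deep-copied before overrides are applied, so resolving a derived profile never mutates the profile it extends.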
- Load the platform reference from platforms/ before executing any operations.
- Report requirements that cannot run as N/A with reason.
Each requirement has a directory under requirements/ containing platform-specific implementations as markdown files. The manifest (requirements/requirements.yaml) provides lightweight metadata needed at profile-load time. All detailed context (constraints, gotchas, variant selection guidance, platform-specific notes) lives in the markdown files themselves, co-located directly above the SQL they apply to.
| File | Purpose |
|---|---|
| requirements.yaml | Single manifest: lightweight metadata (description, factor, scope, implementations) |
| {requirement_key}/{platform}/check.md | Context + check SQL (read-only, returns normalized 0–1 score) |
| {requirement_key}/{platform}/diagnostic.md | Context + diagnostic SQL (read-only detail drill-down) |
| {requirement_key}/{platform}/fix.md | Context + remediation SQL and/or organizational guidance (mutating, requires approval) |
Each file follows this structure:
# {Type}: {requirement_key}
{One-line description}
## Context
{Prose: what it measures, constraints, gotchas, platform-specific notes,
variant selection guidance, preconditions. Everything the agent needs
to understand before executing the SQL.}
## SQL
{One or more fenced SQL blocks. If multiple variants or options exist,
use ### subheadings with prose explaining when to use each one.}
A single file can contain multiple SQL implementations. For example, check.md can contain both a full-scan and a sampled variant; fix.md can contain multiple remediation options with different tradeoffs. The agent reads the context to determine which SQL block to use.
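Collecting the SQL variants from such a file might look like the sketch below. It assumes the structure shown above (a `## SQL` section, optional `###` subheadings, fenced blocks tagged `sql`); the function name and the sample document are made up for illustration.

```python
FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block


def extract_sql_variants(markdown):
    """Map each ### subheading under '## SQL' to the SQL blocks beneath it.

    Blocks that appear before any ### subheading are stored under 'default'.
    """
    variants = {}
    in_sql_section = False
    in_block = False
    heading = "default"
    buffer = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if in_block:
            if stripped == FENCE:
                variants.setdefault(heading, []).append("\n".join(buffer))
                buffer, in_block = [], False
            else:
                buffer.append(line)
        elif stripped.startswith("## "):
            in_sql_section = stripped == "## SQL"
            heading = "default"
        elif in_sql_section and stripped.startswith("### "):
            heading = stripped[4:].strip()
        elif in_sql_section and stripped.lower() == FENCE + "sql":
            in_block = True
    return variants


# Made-up check file with two variants.
doc = "\n".join([
    "# Check: chunk_readiness",
    "## Context",
    "Use the sampled variant on very large tables.",
    "## SQL",
    "### Full scan",
    FENCE + "sql",
    "SELECT 1.0 AS value",
    FENCE,
    "### Sampled",
    FENCE + "sql",
    "SELECT 0.5 AS value",
    FENCE,
])
variants = extract_sql_variants(doc)
print(sorted(variants))
```

Choosing between the variants still depends on the prose in the Context section; the parser only makes the options addressable by name.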
Fix files may contain organizational process guidance (not just SQL) — for example, governance decisions, ownership assignments, or data model restructuring advice.
skills/ai-ready-data/
  SKILL.md                      ← You are here
  platforms/                    ← Platform references
    {PLATFORM}.md               ← Capabilities, nuances, permissions, dialect
  profiles/                     ← Assessment profiles
    scan.yaml                   ← Estate-level scan profile (lightweight)
    rag.yaml                    ← RAG readiness profile
    feature-serving.yaml        ← Feature serving readiness profile
    training.yaml               ← Training readiness profile
    agents.yaml                 ← Agents readiness profile
  requirements/                 ← Requirement manifest + implementation directories
    requirements.yaml           ← Single manifest (all requirement metadata)
    {requirement_key}/
      {platform}/
        check.md                ← Context + check SQL
        diagnostic.md           ← Context + diagnostic SQL
        fix.md                  ← Context + remediation SQL/guidance
To add a requirement:
1. Add an entry to requirements/requirements.yaml with: description, factor, scope, implementations.
2. Create a requirements/{name}/{platform}/ directory with three markdown files:
   - check.md (required): context + SQL returning a value score 0–1
   - diagnostic.md (required): context + SQL for detail drill-down
   - fix.md (required): remediation SQL and/or organizational guidance
To add a profile:
1. Create profiles/{name}.yaml with six stages (Clean, Contextual, Consumable, Current, Correlated, Compliant).
2. Optionally use extends to derive from an existing profile and apply overrides.
To add a platform:
1. Create platforms/{PLATFORM}.md covering capabilities, dialect, permissions, nuances, idempotency guards, and delegation targets.
2. Add implementations under requirements/{key}/{platform}/.