Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

$pwd:

fabric-optimization-review

Name: Fabric Optimization Review
Author: drhelius

// Review and optimize data storage, data models, and semantic models in Microsoft Fabric lakehouses. Covers Delta table optimization (V-Order, Z-Order, compaction, partitioning), data model anti-pattern detection, semantic model DAX review, relationship optimization, AI/agentic readiness (descriptions, synonyms, linguistic schema, display folders), workspace audit (Fabric best practices, dimensional modeling, Direct Lake), and actionable recommendation reports. Use when: optimization, performance, review, tuning, best practices, V-Order, Z-Order, partitioning, compaction, DAX optimization, data model review, AI readiness, agentic, NLQ, Copilot, natural language, workspace audit, Direct Lake, dimensional modeling, audit.

Exécuter dans Manus

$ git log --oneline --stat

stars:2

forks:3

updated:13 mai 2026 à 10:27

Explorateur de fichiers

2 fichiers

SKILL.md

readonly

related-skills.json

même dépôt

fabric-metadata-creation.md

from "drhelius/gh-copilot-fabric-agents"

Create reviewable metadata proposals for Microsoft Fabric lakehouse tables and semantic models. Covers schema analysis, concise technical and business descriptions, glossary terms, classifications, Purview-like sensitivity label proposals, PII and sensitive data detection, relationship and lineage hints, data quality rules, CDE candidates, data products, JSON/PDF output, and steward review questions. Use when: metadata creation, data catalog, Purview mapping, glossary inference, data classification, sensitivity labels, stewardship review, governance metadata.

2026-05-062

fabric-synthetic-data.md

from "drhelius/gh-copilot-fabric-agents"

Generate and upload synthetic data to Microsoft Fabric lakehouse tables. Covers star-schema generation, realistic fake data (names, dates, IDs, transactions), configurable row counts, referential integrity between tables, Parquet export, and OneLake upload. Use when: synthetic data, test data, fake data, sample data, generate tables, populate lakehouse, seed data, mock data.

2026-04-202

fabric-data-cleaner.md

from "drhelius/gh-copilot-fabric-agents"

Algorithm reference and self-contained PySpark notebook templates for cleaning Microsoft Fabric lakehouse tables. Covers profiling, duplicate detection, null analysis, type validation, statistics, IQR outlier detection, date format validation, Spanish DNI/NIE checksum validation, and email/phone format checks. Each notebook can be uploaded and run independently in Fabric.

2026-04-072

fabric-semantic-model.md

from "drhelius/gh-copilot-fabric-agents"

Infer and generate Power BI semantic models (TMDL format) from Microsoft Fabric lakehouse tables. Covers star-schema detection, table classification (fact/dimension), relationship inference, DAX measure generation, data type mapping, and deployment via REST API. Use when: create semantic model, build data model, star schema, Power BI model, TMDL, fact table, dimension table.

2026-04-062

package.json

"author": "drhelius"

"repository": "drhelius/gh-copilot-fabric-agents"

Ouvrir le dépôt GitHub Voir les dépôts du créateur

$ install --global

$ download --local

Exécuter dans Manus

$ useful --forSOC

Architectes de bases de donnéesProfessions informatiques et mathématiques15-1243L4

Exécutez n'importe quel Skill en un clic

name	fabric-optimization-review
description	Review and optimize data storage, data models, and semantic models in Microsoft Fabric lakehouses. Covers Delta table optimization (V-Order, Z-Order, compaction, partitioning), data model anti-pattern detection, semantic model DAX review, relationship optimization, AI/agentic readiness (descriptions, synonyms, linguistic schema, display folders), workspace audit (Fabric best practices, dimensional modeling, Direct Lake), and actionable recommendation reports. Use when: optimization, performance, review, tuning, best practices, V-Order, Z-Order, partitioning, compaction, DAX optimization, data model review, AI readiness, agentic, NLQ, Copilot, natural language, workspace audit, Direct Lake, dimensional modeling, audit.
compatibility	Requires the Fabric MCP Server VS Code extension for workspace and table discovery. Uses Fabric REST API for metadata retrieval. Python 3.10+.

Fabric Optimization Review — Reference

Analyzes Microsoft Fabric lakehouse tables, data models, and semantic models to detect performance issues, anti-patterns, and optimization opportunities. Produces actionable recommendations with severity, effort, and expected impact.

Review Categories

The agent performs three independent review tracks. The user can request one, several, or all:

Track	Target	What It Checks
Data Storage	Lakehouse Delta tables	File sizes, compaction, partitioning, V-Order, Z-Order, small files, schema evolution
Data Model	Lakehouse table relationships	Normalization, star-schema compliance, redundant columns, key integrity, naming conventions
Semantic Model	Power BI semantic models (TMDL)	DAX measure quality, relationship cardinality, hidden columns, implicit measures, format strings, role-playing dimensions
AI & Agentic Readiness	Semantic models + lakehouse tables	Descriptions, synonyms, linguistic schema, display folders, naming conventions, pre-computed measures, column visibility for AI/NLQ/Copilot
Workspace Audit	Full workspace	Fabric best practices, dimensional modeling compliance, Direct Lake readiness, item organization, capacity usage, pipeline health

Data Storage Optimization Rules

Small File Detection

Delta tables with many small files degrade read performance. Fabric uses V-Order by default but files still fragment from frequent small appends.

Metric	Threshold	Severity	Recommendation
Avg file size < 32 MB	< 32 MB	High	Run `OPTIMIZE` to compact files
File count > 1000 per partition	> 1000	High	Run `OPTIMIZE` with bin-packing
Avg file size < 8 MB	< 8 MB	Critical	Immediate compaction required; rethink ingestion batch size

Partitioning Assessment

Pattern	Severity	Recommendation
Table > 1 GB with no partitioning	Medium	Consider partitioning by date or high-cardinality filter column
Partition column with > 10,000 distinct values	High	Over-partitioned — creates too many small folders. Choose a coarser grain (year-month vs date)
Partition column with < 5 distinct values	Low	Under-partitioned — minimal benefit. Consider removing partitioning
Partition column not used in common filters	Medium	Partitioning on non-filter columns wastes metadata. Change to a commonly filtered column

V-Order and Z-Order

Check	Detection	Recommendation
V-Order not enabled	Table properties lack `delta.parquet.vorder.enabled = true`	Enable V-Order: `ALTER TABLE {table} SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')`
Z-Order not configured on query-heavy columns	No `ZORDER BY` in OPTIMIZE history	Apply `OPTIMIZE {table} ZORDER BY ({columns})` for frequently filtered/joined columns
Z-Order on > 4 columns	OPTIMIZE with > 4 ZORDER columns	Z-Order effectiveness degrades beyond 3-4 columns. Prioritize the most selective ones

Table Maintenance

Check	Detection	Recommendation
No OPTIMIZE in last 7 days	Delta log has no OPTIMIZE commits recently	Schedule regular `OPTIMIZE` runs
Vacuum not run / retention > 7 days	`delta.deletedFileRetentionDuration` > 7 days	Run `VACUUM {table} RETAIN 168 HOURS` to reclaim storage
Stats collection stale	Column-level stats absent or outdated	Run `ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS`
Too many Delta log entries	> 1000 log files without checkpoint	Force checkpoint: table properties `delta.checkpointInterval`

Schema & Data Type Optimization

Pattern	Severity	Recommendation
String columns storing only integers	Medium	Cast to `INT`/`LONG` — smaller footprint, faster scans
String columns storing only dates	Medium	Cast to `DATE`/`TIMESTAMP` — enables predicate pushdown
`DOUBLE` for currency/money values	Low	Use `DECIMAL(18,2)` for exact arithmetic
Wide tables with > 50 columns	Low	Consider vertical partitioning (split into related tables)
Unused columns (null > 95%)	Medium	Consider dropping or archiving — reduces scan I/O

Data Model Review Rules

Star Schema Compliance

Anti-Pattern	Detection	Severity	Recommendation
Snowflake dimensions	Dimension table has FK to another dimension	Medium	Flatten into single dimension for better query performance
Flat/denormalized mega-table	Single table with > 30 columns mixing measures and attributes	High	Normalize into fact + dimension tables
Missing date dimension	Fact table has date columns but no dedicated date/calendar table	High	Create a date dimension for time intelligence
Fact-to-fact relationships	Two fact tables joined directly	High	Introduce shared dimensions (conformed dimensions)
Bridge tables without clear purpose	Many-to-many junction tables	Medium	Verify necessity; simplify if possible

Key & Relationship Integrity

Check	Detection	Recommendation
No defined primary key	Dimension table has no unique column identified	Define a PK to enable relationship validation
Composite keys	Joins require multiple columns	Consider surrogate integer keys for simpler joins
Orphan foreign keys	FK values not found in dimension PK	Data quality issue — clean orphans or add "Unknown" dimension member
Circular relationships	A → B → C → A	Redesign to eliminate cycles; causes ambiguous query paths
Bidirectional relationships	Cross-filter in both directions	Use single direction unless absolutely required (performance + ambiguity risk)

Naming Convention Review

Pattern	Severity	Recommendation
Inconsistent casing (mix of camelCase, snake_case, PascalCase)	Low	Standardize to one convention (prefer `snake_case` for Spark/lakehouse)
Table names without prefix (`fact_`, `dim_`)	Low	Add prefixes for clarity in large models
Ambiguous column names (`id`, `name`, `value`)	Medium	Qualify with table context (`customer_id`, `product_name`)
Reserved word column names (`date`, `order`, `table`)	Low	Rename to avoid query issues (`order_date`, `order_number`)

Semantic Model Review Rules

DAX Measures

Anti-Pattern	Detection	Severity	Recommendation
Implicit measures	`discourageImplicitMeasures` is false or missing	High	Set `discourageImplicitMeasures` in model.tmdl and create explicit DAX measures
SUM on non-additive columns	`SUM()` on ratio, percentage, or average columns	High	Use `AVERAGE()`, `DIVIDE()`, or weighted calculations
Duplicate measures	Two measures with identical DAX expressions	Medium	Consolidate to a single measure
Missing COUNTROWS	Fact table without a row-count measure	Low	Add `COUNTROWS('{table}')` as base metric
Unformatted measures	Measures without `formatString`	Medium	Add format strings (`#,##0`, `#,##0.00`, `0.0%`)
Time intelligence without DATEADD/SAMEPERIODLASTYEAR	Manual date calculations instead of DAX time functions	Medium	Use `DATEADD`, `SAMEPERIODLASTYEAR`, `TOTALYTD` for correct time intelligence
CALCULATE without explicit filter removal	Nested CALCULATE without REMOVEFILTERS	Low	Review filter context — may produce unexpected results

Relationship Configuration

Issue	Detection	Severity	Recommendation
Missing relationships	Matching key columns with no defined relationship	High	Add the relationship (many-to-one, single direction)
Wrong cardinality	Many-to-many where many-to-one is correct	High	Fix cardinality — many-to-many bypasses engine optimizations
Both-direction cross-filter	`crossFilteringBehavior: bothDirections` without a bridge table	Medium	Use single direction; both-direction causes ambiguity and performance issues
Role-playing dimensions	Same dimension used for multiple relationships (e.g., date table for order_date and ship_date)	Medium	Use `USERELATIONSHIP()` in measures for inactive relationships; document clearly
Inactive relationships not used	Inactive relationship exists but no measure uses `USERELATIONSHIP`	Low	Either activate or remove the unused relationship

Column Configuration

Issue	Detection	Severity	Recommendation
ID/key columns visible	`isHidden: false` on FK/PK columns	Medium	Hide technical columns from report view
`summarizeBy` not set to `none`	Non-measure columns with default aggregation	Medium	Set `summarizeBy: none` to prevent implicit measures
Missing `isKey: true`	Dimension PK not marked as key	Low	Mark PK columns as `isKey: true` for VertiPaq optimization
Data type mismatch	Relationship columns with different TMDL types	High	Align types — mismatches cause conversion overhead

Locale & Formatting

Issue	Detection	Severity	Recommendation
Missing culture	No `culture` in model.tmdl	Medium	Add `culture: es-ES` (or target locale)
Wrong format strings for locale	Using `.` as decimal in Spanish locale	Low	Use `#.##0,00` for es-ES (dot = thousands, comma = decimals)
Missing `sourceQueryCulture`	Not set in model.tmdl	Low	Add `sourceQueryCulture: es-ES` to match data source locale

AI & Agentic Readiness Rules

These rules evaluate how well a semantic model (and its underlying data) is prepared for consumption by AI agents, Copilot, natural language queries (NLQ), and other agentic interfaces. A model optimized for human report builders is NOT automatically optimized for AI — AI needs richer metadata, unambiguous naming, and pre-computed measures to generate correct answers.

1. Table Descriptions

Every table must have a description in TMDL that explains its purpose, grain (what one row represents), and key relationships.

Issue	Detection	Severity	Recommendation
Table without description	`description` property missing or empty in table TMDL	High	Add description explaining purpose, grain, and relationships. Example: `"Fact table of individual customer support calls. One row = one call. Joins to dim_agent, dim_customer, dim_date via FK columns."`
Description too short (< 20 chars)	Description exists but is trivially short	Medium	Expand with grain, relationships, and business context
Description is technical jargon	Description uses only internal/database terms	Medium	Rewrite in business language that an AI can interpret for end users

TMDL example:

table 'fact_calls'
	description: "Fact table recording individual customer support calls. One row represents a single call. Grain: call_id. Joins to dim_agent (agent handling the call), dim_customer (caller), dim_date (call date), dim_product (product discussed)."

2. Measure Descriptions

Every DAX measure must have a description explaining what it calculates, its units, when to use it, and any caveats.

Issue	Detection	Severity	Recommendation
Measure without description	`description` missing on measure	High	Add description with calculation logic, units, and usage guidance
Description missing units	Description doesn't mention currency, %, seconds, count, etc.	Medium	Specify units explicitly: "in euros (€)", "as percentage (0-100)", "in seconds"
Complex measure without usage guidance	CALCULATE/FILTER/time-intelligence measure with no context	Medium	Explain when to use vs. similar measures. Example: "Use for YTD comparison. For month-only, use [Revenue Monthly]."

TMDL example:

measure 'Average Handle Time' =
	AVERAGE('fact_calls'[duration_seconds])
	description: "Average duration of customer calls in seconds. Includes hold time. Use for operational efficiency analysis. For calls without hold time, use [Avg Talk Time]."
	formatString: #,##0.0

3. Column Descriptions

At minimum, describe all foreign key columns, amount/currency columns, coded/enumerated columns, and any column whose name is ambiguous.

Issue	Detection	Severity	Recommendation
FK column without description	Column ending in `_id`/`_key` with no description	High	Describe what it references: "Foreign key to dim_agent. Join on agent_id to get agent name and team."
Amount/currency column without description	Numeric column with `amount`, `total`, `price`, `importe`, `precio` in name, no description	Medium	Specify currency, tax inclusion, sign convention: "Net revenue in EUR, excluding VAT. Positive = income, negative = refund."
Coded/enum column without description	Column with low cardinality (< 20 distinct values) and no description	Medium	List possible values and meanings: "Call outcome code: 'R' = Resolved, 'E' = Escalated, 'A' = Abandoned, 'C' = Callback scheduled."
Date column without description	Date/timestamp column with no description	Low	Specify timezone, business meaning: "Date and time the call was answered by an agent (UTC). NULL if call was abandoned before pickup."

4. Synonyms & Annotations

Add synonyms so AI can map natural language terms (in any language) to the correct tables, columns, and measures. Critical for multilingual environments.

Issue	Detection	Severity	Recommendation
No synonyms on any table	No `annotation` with synonym content in TMDL	High	Add synonyms for each table in all relevant languages. Example: fact_calls → "llamadas", "calls", "phone calls", "tickets telefónicos"
Measures without synonyms	Measures have no alternative names	Medium	Add natural-language aliases: "Average Handle Time" → "AHT", "duración media", "tiempo medio de gestión"
Columns without synonyms	Key business columns lack aliases	Medium	Add synonyms for columns users ask about: "agent_name" → "nombre del agente", "rep", "representative"
Acronyms not expanded	Measures/columns use acronyms without synonyms	Medium	Map acronyms: "AHT" → "Average Handle Time", "FCR" → "First Call Resolution"

TMDL synonym example (via extended properties or annotations):

table 'fact_calls'
	annotation Synonyms = ["llamadas", "calls", "phone calls", "registros de llamadas"]

	column agent_name
		annotation Synonyms = ["agente", "representante", "rep", "agent"]

	measure 'Total Calls' = COUNTROWS('fact_calls')
		annotation Synonyms = ["número de llamadas", "call count", "total llamadas", "volumen"]

5. Display Folders

Group measures into logical displayFolder categories so AI agents can discover related measures and present coherent answers.

Issue	Detection	Severity	Recommendation
No displayFolder on any measure	All measures are at root level	Medium	Group into business domains: "Operativo", "Financiero", "Calidad", "Cliente", "Tiempo"
Inconsistent folder naming	Mix of languages, casing, or granularity	Low	Standardize folder names across all tables
Too many measures at root	> 10 measures without displayFolder	Medium	Organize into folders — AI uses folder context to disambiguate similar measures

Recommended folder structure:

Operativo/          → call volume, handle time, occupancy, utilization
Financiero/         → revenue, cost, margin, discounts
Calidad/            → FCR, CSAT, NPS, quality scores
Cliente/            → churn, lifetime value, satisfaction
Tiempo/             → YTD, MoM, QoQ, rolling averages
Objetivos/          → targets, KPI thresholds, variances

TMDL example:

measure 'Total Calls' = COUNTROWS('fact_calls')
	displayFolder: Operativo

measure 'Revenue' = SUM('fact_calls'[revenue])
	displayFolder: Financiero

measure 'CSAT Score' = AVERAGE('fact_surveys'[score])
	displayFolder: Calidad

6. Format Strings

Every measure and numeric column must have an appropriate formatString. AI agents use format strings to present results correctly and to infer the semantic type of a value.

Issue	Detection	Severity	Recommendation
Currency measure without currency symbol	Measure with `amount`/`revenue`/`price` but no `€`/`$` in formatString	Medium	Use `€#.##0,00` (es-ES) or `$#,##0.00` (en-US)
Percentage measure without %	Measure calculating ratios/rates but formatString lacks `%`	Medium	Use `0.0%` or `#,##0.0%`
Duration without unit format	Measure in seconds/minutes with plain numeric format	Medium	Use `#,##0.0` with description clarifying unit, or format as `HH:MM:SS` via DAX FORMAT()
Integer measure with decimals	Count/quantity measure showing decimal places	Low	Use `#,##0` for counts, `#.##0` for es-ES
Date columns without format	Date displayed in default format	Low	Set explicit format: `dd/MM/yyyy` for es-ES

7. Technical Column Visibility

Hide all columns that an AI agent or end user should never filter/slice by directly. AI should use descriptive columns (names, labels) not technical IDs.

Issue	Detection	Severity	Recommendation
FK ID columns visible	Columns ending in `_id`/`_key` with `isHidden: false`	High	Set `isHidden: true` — AI must filter by `agent_name`, not `agent_id`
Surrogate keys visible	Auto-increment or UUID columns exposed	High	Hide — these have no business meaning
ETL/audit columns visible	Columns like `created_at`, `updated_by`, `batch_id`, `row_hash`	Medium	Hide — irrelevant for analytical queries
Source system columns visible	Columns like `source_system`, `legacy_code`, `migration_flag`	Medium	Hide unless explicitly needed for analysis
Technical columns in dimension tables	Internal keys alongside descriptive columns	Medium	Only expose descriptive columns; hide all technical plumbing

TMDL example:

column agent_id
	dataType: int64
	isHidden: true
	isKey: false
	summarizeBy: none
	description: "FK to dim_agent. Hidden — use agent_name for filtering."

column agent_name
	dataType: string
	isHidden: false
	description: "Name of the support agent. Use this column to filter or group by agent."

8. Display Names & Naming Conventions

AI agents present column and measure names directly to users. Technical names like call_date or avg_ht_sec are confusing. Use friendly display names.

Issue	Detection	Severity	Recommendation
Column uses snake_case without display name	Column name contains underscores and no `displayName` override	Medium	Add friendly display name: `call_date` → "Fecha Llamada", `agent_name` → "Nombre Agente"
Measure uses abbreviations	Measure name has acronyms or short forms	Medium	Use full business names: `Avg HT` → "Tiempo Medio de Gestión"
Mixed language in names	Some names in English, others in Spanish	Low	Standardize to the model's primary language; use synonyms for the other
Table names not user-friendly	Tables exposed as `fact_calls` or `dbo_sales`	Low	Consider display names: "Llamadas" or "Ventas" for user-facing contexts

9. Pre-Computed Measures for AI

AI agents perform best with pre-computed measures they can use directly, rather than needing to compose complex DAX on the fly. Add common derived metrics.

Missing Pattern	Detection	Severity	Recommendation
No time-comparison measures	No YoY, MoM, QoQ, or YTD measures	High	Add: `[Revenue YTD]`, `[Revenue vs PY]`, `[Revenue MoM %]` using `TOTALYTD`, `SAMEPERIODLASTYEAR`, `DATEADD`
No ratio/efficiency measures	Only raw SUM/COUNT, no derived ratios	High	Add: `[Revenue per Call]`, `[Cost per Resolution]`, `[Calls per Agent]`
No percentage measures	Absolute values only, no share/mix	Medium	Add: `[% Resolved First Call]`, `[Agent Utilization %]`, `[Revenue Share %]`
No ranking measures	No RANKX-based measures	Medium	Add: `[Agent Rank by Revenue]`, `[Product Rank by Volume]` for top-N questions
No moving averages	No trend smoothing measures	Low	Add: `[Revenue 7-Day Avg]`, `[Calls 30-Day Moving Avg]` for trend analysis
No target/variance measures	No comparison to goals	Medium	Add: `[Revenue vs Target]`, `[Target Achievement %]` if target data exists
No cumulative measures	No running totals	Low	Add: `[Cumulative Revenue]`, `[Running Total Calls]` for progress tracking

Pre-computed measures help AI agents answer questions like "How are we doing vs last year?" or "Who is the top agent?" without composing DAX.

10. Linguistic Schema (Q&A / NLQ)

Configure the Power BI Q&A linguistic schema to enable natural language queries. This directly impacts Copilot and any NLQ interface.

Issue	Detection	Severity	Recommendation
No linguistic schema configured	Model has no Q&A linguistic schema	High	Create a linguistic schema YAML defining entities, attributes, relationships, and phrasings
Synonyms not in linguistic schema	Synonyms only in annotations, not in Q&A config	Medium	Mirror synonyms into the linguistic schema for Q&A engine consumption
No relationship phrasings	Relationships not described with natural language verbs	Medium	Add phrasings: "agent handles calls", "customer places calls", "call occurs on date"
No attribute phrasings	Columns not described as attributes of entities	Medium	Add: "call has a duration", "agent has a name", "customer has a segment"
Missing noun synonyms	Q&A doesn't understand domain terms	Medium	Add noun definitions: "AHT" = "average handle time", "FCR" = "first call resolution"
No verb-based phrasings for measures	Q&A can't interpret action-based queries	Medium	Add: "calculate revenue", "show average handle time", "count calls"

Linguistic schema YAML example:

Entities:
  - Name: Call
    Table: fact_calls
    Synonyms: [llamada, call, phone call, ticket]
    Attributes:
      - Name: Duration
        Column: duration_seconds
        Synonyms: [duración, handle time, call length]
      - Name: Date
        Column: call_date
        Synonyms: [fecha, when, call date]

Relationships:
  - Name: AgentHandlesCall
    Phrasing: "{agent} handles {call}"
    ForeignKey: fact_calls.agent_id
    PrimaryKey: dim_agent.agent_id

Phrasings:
  - Type: Attribute
    Entity: Call
    Attribute: Duration
    Phrasing: "{call} has a {duration}"
  - Type: Verb
    Subject: Agent
    Verb: handles
    Object: Call

11. Data Granularity & Aggregation Tables

AI queries often need aggregated answers. Pre-aggregated tables or aggregation-aware models reduce response time and improve accuracy.

Issue	Detection	Severity	Recommendation
Only detail-level fact table	Fact table has millions of rows, no pre-aggregations	Medium	Create aggregation tables (daily/weekly/monthly summaries) for common query patterns
No summary dimension attributes	Dimensions lack hierarchy levels (e.g., City → Region → Country)	Medium	Add rolled-up attributes for AI to answer at different granularities
Missing calendar hierarchies	Date dimension without Year → Quarter → Month → Week → Day	Medium	Add standard calendar hierarchies for time-based drill-down

12. Data Documentation for Context Grounding

Beyond TMDL metadata, provide additional context that AI agents can use for grounding and disambiguation.

Issue	Detection	Severity	Recommendation
No business glossary	No annotation or external reference defining business terms	Medium	Create a glossary annotation: key metrics definitions, business rules, calculation methodology
No data freshness indicator	No description of update frequency or latency	Low	Document in table description: "Updated daily at 06:00 UTC. Data latency: T-1."
No known limitations documented	No caveats about data coverage or quality	Low	Document in description: "Excludes internal test calls. Weekend data may have 2-hour delay."
No example questions	No annotation with sample natural-language queries	Low	Add annotation with 5-10 example questions the model can answer: "¿Cuántas llamadas atendió el agente X este mes?"

Example questions annotation:

table 'fact_calls'
	annotation ExampleQuestions = [
		"¿Cuántas llamadas hubo ayer?",
		"¿Cuál es el tiempo medio de gestión por agente?",
		"Top 5 agentes por volumen de llamadas este mes",
		"¿Cuál es la tasa de resolución en primera llamada?",
		"Comparar ingresos de este trimestre con el anterior",
		"¿Qué productos generan más llamadas de soporte?"
	]

AI Readiness Scoring

After evaluating all rules in this section, compute an AI Readiness Score (0-100%) for the model:

Component	Weight	Criteria
Table descriptions	15%	All tables have descriptions with grain and relationships
Measure descriptions	15%	All measures have descriptions with units and usage
Column descriptions	10%	All FKs, amounts, and coded columns described
Synonyms	15%	Tables, key columns, and measures have multilingual synonyms
Display folders	5%	Measures organized in logical folders
Format strings	5%	All numeric measures have appropriate format strings
Column visibility	10%	All technical columns hidden
Naming conventions	5%	Friendly display names, consistent language
Pre-computed measures	10%	Time comparisons, ratios, rankings available
Linguistic schema	5%	Q&A schema with phrasings and synonyms configured
Documentation	5%	Glossary, freshness, limitations, example questions

Score interpretation:

90-100%: Production-ready for AI/Copilot — model is fully grounded
70-89%: Usable with caveats — AI will sometimes misinterpret queries
50-69%: Significant gaps — AI will frequently generate wrong answers
< 50%: Not ready — AI usage will produce unreliable results

Workspace Audit Rules

A comprehensive audit of a Fabric workspace evaluating Fabric platform best practices, dimensional modeling quality, and Direct Lake readiness. This is a holistic review that combines findings from other tracks with Fabric-specific governance and architecture rules.

Fabric Best Practices

Workspace Organization

Issue	Detection	Severity	Recommendation
No naming convention for items	Items have inconsistent prefixes/suffixes (mix of `rpt_`, `Report_`, no prefix)	Medium	Adopt a standard: `LH_` (lakehouse), `SM_` (semantic model), `NB_` (notebook), `PL_` (pipeline), `RPT_` (report)
Dev/test items in production workspace	Items with `test`, `dev`, `tmp`, `copia`, `backup` in name	High	Move to a separate dev workspace; use deployment pipelines for promotion
No workspace description	Workspace has no description	Low	Add a description explaining purpose, team ownership, and data domains
Too many items (> 50)	Workspace contains excessive items	Medium	Split by domain or team; large workspaces are hard to govern
No deployment pipeline configured	No pipeline linking dev → test → prod workspaces	Medium	Configure Fabric deployment pipelines for safe promotion of artifacts
Mixed ownership	Items owned by individual users instead of service principal/group	Medium	Transfer ownership to a service principal or shared admin group

Capacity & Performance Governance

Issue	Detection	Severity	Recommendation
Workspace not assigned to a capacity	No capacity binding	Critical	Assign to a Fabric capacity — items won't run without it
Using shared capacity for production	Workspace on shared (non-dedicated) capacity	High	Move to dedicated capacity (F64+) for predictable performance and SLAs
No refresh schedule on semantic models	Semantic models with no automatic refresh	Medium	Configure scheduled refresh aligned with data pipeline completion
Lakehouse without scheduled maintenance	No OPTIMIZE/VACUUM automation	Medium	Create a notebook + pipeline that runs OPTIMIZE weekly and VACUUM monthly
Long-running notebook executions	Notebooks timing out or running > 30 minutes	Medium	Review Spark configuration; consider partitioning, caching, or pre-aggregation

Security & Governance

Issue	Detection	Severity	Recommendation
No workspace roles defined	Only admin + viewer, no contributor/member granularity	Medium	Define roles: Admin (platform team), Member (developers), Contributor (ETL), Viewer (consumers)
Sensitive data without RLS	Tables with PII/financial data and no Row-Level Security in semantic model	High	Implement RLS in the semantic model; use `USERNAME()` or `USERPRINCIPALNAME()` filters
No sensitivity labels	No Microsoft Purview sensitivity labels applied	Low	Apply sensitivity labels per data classification policy
Direct lakehouse access without controls	Users querying lakehouse SQL endpoint directly without governance	Medium	Use semantic models as the governed access layer; restrict direct SQL endpoint access

Dimensional Modeling Compliance

Star Schema Validation

Issue	Detection	Severity	Recommendation
No clear fact/dimension separation	Tables not identifiable as fact or dimension by name or structure	High	Restructure into explicit star schema: `fact_` prefix for transactional tables, `dim_` for reference
Fact table contains descriptive attributes	Fact table has string columns with names, descriptions, categories stored directly	High	Extract to dimension tables; keep only FK references + measures in facts
Dimension table with measure columns	Dimension has numeric additive columns (amounts, totals)	High	Move measures to fact table or create a factless fact for those metrics
No conformed dimensions	Same entity (customer, product, date) duplicated across multiple fact tables with different structures	High	Create conformed dimensions shared across all facts for consistent slicing
Missing surrogate keys	Dimensions use natural/business keys as PK	Medium	Add integer surrogate keys for performance and change management (SCD)
No Slowly Changing Dimension handling	Dimension records overwritten with no history	Medium	Implement SCD Type 2 (start_date/end_date/is_current) for critical dimensions
Calendar table missing fiscal periods	Date dimension only has calendar dates, no fiscal year/quarter	Medium	Add fiscal periods, holidays, working days for business reporting
Junk dimensions not consolidated	Multiple low-cardinality flag columns in fact table	Low	Consolidate into a single junk dimension to reduce fact table width

Grain & Granularity

Issue	Detection	Severity	Recommendation
Mixed grain in fact table	Some rows at daily level, others at monthly (or order header + order line mixed)	Critical	Separate into distinct fact tables at consistent grain
Fact table without clear grain	Cannot determine what one row represents	High	Document the grain explicitly; if ambiguous, restructure
Pre-aggregated fact without detail	Only summary-level fact exists with no detail backup	Medium	Keep detail-level fact for drill-through; add aggregation tables separately

Relationship Patterns

Issue	Detection	Severity	Recommendation
Many-to-many without bridge table	Direct M:N relationship between two tables	High	Create a bridge/junction table to properly model the M:N relationship
Multiple active relationships to same table	Two active relationships from fact to same dimension	High	Make one inactive; use `USERELATIONSHIP()` in measures for the secondary path
Self-referencing dimension without parent-child	Hierarchy column (manager_id → employee_id) not modeled	Medium	Implement parent-child hierarchy with PATH/PATHITEM DAX functions
Circular dependency in relationships	A → B → C → A path exists	Critical	Redesign to eliminate cycles; consider role-playing or bridge patterns

Direct Lake Readiness

Direct Lake is Fabric's highest-performance mode for semantic models — it reads directly from Delta/Parquet files in OneLake without data movement. However, it has strict requirements.

Prerequisites & Compatibility

Issue	Detection	Severity	Recommendation
Semantic model uses Import mode	Model partitions show `mode: import` or explicit M queries with transformations	High	Convert to Direct Lake: remove transformations, ensure lakehouse tables are query-ready
Semantic model uses DirectQuery to SQL endpoint	Partitions use `Sql.Database()` with `mode: directQuery`	Medium	Consider Direct Lake for better performance if data fits within guardrails
Tables not in Delta format	Lakehouse contains CSV/Parquet files not loaded as Delta tables	Critical	Convert to Delta tables: `spark.read.parquet(...).write.format("delta").saveAsTable(...)`
Data transformations in partition expressions	M/Power Query in partition does joins, filters, or calculations	High	Move all transformations upstream to the lakehouse (notebooks/pipelines); Direct Lake requires clean passthrough
Lakehouse and semantic model in different workspaces	Model references tables in a different workspace's lakehouse	Medium	Place in same workspace or use shortcuts; cross-workspace Direct Lake has limitations

Direct Lake Guardrails

Direct Lake has capacity-dependent limits. If exceeded, the model falls back to DirectQuery (performance cliff).

Guardrail	F2	F64	F128+	What Happens
Max tables per model	500	500	500	Model creation fails
Max columns per table	500	500	500	Table skipped
Max rows per table	300M	1.5B	3B+	Fallback to DirectQuery
Max model size on disk	10 GB	40 GB	80 GB+	Fallback to DirectQuery
Max columns per model	2,000	2,000	2,000	Model creation fails

Issue	Detection	Severity	Recommendation
Table exceeds row limit for capacity	Row count > capacity guardrail	Critical	Partition the table, reduce historical depth, or upgrade capacity SKU
Model size exceeds capacity guardrail	Total Delta file size > model size limit	High	Reduce column count, compress strings, or split into multiple models
Too many columns in a table	Table has > 300 columns	High	Vertical partitioning: split into related tables
Unneeded columns inflating model size	Wide tables with columns never used in reports	Medium	Remove unused columns from lakehouse tables or create views

Data Preparation for Direct Lake

Issue	Detection	Severity	Recommendation
String columns with high cardinality	String column with > 1M distinct values	High	Consider hashing, bucketing, or moving to a detail table not in the model
Columns with mixed types	Column storing numbers as strings intermittently	High	Enforce strict typing in the lakehouse; Direct Lake cannot handle type mismatches gracefully
Decimal columns with excessive precision	`DECIMAL(38,18)` when `DECIMAL(18,2)` suffices	Medium	Reduce precision — smaller Parquet footprint = faster Direct Lake
Large text columns (> 1KB avg)	Description/notes columns with long text in Direct Lake model	Medium	Exclude from model or move to a detail table only accessed via drill-through
Files not V-Ordered	V-Order disabled on Direct Lake source tables	High	Enable V-Order — Direct Lake reads V-Ordered Parquet significantly faster
Too many small files	Avg file < 32 MB in Direct Lake source table	High	Run `OPTIMIZE` — Direct Lake opens each file individually; many small files = slow
Data not compacted after incremental load	New data appended as tiny files after each pipeline run	High	Add `OPTIMIZE` step at end of each pipeline/notebook that modifies the table

Direct Lake Semantic Model Configuration

Issue	Detection	Severity	Recommendation
Fallback to DirectQuery not monitored	No alert on fallback events	Medium	Monitor via Fabric capacity metrics; set alerts for DirectQuery fallback
Auto-refresh not configured	Direct Lake model not set to detect Delta changes	Medium	Enable `autoSync` so model picks up new data without manual refresh
Framing not triggered after data load	Pipeline loads data but doesn't frame the model	High	Add model framing call (`POST /refresh`) at end of pipeline to update Direct Lake metadata
Calculated columns in Direct Lake	Model uses calculated columns (DAX-computed at refresh)	Medium	Move calculation logic to the lakehouse (PySpark/SQL) — calculated columns force import-mode fallback for those columns
Calculated tables in model	`CALENDAR()`, `GENERATESERIES()`, or other calculated tables	Medium	Materialize as lakehouse tables; calculated tables aren't Direct Lake compatible
RLS with complex DAX filters	Row-Level Security using `LOOKUPVALUE` or other complex patterns	Medium	Simplify RLS to direct column comparisons; complex RLS can trigger fallback

Direct Lake Migration Checklist

When recommending migration from Import/DirectQuery to Direct Lake, provide this ordered checklist:

Verify capacity SKU — Direct Lake requires F64+ for production workloads
Audit table sizes — ensure all tables fit within guardrails for the capacity tier
Move transformations upstream — all data prep must happen in the lakehouse, not in Power Query
Optimize Delta tables — run OPTIMIZE + VACUUM on all source tables; enable V-Order
Remove calculated tables — materialize as lakehouse tables
Remove calculated columns — push logic to lakehouse or convert to measures
Simplify RLS — ensure filters are simple column comparisons
Create the Direct Lake model — use TMDL with mode: directLake partitions pointing to lakehouse tables
Configure auto-sync — enable automatic metadata sync with Delta log
Add framing to pipelines — trigger model refresh after each data load
Monitor fallback — set up capacity metric alerts for DirectQuery fallback events
Validate performance — compare query times vs previous mode; verify no fallback

Workspace Audit Scoring

After evaluating all workspace audit rules, compute a Workspace Health Score (0-100%):

Component	Weight	Criteria
Workspace organization	10%	Consistent naming, no dev artifacts in prod, description set
Capacity & performance	15%	Dedicated capacity, refresh schedules, maintenance automation
Security & governance	15%	Roles defined, RLS where needed, sensitivity labels
Star schema compliance	20%	Clear fact/dim separation, conformed dimensions, correct grain
Dimensional modeling quality	15%	Surrogate keys, SCD handling, hierarchies, no mixed grain
Direct Lake readiness	15%	Tables in Delta, V-Ordered, within guardrails, no transformations in model
Direct Lake configuration	10%	Auto-sync, framing, no calculated tables/columns, monitored fallback

Score interpretation:

90-100%: Exemplary workspace — production-ready, well-governed, Direct Lake optimized
70-89%: Good foundation — minor improvements needed for full optimization
50-69%: Significant issues — performance and governance risks present
< 50%: Major redesign needed — critical anti-patterns detected

Recommendation Report Format

Each recommendation must include:

┌─────────────────────────────────────────────────────┐
│ [SEVERITY] Category — Short title                   │
├─────────────────────────────────────────────────────┤
│ Target:    {table/model/column}                     │
│ Finding:   What was detected                        │
│ Impact:    Why it matters (performance, correctness) │
│ Action:    Specific steps to fix                    │
│ Effort:    Low / Medium / High                      │
│ Priority:  1-5 (1 = fix immediately)                │
│ Code:      (if applicable) Script/command to run    │
└─────────────────────────────────────────────────────┘

Severity levels:

Critical: Data correctness at risk or severe performance degradation
High: Significant performance impact or anti-pattern
Medium: Moderate improvement opportunity
Low: Best-practice suggestion, cosmetic or minor

Action Plan Generation

When generating an action plan, group recommendations by:

Quick Wins (Low effort, High/Critical severity) — do these first
Important Fixes (Medium effort, High/Critical severity) — schedule soon
Improvements (Any effort, Medium severity) — plan for next sprint
Nice-to-Haves (Low severity) — backlog

For each group, provide:

Ordered list of actions with dependencies (e.g., "Run OPTIMIZE before VACUUM")
Estimated impact summary
PySpark/SQL/DAX code snippets ready to execute
Rollback steps where applicable

Deployment of Recommendations

When the user chooses to deploy (apply) recommendations:

Data Storage optimizations

Generate a PySpark notebook with cells for each optimization:

OPTIMIZE commands with ZORDER
VACUUM commands
ALTER TABLE for property changes
ANALYZE TABLE for statistics
Data type casts via Delta table rewrites

Data Model changes

Generate a PySpark notebook with:

Table creation DDL for new dimension/fact tables
INSERT INTO SELECT for data migration
ALTER TABLE for column renames
DROP TABLE for consolidated tables (after user confirmation)

Semantic Model fixes

Generate updated TMDL files with fixes applied:

Updated measures with correct DAX
Fixed relationship definitions
Corrected column configurations
Deploy using fabric_semantic_model.py deploy

Script Reference

Use ./scripts/fabric_optimization_review.py for Fabric REST API operations during the review.

Install dependencies (one-time):

pip install azure-identity requests

Get table properties and Delta log metadata

python scripts/fabric_optimization_review.py table-info <workspace_id> <lakehouse_id> <table_name>

Get semantic model definition (TMDL)

python scripts/fabric_optimization_review.py get-model <workspace_id> <model_id>

List all items in a workspace

python scripts/fabric_optimization_review.py list-items <workspace_id> [--type lakehouse|semanticmodel]

Gotchas

V-Order is enabled by default in Fabric but can be disabled at the table level. Always check delta.parquet.vorder.enabled in table properties rather than assuming.
OPTIMIZE is safe but resource-intensive. Running on very large tables during business hours can impact concurrent queries. Recommend off-peak scheduling.
VACUUM is destructive. Once files are vacuumed, time travel to earlier versions is lost. Always warn the user and confirm retention period.
Z-Order is not partitioning. Z-Order co-locates related data within files but does not create folder structure. Both can be used together.
Direct Lake models cannot be deployed via the standard REST API POST semanticModels with TMDL. Use the Fabric Items API with definition format or create via the Power BI Desktop / Fabric portal. For the workspace audit, recommend Direct Lake readiness but clarify the deployment path.
Direct Lake fallback is silent. When a model exceeds guardrails, it silently falls back to DirectQuery mode. The only way to detect fallback is via capacity metrics or DAX INFO.STORAGEMODEUSED(). Always recommend monitoring.
Column-level statistics are not computed automatically in Fabric Delta tables. Stale or missing statistics cause the optimizer to make bad decisions (full scans instead of file skipping).
Bidirectional cross-filtering is the most common semantic model performance killer. Always flag it unless there's a documented bridge-table pattern.
Format strings are locale-dependent. #,##0.00 is US format. For es-ES, use #.##0,00 (dot for thousands, comma for decimals).

name	fabric-optimization-review
description	Review and optimize data storage, data models, and semantic models in Microsoft Fabric lakehouses. Covers Delta table optimization (V-Order, Z-Order, compaction, partitioning), data model anti-pattern detection, semantic model DAX review, relationship optimization, AI/agentic readiness (descriptions, synonyms, linguistic schema, display folders), workspace audit (Fabric best practices, dimensional modeling, Direct Lake), and actionable recommendation reports. Use when: optimization, performance, review, tuning, best practices, V-Order, Z-Order, partitioning, compaction, DAX optimization, data model review, AI readiness, agentic, NLQ, Copilot, natural language, workspace audit, Direct Lake, dimensional modeling, audit.
compatibility	Requires the Fabric MCP Server VS Code extension for workspace and table discovery. Uses Fabric REST API for metadata retrieval. Python 3.10+.

Fabric Optimization Review — Reference

Review Categories

The agent performs three independent review tracks. The user can request one, several, or all:

Track	Target	What It Checks
Data Storage	Lakehouse Delta tables	File sizes, compaction, partitioning, V-Order, Z-Order, small files, schema evolution
Data Model	Lakehouse table relationships	Normalization, star-schema compliance, redundant columns, key integrity, naming conventions
Semantic Model	Power BI semantic models (TMDL)	DAX measure quality, relationship cardinality, hidden columns, implicit measures, format strings, role-playing dimensions
AI & Agentic Readiness	Semantic models + lakehouse tables	Descriptions, synonyms, linguistic schema, display folders, naming conventions, pre-computed measures, column visibility for AI/NLQ/Copilot
Workspace Audit	Full workspace	Fabric best practices, dimensional modeling compliance, Direct Lake readiness, item organization, capacity usage, pipeline health

Data Storage Optimization Rules

Small File Detection

Delta tables with many small files degrade read performance. Fabric uses V-Order by default but files still fragment from frequent small appends.

Metric	Threshold	Severity	Recommendation
Avg file size < 32 MB	< 32 MB	High	Run `OPTIMIZE` to compact files
File count > 1000 per partition	> 1000	High	Run `OPTIMIZE` with bin-packing
Avg file size < 8 MB	< 8 MB	Critical	Immediate compaction required; rethink ingestion batch size

Partitioning Assessment

Pattern	Severity	Recommendation
Table > 1 GB with no partitioning	Medium	Consider partitioning by date or high-cardinality filter column
Partition column with > 10,000 distinct values	High	Over-partitioned — creates too many small folders. Choose a coarser grain (year-month vs date)
Partition column with < 5 distinct values	Low	Under-partitioned — minimal benefit. Consider removing partitioning
Partition column not used in common filters	Medium	Partitioning on non-filter columns wastes metadata. Change to a commonly filtered column

V-Order and Z-Order

Check	Detection	Recommendation
V-Order not enabled	Table properties lack `delta.parquet.vorder.enabled = true`	Enable V-Order: `ALTER TABLE {table} SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')`
Z-Order not configured on query-heavy columns	No `ZORDER BY` in OPTIMIZE history	Apply `OPTIMIZE {table} ZORDER BY ({columns})` for frequently filtered/joined columns
Z-Order on > 4 columns	OPTIMIZE with > 4 ZORDER columns	Z-Order effectiveness degrades beyond 3-4 columns. Prioritize the most selective ones

Table Maintenance

Check	Detection	Recommendation
No OPTIMIZE in last 7 days	Delta log has no OPTIMIZE commits recently	Schedule regular `OPTIMIZE` runs
Vacuum not run / retention > 7 days	`delta.deletedFileRetentionDuration` > 7 days	Run `VACUUM {table} RETAIN 168 HOURS` to reclaim storage
Stats collection stale	Column-level stats absent or outdated	Run `ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS`
Too many Delta log entries	> 1000 log files without checkpoint	Force checkpoint: table properties `delta.checkpointInterval`

Schema & Data Type Optimization

Pattern	Severity	Recommendation
String columns storing only integers	Medium	Cast to `INT`/`LONG` — smaller footprint, faster scans
String columns storing only dates	Medium	Cast to `DATE`/`TIMESTAMP` — enables predicate pushdown
`DOUBLE` for currency/money values	Low	Use `DECIMAL(18,2)` for exact arithmetic
Wide tables with > 50 columns	Low	Consider vertical partitioning (split into related tables)
Unused columns (null > 95%)	Medium	Consider dropping or archiving — reduces scan I/O

Data Model Review Rules

Star Schema Compliance

Anti-Pattern	Detection	Severity	Recommendation
Snowflake dimensions	Dimension table has FK to another dimension	Medium	Flatten into single dimension for better query performance
Flat/denormalized mega-table	Single table with > 30 columns mixing measures and attributes	High	Normalize into fact + dimension tables
Missing date dimension	Fact table has date columns but no dedicated date/calendar table	High	Create a date dimension for time intelligence
Fact-to-fact relationships	Two fact tables joined directly	High	Introduce shared dimensions (conformed dimensions)
Bridge tables without clear purpose	Many-to-many junction tables	Medium	Verify necessity; simplify if possible

Key & Relationship Integrity

Check	Detection	Recommendation
No defined primary key	Dimension table has no unique column identified	Define a PK to enable relationship validation
Composite keys	Joins require multiple columns	Consider surrogate integer keys for simpler joins
Orphan foreign keys	FK values not found in dimension PK	Data quality issue — clean orphans or add "Unknown" dimension member
Circular relationships	A → B → C → A	Redesign to eliminate cycles; causes ambiguous query paths
Bidirectional relationships	Cross-filter in both directions	Use single direction unless absolutely required (performance + ambiguity risk)

Naming Convention Review

Pattern	Severity	Recommendation
Inconsistent casing (mix of camelCase, snake_case, PascalCase)	Low	Standardize to one convention (prefer `snake_case` for Spark/lakehouse)
Table names without prefix (`fact_`, `dim_`)	Low	Add prefixes for clarity in large models
Ambiguous column names (`id`, `name`, `value`)	Medium	Qualify with table context (`customer_id`, `product_name`)
Reserved word column names (`date`, `order`, `table`)	Low	Rename to avoid query issues (`order_date`, `order_number`)

Semantic Model Review Rules

DAX Measures

Anti-Pattern	Detection	Severity	Recommendation
Implicit measures	`discourageImplicitMeasures` is false or missing	High	Set `discourageImplicitMeasures` in model.tmdl and create explicit DAX measures
SUM on non-additive columns	`SUM()` on ratio, percentage, or average columns	High	Use `AVERAGE()`, `DIVIDE()`, or weighted calculations
Duplicate measures	Two measures with identical DAX expressions	Medium	Consolidate to a single measure
Missing COUNTROWS	Fact table without a row-count measure	Low	Add `COUNTROWS('{table}')` as base metric
Unformatted measures	Measures without `formatString`	Medium	Add format strings (`#,##0`, `#,##0.00`, `0.0%`)
Time intelligence without DATEADD/SAMEPERIODLASTYEAR	Manual date calculations instead of DAX time functions	Medium	Use `DATEADD`, `SAMEPERIODLASTYEAR`, `TOTALYTD` for correct time intelligence
CALCULATE without explicit filter removal	Nested CALCULATE without REMOVEFILTERS	Low	Review filter context — may produce unexpected results

Relationship Configuration

Issue	Detection	Severity	Recommendation
Missing relationships	Matching key columns with no defined relationship	High	Add the relationship (many-to-one, single direction)
Wrong cardinality	Many-to-many where many-to-one is correct	High	Fix cardinality — many-to-many bypasses engine optimizations
Both-direction cross-filter	`crossFilteringBehavior: bothDirections` without a bridge table	Medium	Use single direction; both-direction causes ambiguity and performance issues
Role-playing dimensions	Same dimension used for multiple relationships (e.g., date table for order_date and ship_date)	Medium	Use `USERELATIONSHIP()` in measures for inactive relationships; document clearly
Inactive relationships not used	Inactive relationship exists but no measure uses `USERELATIONSHIP`	Low	Either activate or remove the unused relationship

Column Configuration

Issue	Detection	Severity	Recommendation
ID/key columns visible	`isHidden: false` on FK/PK columns	Medium	Hide technical columns from report view
`summarizeBy` not set to `none`	Non-measure columns with default aggregation	Medium	Set `summarizeBy: none` to prevent implicit measures
Missing `isKey: true`	Dimension PK not marked as key	Low	Mark PK columns as `isKey: true` for VertiPaq optimization
Data type mismatch	Relationship columns with different TMDL types	High	Align types — mismatches cause conversion overhead

Locale & Formatting

Issue	Detection	Severity	Recommendation
Missing culture	No `culture` in model.tmdl	Medium	Add `culture: es-ES` (or target locale)
Wrong format strings for locale	Using `.` as decimal in Spanish locale	Low	Use `#.##0,00` for es-ES (dot = thousands, comma = decimals)
Missing `sourceQueryCulture`	Not set in model.tmdl	Low	Add `sourceQueryCulture: es-ES` to match data source locale

AI & Agentic Readiness Rules

1. Table Descriptions

Every table must have a description in TMDL that explains its purpose, grain (what one row represents), and key relationships.

Issue	Detection	Severity	Recommendation
Table without description	`description` property missing or empty in table TMDL	High	Add description explaining purpose, grain, and relationships. Example: `"Fact table of individual customer support calls. One row = one call. Joins to dim_agent, dim_customer, dim_date via FK columns."`
Description too short (< 20 chars)	Description exists but is trivially short	Medium	Expand with grain, relationships, and business context
Description is technical jargon	Description uses only internal/database terms	Medium	Rewrite in business language that an AI can interpret for end users

TMDL example:

table 'fact_calls'
	description: "Fact table recording individual customer support calls. One row represents a single call. Grain: call_id. Joins to dim_agent (agent handling the call), dim_customer (caller), dim_date (call date), dim_product (product discussed)."

2. Measure Descriptions

Every DAX measure must have a description explaining what it calculates, its units, when to use it, and any caveats.

Issue	Detection	Severity	Recommendation
Measure without description	`description` missing on measure	High	Add description with calculation logic, units, and usage guidance
Description missing units	Description doesn't mention currency, %, seconds, count, etc.	Medium	Specify units explicitly: "in euros (€)", "as percentage (0-100)", "in seconds"
Complex measure without usage guidance	CALCULATE/FILTER/time-intelligence measure with no context	Medium	Explain when to use vs. similar measures. Example: "Use for YTD comparison. For month-only, use [Revenue Monthly]."

TMDL example:

measure 'Average Handle Time' =
	AVERAGE('fact_calls'[duration_seconds])
	description: "Average duration of customer calls in seconds. Includes hold time. Use for operational efficiency analysis. For calls without hold time, use [Avg Talk Time]."
	formatString: #,##0.0

3. Column Descriptions

At minimum, describe all foreign key columns, amount/currency columns, coded/enumerated columns, and any column whose name is ambiguous.

Issue	Detection	Severity	Recommendation
FK column without description	Column ending in `_id`/`_key` with no description	High	Describe what it references: "Foreign key to dim_agent. Join on agent_id to get agent name and team."
Amount/currency column without description	Numeric column with `amount`, `total`, `price`, `importe`, `precio` in name, no description	Medium	Specify currency, tax inclusion, sign convention: "Net revenue in EUR, excluding VAT. Positive = income, negative = refund."
Coded/enum column without description	Column with low cardinality (< 20 distinct values) and no description	Medium	List possible values and meanings: "Call outcome code: 'R' = Resolved, 'E' = Escalated, 'A' = Abandoned, 'C' = Callback scheduled."
Date column without description	Date/timestamp column with no description	Low	Specify timezone, business meaning: "Date and time the call was answered by an agent (UTC). NULL if call was abandoned before pickup."

4. Synonyms & Annotations

Add synonyms so AI can map natural language terms (in any language) to the correct tables, columns, and measures. Critical for multilingual environments.

Issue	Detection	Severity	Recommendation
No synonyms on any table	No `annotation` with synonym content in TMDL	High	Add synonyms for each table in all relevant languages. Example: fact_calls → "llamadas", "calls", "phone calls", "tickets telefónicos"
Measures without synonyms	Measures have no alternative names	Medium	Add natural-language aliases: "Average Handle Time" → "AHT", "duración media", "tiempo medio de gestión"
Columns without synonyms	Key business columns lack aliases	Medium	Add synonyms for columns users ask about: "agent_name" → "nombre del agente", "rep", "representative"
Acronyms not expanded	Measures/columns use acronyms without synonyms	Medium	Map acronyms: "AHT" → "Average Handle Time", "FCR" → "First Call Resolution"

TMDL synonym example (via extended properties or annotations):

table 'fact_calls'
	annotation Synonyms = ["llamadas", "calls", "phone calls", "registros de llamadas"]

	column agent_name
		annotation Synonyms = ["agente", "representante", "rep", "agent"]

	measure 'Total Calls' = COUNTROWS('fact_calls')
		annotation Synonyms = ["número de llamadas", "call count", "total llamadas", "volumen"]

5. Display Folders

Group measures into logical displayFolder categories so AI agents can discover related measures and present coherent answers.

Issue	Detection	Severity	Recommendation
No displayFolder on any measure	All measures are at root level	Medium	Group into business domains: "Operativo", "Financiero", "Calidad", "Cliente", "Tiempo"
Inconsistent folder naming	Mix of languages, casing, or granularity	Low	Standardize folder names across all tables
Too many measures at root	> 10 measures without displayFolder	Medium	Organize into folders — AI uses folder context to disambiguate similar measures

Recommended folder structure:

Operativo/          → call volume, handle time, occupancy, utilization
Financiero/         → revenue, cost, margin, discounts
Calidad/            → FCR, CSAT, NPS, quality scores
Cliente/            → churn, lifetime value, satisfaction
Tiempo/             → YTD, MoM, QoQ, rolling averages
Objetivos/          → targets, KPI thresholds, variances

TMDL example:

measure 'Total Calls' = COUNTROWS('fact_calls')
	displayFolder: Operativo

measure 'Revenue' = SUM('fact_calls'[revenue])
	displayFolder: Financiero

measure 'CSAT Score' = AVERAGE('fact_surveys'[score])
	displayFolder: Calidad

6. Format Strings

Every measure and numeric column must have an appropriate formatString. AI agents use format strings to present results correctly and to infer the semantic type of a value.

Issue	Detection	Severity	Recommendation
Currency measure without currency symbol	Measure with `amount`/`revenue`/`price` but no `€`/`$` in formatString	Medium	Use `€#.##0,00` (es-ES) or `$#,##0.00` (en-US)
Percentage measure without %	Measure calculating ratios/rates but formatString lacks `%`	Medium	Use `0.0%` or `#,##0.0%`
Duration without unit format	Measure in seconds/minutes with plain numeric format	Medium	Use `#,##0.0` with description clarifying unit, or format as `HH:MM:SS` via DAX FORMAT()
Integer measure with decimals	Count/quantity measure showing decimal places	Low	Use `#,##0` for counts, `#.##0` for es-ES
Date columns without format	Date displayed in default format	Low	Set explicit format: `dd/MM/yyyy` for es-ES

7. Technical Column Visibility

Hide all columns that an AI agent or end user should never filter/slice by directly. AI should use descriptive columns (names, labels) not technical IDs.

Issue	Detection	Severity	Recommendation
FK ID columns visible	Columns ending in `_id`/`_key` with `isHidden: false`	High	Set `isHidden: true` — AI must filter by `agent_name`, not `agent_id`
Surrogate keys visible	Auto-increment or UUID columns exposed	High	Hide — these have no business meaning
ETL/audit columns visible	Columns like `created_at`, `updated_by`, `batch_id`, `row_hash`	Medium	Hide — irrelevant for analytical queries
Source system columns visible	Columns like `source_system`, `legacy_code`, `migration_flag`	Medium	Hide unless explicitly needed for analysis
Technical columns in dimension tables	Internal keys alongside descriptive columns	Medium	Only expose descriptive columns; hide all technical plumbing

TMDL example:

column agent_id
	dataType: int64
	isHidden: true
	isKey: false
	summarizeBy: none
	description: "FK to dim_agent. Hidden — use agent_name for filtering."

column agent_name
	dataType: string
	isHidden: false
	description: "Name of the support agent. Use this column to filter or group by agent."

8. Display Names & Naming Conventions

AI agents present column and measure names directly to users. Technical names like call_date or avg_ht_sec are confusing. Use friendly display names.

Issue	Detection	Severity	Recommendation
Column uses snake_case without display name	Column name contains underscores and no `displayName` override	Medium	Add friendly display name: `call_date` → "Fecha Llamada", `agent_name` → "Nombre Agente"
Measure uses abbreviations	Measure name has acronyms or short forms	Medium	Use full business names: `Avg HT` → "Tiempo Medio de Gestión"
Mixed language in names	Some names in English, others in Spanish	Low	Standardize to the model's primary language; use synonyms for the other
Table names not user-friendly	Tables exposed as `fact_calls` or `dbo_sales`	Low	Consider display names: "Llamadas" or "Ventas" for user-facing contexts

9. Pre-Computed Measures for AI

AI agents perform best with pre-computed measures they can use directly, rather than needing to compose complex DAX on the fly. Add common derived metrics.

Missing Pattern	Detection	Severity	Recommendation
No time-comparison measures	No YoY, MoM, QoQ, or YTD measures	High	Add: `[Revenue YTD]`, `[Revenue vs PY]`, `[Revenue MoM %]` using `TOTALYTD`, `SAMEPERIODLASTYEAR`, `DATEADD`
No ratio/efficiency measures	Only raw SUM/COUNT, no derived ratios	High	Add: `[Revenue per Call]`, `[Cost per Resolution]`, `[Calls per Agent]`
No percentage measures	Absolute values only, no share/mix	Medium	Add: `[% Resolved First Call]`, `[Agent Utilization %]`, `[Revenue Share %]`
No ranking measures	No RANKX-based measures	Medium	Add: `[Agent Rank by Revenue]`, `[Product Rank by Volume]` for top-N questions
No moving averages	No trend smoothing measures	Low	Add: `[Revenue 7-Day Avg]`, `[Calls 30-Day Moving Avg]` for trend analysis
No target/variance measures	No comparison to goals	Medium	Add: `[Revenue vs Target]`, `[Target Achievement %]` if target data exists
No cumulative measures	No running totals	Low	Add: `[Cumulative Revenue]`, `[Running Total Calls]` for progress tracking

Pre-computed measures help AI agents answer questions like "How are we doing vs last year?" or "Who is the top agent?" without composing DAX.

10. Linguistic Schema (Q&A / NLQ)

Configure the Power BI Q&A linguistic schema to enable natural language queries. This directly impacts Copilot and any NLQ interface.

Issue	Detection	Severity	Recommendation
No linguistic schema configured	Model has no Q&A linguistic schema	High	Create a linguistic schema YAML defining entities, attributes, relationships, and phrasings
Synonyms not in linguistic schema	Synonyms only in annotations, not in Q&A config	Medium	Mirror synonyms into the linguistic schema for Q&A engine consumption
No relationship phrasings	Relationships not described with natural language verbs	Medium	Add phrasings: "agent handles calls", "customer places calls", "call occurs on date"
No attribute phrasings	Columns not described as attributes of entities	Medium	Add: "call has a duration", "agent has a name", "customer has a segment"
Missing noun synonyms	Q&A doesn't understand domain terms	Medium	Add noun definitions: "AHT" = "average handle time", "FCR" = "first call resolution"
No verb-based phrasings for measures	Q&A can't interpret action-based queries	Medium	Add: "calculate revenue", "show average handle time", "count calls"

Linguistic schema YAML example:

Entities:
  - Name: Call
    Table: fact_calls
    Synonyms: [llamada, call, phone call, ticket]
    Attributes:
      - Name: Duration
        Column: duration_seconds
        Synonyms: [duración, handle time, call length]
      - Name: Date
        Column: call_date
        Synonyms: [fecha, when, call date]

Relationships:
  - Name: AgentHandlesCall
    Phrasing: "{agent} handles {call}"
    ForeignKey: fact_calls.agent_id
    PrimaryKey: dim_agent.agent_id

Phrasings:
  - Type: Attribute
    Entity: Call
    Attribute: Duration
    Phrasing: "{call} has a {duration}"
  - Type: Verb
    Subject: Agent
    Verb: handles
    Object: Call

11. Data Granularity & Aggregation Tables

AI queries often need aggregated answers. Pre-aggregated tables or aggregation-aware models reduce response time and improve accuracy.

Issue	Detection	Severity	Recommendation
Only detail-level fact table	Fact table has millions of rows, no pre-aggregations	Medium	Create aggregation tables (daily/weekly/monthly summaries) for common query patterns
No summary dimension attributes	Dimensions lack hierarchy levels (e.g., City → Region → Country)	Medium	Add rolled-up attributes for AI to answer at different granularities
Missing calendar hierarchies	Date dimension without Year → Quarter → Month → Week → Day	Medium	Add standard calendar hierarchies for time-based drill-down

12. Data Documentation for Context Grounding

Beyond TMDL metadata, provide additional context that AI agents can use for grounding and disambiguation.

Issue	Detection	Severity	Recommendation
No business glossary	No annotation or external reference defining business terms	Medium	Create a glossary annotation: key metrics definitions, business rules, calculation methodology
No data freshness indicator	No description of update frequency or latency	Low	Document in table description: "Updated daily at 06:00 UTC. Data latency: T-1."
No known limitations documented	No caveats about data coverage or quality	Low	Document in description: "Excludes internal test calls. Weekend data may have 2-hour delay."
No example questions	No annotation with sample natural-language queries	Low	Add annotation with 5-10 example questions the model can answer: "¿Cuántas llamadas atendió el agente X este mes?"

Example questions annotation:

table 'fact_calls'
	annotation ExampleQuestions = [
		"¿Cuántas llamadas hubo ayer?",
		"¿Cuál es el tiempo medio de gestión por agente?",
		"Top 5 agentes por volumen de llamadas este mes",
		"¿Cuál es la tasa de resolución en primera llamada?",
		"Comparar ingresos de este trimestre con el anterior",
		"¿Qué productos generan más llamadas de soporte?"
	]

AI Readiness Scoring

After evaluating all rules in this section, compute an AI Readiness Score (0-100%) for the model:

Component	Weight	Criteria
Table descriptions	15%	All tables have descriptions with grain and relationships
Measure descriptions	15%	All measures have descriptions with units and usage
Column descriptions	10%	All FKs, amounts, and coded columns described
Synonyms	15%	Tables, key columns, and measures have multilingual synonyms
Display folders	5%	Measures organized in logical folders
Format strings	5%	All numeric measures have appropriate format strings
Column visibility	10%	All technical columns hidden
Naming conventions	5%	Friendly display names, consistent language
Pre-computed measures	10%	Time comparisons, ratios, rankings available
Linguistic schema	5%	Q&A schema with phrasings and synonyms configured
Documentation	5%	Glossary, freshness, limitations, example questions

Score interpretation:

90-100%: Production-ready for AI/Copilot — model is fully grounded
70-89%: Usable with caveats — AI will sometimes misinterpret queries
50-69%: Significant gaps — AI will frequently generate wrong answers
< 50%: Not ready — AI usage will produce unreliable results

Workspace Audit Rules

Fabric Best Practices

Workspace Organization

Issue	Detection	Severity	Recommendation
No naming convention for items	Items have inconsistent prefixes/suffixes (mix of `rpt_`, `Report_`, no prefix)	Medium	Adopt a standard: `LH_` (lakehouse), `SM_` (semantic model), `NB_` (notebook), `PL_` (pipeline), `RPT_` (report)
Dev/test items in production workspace	Items with `test`, `dev`, `tmp`, `copia`, `backup` in name	High	Move to a separate dev workspace; use deployment pipelines for promotion
No workspace description	Workspace has no description	Low	Add a description explaining purpose, team ownership, and data domains
Too many items (> 50)	Workspace contains excessive items	Medium	Split by domain or team; large workspaces are hard to govern
No deployment pipeline configured	No pipeline linking dev → test → prod workspaces	Medium	Configure Fabric deployment pipelines for safe promotion of artifacts
Mixed ownership	Items owned by individual users instead of service principal/group	Medium	Transfer ownership to a service principal or shared admin group

Capacity & Performance Governance

Issue	Detection	Severity	Recommendation
Workspace not assigned to a capacity	No capacity binding	Critical	Assign to a Fabric capacity — items won't run without it
Using shared capacity for production	Workspace on shared (non-dedicated) capacity	High	Move to dedicated capacity (F64+) for predictable performance and SLAs
No refresh schedule on semantic models	Semantic models with no automatic refresh	Medium	Configure scheduled refresh aligned with data pipeline completion
Lakehouse without scheduled maintenance	No OPTIMIZE/VACUUM automation	Medium	Create a notebook + pipeline that runs OPTIMIZE weekly and VACUUM monthly
Long-running notebook executions	Notebooks timing out or running > 30 minutes	Medium	Review Spark configuration; consider partitioning, caching, or pre-aggregation

Security & Governance

Issue	Detection	Severity	Recommendation
No workspace roles defined	Only admin + viewer, no contributor/member granularity	Medium	Define roles: Admin (platform team), Member (developers), Contributor (ETL), Viewer (consumers)
Sensitive data without RLS	Tables with PII/financial data and no Row-Level Security in semantic model	High	Implement RLS in the semantic model; use `USERNAME()` or `USERPRINCIPALNAME()` filters
No sensitivity labels	No Microsoft Purview sensitivity labels applied	Low	Apply sensitivity labels per data classification policy
Direct lakehouse access without controls	Users querying lakehouse SQL endpoint directly without governance	Medium	Use semantic models as the governed access layer; restrict direct SQL endpoint access

Dimensional Modeling Compliance

Star Schema Validation

Issue	Detection	Severity	Recommendation
No clear fact/dimension separation	Tables not identifiable as fact or dimension by name or structure	High	Restructure into explicit star schema: `fact_` prefix for transactional tables, `dim_` for reference
Fact table contains descriptive attributes	Fact table has string columns with names, descriptions, categories stored directly	High	Extract to dimension tables; keep only FK references + measures in facts
Dimension table with measure columns	Dimension has numeric additive columns (amounts, totals)	High	Move measures to fact table or create a factless fact for those metrics
No conformed dimensions	Same entity (customer, product, date) duplicated across multiple fact tables with different structures	High	Create conformed dimensions shared across all facts for consistent slicing
Missing surrogate keys	Dimensions use natural/business keys as PK	Medium	Add integer surrogate keys for performance and change management (SCD)
No Slowly Changing Dimension handling	Dimension records overwritten with no history	Medium	Implement SCD Type 2 (start_date/end_date/is_current) for critical dimensions
Calendar table missing fiscal periods	Date dimension only has calendar dates, no fiscal year/quarter	Medium	Add fiscal periods, holidays, working days for business reporting
Junk dimensions not consolidated	Multiple low-cardinality flag columns in fact table	Low	Consolidate into a single junk dimension to reduce fact table width

Grain & Granularity

Issue	Detection	Severity	Recommendation
Mixed grain in fact table	Some rows at daily level, others at monthly (or order header + order line mixed)	Critical	Separate into distinct fact tables at consistent grain
Fact table without clear grain	Cannot determine what one row represents	High	Document the grain explicitly; if ambiguous, restructure
Pre-aggregated fact without detail	Only summary-level fact exists with no detail backup	Medium	Keep detail-level fact for drill-through; add aggregation tables separately

Relationship Patterns

Issue	Detection	Severity	Recommendation
Many-to-many without bridge table	Direct M:N relationship between two tables	High	Create a bridge/junction table to properly model the M:N relationship
Multiple active relationships to same table	Two active relationships from fact to same dimension	High	Make one inactive; use `USERELATIONSHIP()` in measures for the secondary path
Self-referencing dimension without parent-child	Hierarchy column (manager_id → employee_id) not modeled	Medium	Implement parent-child hierarchy with PATH/PATHITEM DAX functions
Circular dependency in relationships	A → B → C → A path exists	Critical	Redesign to eliminate cycles; consider role-playing or bridge patterns

Direct Lake Readiness

Direct Lake is Fabric's highest-performance mode for semantic models — it reads directly from Delta/Parquet files in OneLake without data movement. However, it has strict requirements.

Prerequisites & Compatibility

Issue	Detection	Severity	Recommendation
Semantic model uses Import mode	Model partitions show `mode: import` or explicit M queries with transformations	High	Convert to Direct Lake: remove transformations, ensure lakehouse tables are query-ready
Semantic model uses DirectQuery to SQL endpoint	Partitions use `Sql.Database()` with `mode: directQuery`	Medium	Consider Direct Lake for better performance if data fits within guardrails
Tables not in Delta format	Lakehouse contains CSV/Parquet files not loaded as Delta tables	Critical	Convert to Delta tables: `spark.read.parquet(...).write.format("delta").saveAsTable(...)`
Data transformations in partition expressions	M/Power Query in partition does joins, filters, or calculations	High	Move all transformations upstream to the lakehouse (notebooks/pipelines); Direct Lake requires clean passthrough
Lakehouse and semantic model in different workspaces	Model references tables in a different workspace's lakehouse	Medium	Place in same workspace or use shortcuts; cross-workspace Direct Lake has limitations

Direct Lake Guardrails

Direct Lake has capacity-dependent limits. If exceeded, the model falls back to DirectQuery (performance cliff).

Guardrail	F2	F64	F128+	What Happens
Max tables per model	500	500	500	Model creation fails
Max columns per table	500	500	500	Table skipped
Max rows per table	300M	1.5B	3B+	Fallback to DirectQuery
Max model size on disk	10 GB	40 GB	80 GB+	Fallback to DirectQuery
Max columns per model	2,000	2,000	2,000	Model creation fails

Issue	Detection	Severity	Recommendation
Table exceeds row limit for capacity	Row count > capacity guardrail	Critical	Partition the table, reduce historical depth, or upgrade capacity SKU
Model size exceeds capacity guardrail	Total Delta file size > model size limit	High	Reduce column count, compress strings, or split into multiple models
Too many columns in a table	Table has > 300 columns	High	Vertical partitioning: split into related tables
Unneeded columns inflating model size	Wide tables with columns never used in reports	Medium	Remove unused columns from lakehouse tables or create views

Data Preparation for Direct Lake

Issue	Detection	Severity	Recommendation
String columns with high cardinality	String column with > 1M distinct values	High	Consider hashing, bucketing, or moving to a detail table not in the model
Columns with mixed types	Column storing numbers as strings intermittently	High	Enforce strict typing in the lakehouse; Direct Lake cannot handle type mismatches gracefully
Decimal columns with excessive precision	`DECIMAL(38,18)` when `DECIMAL(18,2)` suffices	Medium	Reduce precision — smaller Parquet footprint = faster Direct Lake
Large text columns (> 1KB avg)	Description/notes columns with long text in Direct Lake model	Medium	Exclude from model or move to a detail table only accessed via drill-through
Files not V-Ordered	V-Order disabled on Direct Lake source tables	High	Enable V-Order — Direct Lake reads V-Ordered Parquet significantly faster
Too many small files	Avg file < 32 MB in Direct Lake source table	High	Run `OPTIMIZE` — Direct Lake opens each file individually; many small files = slow
Data not compacted after incremental load	New data appended as tiny files after each pipeline run	High	Add `OPTIMIZE` step at end of each pipeline/notebook that modifies the table

Direct Lake Semantic Model Configuration

Issue	Detection	Severity	Recommendation
Fallback to DirectQuery not monitored	No alert on fallback events	Medium	Monitor via Fabric capacity metrics; set alerts for DirectQuery fallback
Auto-refresh not configured	Direct Lake model not set to detect Delta changes	Medium	Enable `autoSync` so model picks up new data without manual refresh
Framing not triggered after data load	Pipeline loads data but doesn't frame the model	High	Add model framing call (`POST /refresh`) at end of pipeline to update Direct Lake metadata
Calculated columns in Direct Lake	Model uses calculated columns (DAX-computed at refresh)	Medium	Move calculation logic to the lakehouse (PySpark/SQL) — calculated columns force import-mode fallback for those columns
Calculated tables in model	`CALENDAR()`, `GENERATESERIES()`, or other calculated tables	Medium	Materialize as lakehouse tables; calculated tables aren't Direct Lake compatible
RLS with complex DAX filters	Row-Level Security using `LOOKUPVALUE` or other complex patterns	Medium	Simplify RLS to direct column comparisons; complex RLS can trigger fallback

Direct Lake Migration Checklist

When recommending migration from Import/DirectQuery to Direct Lake, provide this ordered checklist:

Verify capacity SKU — Direct Lake requires F64+ for production workloads
Audit table sizes — ensure all tables fit within guardrails for the capacity tier
Move transformations upstream — all data prep must happen in the lakehouse, not in Power Query
Optimize Delta tables — run OPTIMIZE + VACUUM on all source tables; enable V-Order
Remove calculated tables — materialize as lakehouse tables
Remove calculated columns — push logic to lakehouse or convert to measures
Simplify RLS — ensure filters are simple column comparisons
Create the Direct Lake model — use TMDL with mode: directLake partitions pointing to lakehouse tables
Configure auto-sync — enable automatic metadata sync with Delta log
Add framing to pipelines — trigger model refresh after each data load
Monitor fallback — set up capacity metric alerts for DirectQuery fallback events
Validate performance — compare query times vs previous mode; verify no fallback

Workspace Audit Scoring

After evaluating all workspace audit rules, compute a Workspace Health Score (0-100%):

Component	Weight	Criteria
Workspace organization	10%	Consistent naming, no dev artifacts in prod, description set
Capacity & performance	15%	Dedicated capacity, refresh schedules, maintenance automation
Security & governance	15%	Roles defined, RLS where needed, sensitivity labels
Star schema compliance	20%	Clear fact/dim separation, conformed dimensions, correct grain
Dimensional modeling quality	15%	Surrogate keys, SCD handling, hierarchies, no mixed grain
Direct Lake readiness	15%	Tables in Delta, V-Ordered, within guardrails, no transformations in model
Direct Lake configuration	10%	Auto-sync, framing, no calculated tables/columns, monitored fallback

Score interpretation:

90-100%: Exemplary workspace — production-ready, well-governed, Direct Lake optimized
70-89%: Good foundation — minor improvements needed for full optimization
50-69%: Significant issues — performance and governance risks present
< 50%: Major redesign needed — critical anti-patterns detected

Recommendation Report Format

Each recommendation must include:

┌─────────────────────────────────────────────────────┐
│ [SEVERITY] Category — Short title                   │
├─────────────────────────────────────────────────────┤
│ Target:    {table/model/column}                     │
│ Finding:   What was detected                        │
│ Impact:    Why it matters (performance, correctness) │
│ Action:    Specific steps to fix                    │
│ Effort:    Low / Medium / High                      │
│ Priority:  1-5 (1 = fix immediately)                │
│ Code:      (if applicable) Script/command to run    │
└─────────────────────────────────────────────────────┘

Severity levels:

Critical: Data correctness at risk or severe performance degradation
High: Significant performance impact or anti-pattern
Medium: Moderate improvement opportunity
Low: Best-practice suggestion, cosmetic or minor

Action Plan Generation

When generating an action plan, group recommendations by:

Quick Wins (Low effort, High/Critical severity) — do these first
Important Fixes (Medium effort, High/Critical severity) — schedule soon
Improvements (Any effort, Medium severity) — plan for next sprint
Nice-to-Haves (Low severity) — backlog

For each group, provide:

Ordered list of actions with dependencies (e.g., "Run OPTIMIZE before VACUUM")
Estimated impact summary
PySpark/SQL/DAX code snippets ready to execute
Rollback steps where applicable

Deployment of Recommendations

When the user chooses to deploy (apply) recommendations:

Data Storage optimizations

Generate a PySpark notebook with cells for each optimization:

OPTIMIZE commands with ZORDER
VACUUM commands
ALTER TABLE for property changes
ANALYZE TABLE for statistics
Data type casts via Delta table rewrites

Data Model changes

Generate a PySpark notebook with:

Table creation DDL for new dimension/fact tables
INSERT INTO SELECT for data migration
ALTER TABLE for column renames
DROP TABLE for consolidated tables (after user confirmation)

Semantic Model fixes

Generate updated TMDL files with fixes applied:

Updated measures with correct DAX
Fixed relationship definitions
Corrected column configurations
Deploy using fabric_semantic_model.py deploy

Script Reference

Use ./scripts/fabric_optimization_review.py for Fabric REST API operations during the review.

Install dependencies (one-time):

pip install azure-identity requests

Get table properties and Delta log metadata

python scripts/fabric_optimization_review.py table-info <workspace_id> <lakehouse_id> <table_name>

Get semantic model definition (TMDL)

python scripts/fabric_optimization_review.py get-model <workspace_id> <model_id>

List all items in a workspace

python scripts/fabric_optimization_review.py list-items <workspace_id> [--type lakehouse|semanticmodel]

Gotchas

V-Order is enabled by default in Fabric but can be disabled at the table level. Always check delta.parquet.vorder.enabled in table properties rather than assuming.
OPTIMIZE is safe but resource-intensive. Running on very large tables during business hours can impact concurrent queries. Recommend off-peak scheduling.
VACUUM is destructive. Once files are vacuumed, time travel to earlier versions is lost. Always warn the user and confirm retention period.
Z-Order is not partitioning. Z-Order co-locates related data within files but does not create folder structure. Both can be used together.
Direct Lake models cannot be deployed via the standard REST API POST semanticModels with TMDL. Use the Fabric Items API with definition format or create via the Power BI Desktop / Fabric portal. For the workspace audit, recommend Direct Lake readiness but clarify the deployment path.
Direct Lake fallback is silent. When a model exceeds guardrails, it silently falls back to DirectQuery mode. The only way to detect fallback is via capacity metrics or DAX INFO.STORAGEMODEUSED(). Always recommend monitoring.
Column-level statistics are not computed automatically in Fabric Delta tables. Stale or missing statistics cause the optimizer to make bad decisions (full scans instead of file skipping).
Bidirectional cross-filtering is the most common semantic model performance killer. Always flag it unless there's a documented bridge-table pattern.
Format strings are locale-dependent. #,##0.00 is US format. For es-ES, use #.##0,00 (dot for thousands, comma for decimals).

fabric-optimization-review

Plus depuis ce dépôt

Plus depuis ce dépôt

Fabric Optimization Review — Reference

Review Categories

Data Storage Optimization Rules

Small File Detection

Partitioning Assessment

V-Order and Z-Order

Table Maintenance

Schema & Data Type Optimization

Data Model Review Rules

Star Schema Compliance

Key & Relationship Integrity

Naming Convention Review

Semantic Model Review Rules

DAX Measures

Relationship Configuration

Column Configuration

Locale & Formatting

AI & Agentic Readiness Rules

1. Table Descriptions

2. Measure Descriptions

3. Column Descriptions

4. Synonyms & Annotations

5. Display Folders

6. Format Strings

7. Technical Column Visibility

8. Display Names & Naming Conventions

9. Pre-Computed Measures for AI

10. Linguistic Schema (Q&A / NLQ)

11. Data Granularity & Aggregation Tables

12. Data Documentation for Context Grounding

AI Readiness Scoring

Workspace Audit Rules

Fabric Best Practices

Workspace Organization

Capacity & Performance Governance

Security & Governance

Dimensional Modeling Compliance

Star Schema Validation

Grain & Granularity

Relationship Patterns

Direct Lake Readiness

Prerequisites & Compatibility

Direct Lake Guardrails

Data Preparation for Direct Lake

Direct Lake Semantic Model Configuration

Direct Lake Migration Checklist

Workspace Audit Scoring

Recommendation Report Format

Action Plan Generation

Deployment of Recommendations

Data Storage optimizations

Data Model changes

Semantic Model fixes

Script Reference

Get table properties and Delta log metadata

Get semantic model definition (TMDL)

List all items in a workspace

Gotchas

Fabric Optimization Review — Reference

Review Categories

Data Storage Optimization Rules

Small File Detection

Partitioning Assessment

V-Order and Z-Order

Table Maintenance

Schema & Data Type Optimization

Data Model Review Rules

Star Schema Compliance

Key & Relationship Integrity

Naming Convention Review

Semantic Model Review Rules

DAX Measures

Relationship Configuration

Column Configuration

Locale & Formatting

AI & Agentic Readiness Rules

1. Table Descriptions