fabric-lakehouse

Name: Fabric Lakehouse
Author: kimtth

// Fabric Lakehouse design, schemas, shortcuts, security, optimization, and PySpark patterns. Use when designing Lakehouse solutions, managing Delta tables, configuring OneLake shortcuts, or writing PySpark/Spark SQL code for Fabric notebooks.


stars:2

forks:0

updated:May 15, 2026 at 07:56

SKILL.md


name	fabric-lakehouse
description	Fabric Lakehouse design, schemas, shortcuts, security, optimization, and PySpark patterns. Use when designing Lakehouse solutions, managing Delta tables, configuring OneLake shortcuts, or writing PySpark/Spark SQL code for Fabric notebooks.

Fabric Lakehouse

Core Concepts

A Lakehouse in Microsoft Fabric combines the flexibility of a data lake with the management features of a data warehouse:

Unified storage in OneLake for structured and unstructured data
Delta Lake format for ACID transactions, versioning, and time travel
SQL analytics endpoint for T-SQL queries (auto-generated, read-only)
Default semantic model for Power BI integration
Support for non-Delta formats such as CSV and Parquet under Files/ (queryable through Spark only, not through the SQL analytics endpoint)

Key Components

Component	Purpose
Delta Tables	Managed tables with ACID compliance and schema enforcement under `Tables/`
Files	Unstructured/semi-structured data under `Files/`
SQL Endpoint	Auto-generated read-only SQL interface
Shortcuts	Virtual links to external/internal data without copying
Materialized Views	Pre-computed tables for fast query performance

Schemas

When creating a Lakehouse, users can enable schemas to organize tables:

Schemas are folders under Tables/
Default schema is dbo (cannot be deleted or renamed)
Can reference schemas in other Lakehouses via Schema Shortcuts
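Because schemas are folders under Tables/, the relative path and the SQL name of a table can be derived from the same parts. A minimal sketch in plain Python; the helper names `table_path` and `qualified_name` are hypothetical, as are the example table and schema names:

```python
from typing import Optional


def table_path(table: str, schema: Optional[str] = None) -> str:
    """Return the relative folder of a table under Tables/."""
    # Without schemas, the table folder sits directly under Tables/.
    if schema is None:
        return f"Tables/{table}"
    # With schemas enabled, each schema is a folder between Tables/ and the table.
    return f"Tables/{schema}/{table}"


def qualified_name(lakehouse: str, table: str, schema: str = "dbo") -> str:
    """Return the three-part name (lakehouse.schema.table) for Spark SQL."""
    return f"{lakehouse}.{schema}.{table}"


print(table_path("orders", "sales"))
print(qualified_name("lakehouse", "orders"))
```

Defaulting `schema` to `dbo` in `qualified_name` mirrors the default schema described above.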

Security

Workspace Roles (Control Plane)

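Role	Access
Admin	Full control, including workspace settings and role assignments
Member	Everything a Contributor can do, plus adding members and sharing items
Contributor	Create, edit, and delete items
Viewer	Read-only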

OneLake Security (Data Plane)

Based on Microsoft Entra ID and RBAC
Supports column-level and row-level security on tables
Data access controlled through OneLake permissions

Shortcuts

Virtual links to data without copying:

Type	Target
Internal	Other Fabric Lakehouses/tables (cross-workspace)
ADLS Gen2	Azure Data Lake Storage Gen2 containers
S3	Amazon S3 buckets
GCS	Google Cloud Storage
Dataverse	Dataverse tables

Create Shortcut (REST)

POST /v1/workspaces/<wsId>/items/<lakehouseId>/shortcuts
{
  "path": "Tables/external_data",
  "name": "my_shortcut",
  "target": {
    "oneLake": {
      "workspaceId": "<sourceWsId>",
      "itemId": "<sourceItemId>",
      "path": "Tables/source_table"
    }
  }
}
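The request above can be assembled programmatically so that IDs are not hardcoded. The following is a sketch that only builds the URL and JSON body and does not send anything; the function name `build_shortcut_request` and all argument values are hypothetical, and an actual call would also need a Microsoft Entra bearer token in the `Authorization` header.

```python
import json

# Base URL of the Fabric REST API.
FABRIC_API = "https://api.fabric.microsoft.com"


def build_shortcut_request(ws_id, lakehouse_id, name, parent_path,
                           src_ws_id, src_item_id, src_path):
    """Return (url, json_body) for a OneLake-to-OneLake shortcut request."""
    url = f"{FABRIC_API}/v1/workspaces/{ws_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": parent_path,   # folder in the destination Lakehouse
        "name": name,          # name of the shortcut itself
        "target": {
            "oneLake": {
                "workspaceId": src_ws_id,
                "itemId": src_item_id,
                "path": src_path,
            }
        },
    }
    return url, json.dumps(body)


# Placeholder IDs, in the same style as the request above.
url, body = build_shortcut_request(
    "<wsId>", "<lakehouseId>", "my_shortcut", "Tables/external_data",
    "<sourceWsId>", "<sourceItemId>", "Tables/source_table",
)
print(url)
print(body)
```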

Table Optimization

V-Order

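Write-time optimization that applies sorting, encoding, and compression to Parquet files for faster reads. Whether it is on by default depends on the workspace and runtime configuration, so set it explicitly when read performance matters.

# Assumes a default Lakehouse is attached; replace the path with your own table
path = "Tables/my_table"
df.write.format("delta").option("parquet.vorder.enabled", "true").save(path)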

OPTIMIZE and VACUUM

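-- Compact small files into fewer, larger files
OPTIMIZE lakehouse.schema.table_name

-- Remove unreferenced files older than the retention period (default 7 days);
-- time travel to versions that depend on removed files stops working
VACUUM lakehouse.schema.table_name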

Z-Order

Co-locate related data for faster filter queries:

OPTIMIZE lakehouse.schema.table_name ZORDER BY (column1, column2)
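When maintenance runs on a schedule for many tables, the statements above can be generated rather than typed by hand. A sketch in plain Python; the helper names `optimize_sql` and `vacuum_sql` are hypothetical, and the guard against retention below 168 hours (the 7-day default) is a choice made for this example, not a Fabric API:

```python
DEFAULT_RETENTION_HOURS = 168  # 7 days


def optimize_sql(table, zorder_cols=()):
    """Build an OPTIMIZE statement, optionally with a ZORDER BY clause."""
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    return stmt


def vacuum_sql(table, retain_hours=None):
    """Build a VACUUM statement; refuse retention shorter than the default."""
    stmt = f"VACUUM {table}"
    if retain_hours is not None:
        if retain_hours < DEFAULT_RETENTION_HOURS:
            raise ValueError("retention below 7 days can break time travel")
        stmt += f" RETAIN {retain_hours} HOURS"
    return stmt


print(optimize_sql("lakehouse.schema.table_name", ["column1", "column2"]))
print(vacuum_sql("lakehouse.schema.table_name", 240))
```

In a notebook, each generated string would be passed to `spark.sql(...)`.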

PySpark Patterns

Read Delta Table

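# Relative paths resolve against the notebook's default Lakehouse
# (with schemas enabled, the path becomes Tables/<schema>/my_table)
df = spark.read.format("delta").load("Tables/my_table")
# or query the table by name
df = spark.sql("SELECT * FROM lakehouse.dbo.my_table")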

Write Delta Table

# Replaces all existing rows; overwriteSchema also replaces the table schema
df.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save("Tables/my_table")

Incremental Load (Merge)

from delta.tables import DeltaTable

# source_df is a DataFrame with the new or changed rows; id is the match key
target = DeltaTable.forPath(spark, "Tables/my_table")
target.alias("t").merge(
    source_df.alias("s"),
    "t.id = s.id"
).whenMatchedUpdateAll() \
 .whenNotMatchedInsertAll() \
 .execute()
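The effect of the merge above (update matched rows, insert unmatched ones) can be illustrated without Spark. This is a plain-Python model of the semantics only, not the Delta Lake API; the function name `upsert` and the sample rows are hypothetical.

```python
def upsert(target_rows, source_rows, key="id"):
    """Model of MERGE with update-all on match and insert-all otherwise."""
    # Index the target by key; dicts keep insertion order.
    merged = {row[key]: dict(row) for row in target_rows}
    for row in source_rows:
        # Matched key: every column is replaced by the source values.
        # Unmatched key: the source row is inserted as a new row.
        merged[row[key]] = dict(row)
    return list(merged.values())


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(upsert(target, source))
```

One difference worth noting: Delta rejects a merge in which several source rows match the same target row, while this sketch simply lets the last one win, so deduplicate the source on the key before merging.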

Notebook Utilities

Microsoft Fabric renamed MSSparkUtils to NotebookUtils. Existing mssparkutils code remains backward-compatible, but new code should prefer notebookutils for continued support and access to newer modules.

# List files
notebookutils.fs.ls("Files/raw/")

# Copy files
notebookutils.fs.cp("Files/source/", "Files/dest/", recurse=True)

# Get secret from Key Vault
secret = notebookutils.credentials.getSecret("https://my-kv.vault.azure.net/", "secret-name")

# Run another notebook
notebookutils.notebook.run("other_notebook", timeout_seconds=600, arguments={"param1": "value1"})

Medallion Architecture

Layer	Purpose	Pattern
Bronze	Raw ingestion	Append-only, preserve source schema
Silver	Cleaned/conformed	Deduplication, type casting, null handling
Gold	Business-ready	Aggregation, star schema, KPIs

Each layer is a separate Lakehouse or schema within a Lakehouse.
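The three layers can be sketched on a handful of records in plain Python. The data and the function names `to_silver` and `to_gold` are hypothetical; in a Fabric notebook the same steps would typically be DataFrame operations such as `dropDuplicates`, `cast`, and `fillna`, followed by a grouped aggregation.

```python
from collections import defaultdict

# Bronze: raw rows exactly as ingested (strings, duplicates, missing values).
bronze = [
    {"order_id": "1", "amount": "10.5", "region": "EU"},
    {"order_id": "1", "amount": "10.5", "region": "EU"},  # duplicate
    {"order_id": "2", "amount": None, "region": "US"},    # missing amount
    {"order_id": "3", "amount": "4.5", "region": "EU"},
]


def to_silver(rows):
    """Deduplicate on order_id, cast types, and replace missing amounts with 0.0."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        amount = float(row["amount"]) if row["amount"] is not None else 0.0
        cleaned.append({"order_id": int(row["order_id"]),
                        "amount": amount,
                        "region": row["region"]})
    return cleaned


def to_gold(rows):
    """Aggregate total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EU': 15.0, 'US': 0.0}
```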

Must

Use Delta format for all managed tables
Enable V-Order for read-heavy workloads
Partition large tables by date or other low-cardinality columns (use Z-Order for high-cardinality filter columns)
Run OPTIMIZE regularly on frequently queried tables
Use shortcuts instead of copying data across workspaces

Avoid

Storing non-Delta tabular data if SQL endpoint access is needed
Deep folder nesting in Files/ (keep hierarchy shallow)
Small file accumulation without OPTIMIZE
Hardcoding OneLake paths (use workspace/item variables instead)
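For the last point in the list above, an absolute OneLake path can be assembled from variables instead of being written out. A minimal sketch, assuming the ID-based `abfss://` form of a OneLake URI; the function name `onelake_uri` and the placeholder values are hypothetical:

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace_id, item_id, relative_path):
    """Build an abfss URI from workspace ID, item ID, and a relative path."""
    # Strip a leading slash so the path joins cleanly.
    rel = relative_path.lstrip("/")
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{item_id}/{rel}"


# In practice the IDs would come from parameters or environment-specific variables.
print(onelake_uri("<wsId>", "<lakehouseId>", "Tables/dbo/my_table"))
```

Keeping the IDs as parameters lets the same notebook run unchanged across dev, test, and prod workspaces.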

related-skills.json

same repository

fabric-alm-cicd.md

from "kimtth/ms-fabric-skills-dev-starter"

Plan, implement, review, and troubleshoot Microsoft Fabric ALM and CI/CD workflows using Git integration, deployment pipelines, Variable Libraries, Fabric REST APIs, fabric-cicd, GitHub Actions, or Azure DevOps. Use when the user asks about source control, deploy, promote, release, dev/test/prod, environment variables, deployment pipeline automation, Git sync, fabric-cicd, or Fabric item definition validation.

2026-05-182

fabric-data-agent.md

from "kimtth/ms-fabric-skills-dev-starter"

Design, configure, evaluate, and govern Microsoft Fabric Data Agents for natural-language Q&A over Lakehouse, Warehouse, Power BI semantic model, KQL database, mirrored database, ontology, or Microsoft Graph data. Use when the user asks for Fabric data agent, conversational analytics, NL2SQL, NL2DAX, NL2KQL, data-agent instructions, example queries, agent evaluation, publishing, sharing, governance, diagnostics, or ALM.

2026-05-182

spark-authoring-cli.md

from "kimtth/ms-fabric-skills-dev-starter"

Develop Microsoft Fabric Spark/data engineering workflows and write code in Fabric Notebook cells with intelligent routing to specialized resources. Provides workspace/lakehouse management, notebook code authoring (PySpark, Scala, SparkR, SQL), and routes to: data engineering patterns, development workflow, or infrastructure orchestration. Use when the user wants to: (1) manage Fabric workspaces and resources, (2) write or debug code in notebook cells, (3) use notebookutils, (4) develop notebooks and PySpark applications, (5) design data pipelines, (6) provision infrastructure as code. Triggers: "develop notebook", "data engineering", "workspace setup", "pipeline design", "infrastructure provisioning", "Delta Lake patterns", "Spark development", "lakehouse configuration", "write notebook code", "notebookutils", "notebook cell", "PySpark notebook", "%%sql cell", "%%configure", "fabric notebook", "run notebook", "notebook deployment".

2026-05-152

spark-operations-cli.md

from "kimtth/ms-fabric-skills-dev-starter"

Diagnose failed Spark jobs, unhealthy Livy sessions, and performance bottlenecks in Microsoft Fabric via read-only CLI triage. Use when the user wants to: (1) diagnose why a Spark job, notebook run, or Lakehouse job failed, (2) triage stuck or dead Livy sessions, (3) identify OOM, shuffle spill, or data skew, (4) retrieve driver and executor logs or Spark Advisor findings, (5) copy event logs and start a local Spark History Server, (6) diagnose all Spark activities within a failed pipeline run. Triggers: "diagnose my failed notebook", "why did my spark job fail", "triage spark failure", "diagnose pipeline run failure", "why did my pipeline fail", "livy session stuck in starting", "spark executor OOM", "check spark advisor findings", "shuffle spill diagnosis", "why did my lakehouse job fail", "diagnose lakehouse table load", "data skew diagnosis", "open spark history server locally", "analyze spark failure logs", "spark job triage".

2026-05-152

fabric-api-discovery.md

from "kimtth/ms-fabric-skills-dev-starter"

Discover Fabric APIs, OpenAPI specs, item schemas, and best practices using the Fabric MCP Server. Use when exploring available Fabric workloads, looking up API specifications, finding item definition formats, or managing OneLake files programmatically. All MCP tools run locally for reference.

2026-04-102

fabric-core.md

from "kimtth/ms-fabric-skills-dev-starter"

Core Microsoft Fabric platform reference: topology, authentication, token scopes, REST API base URL, pagination, long-running operations, throttling, workspace and item resolution, OneLake access, and common gotchas. Use this skill whenever working with Fabric REST APIs, managing workspaces/items, or troubleshooting auth errors.

2026-04-102

package.json

"author": "kimtth"

"repository": "kimtth/ms-fabric-skills-dev-starter"
