ワンクリックで
dremio-iceberg-operations
// Teaches an AI agent how to create, manage, and maintain Apache Iceberg tables in Dremio — including DML, schema evolution, time travel, table maintenance, partitioning, and versioned catalog workflows.
// Teaches an AI agent how to create, manage, and maintain Apache Iceberg tables in Dremio — including DML, schema evolution, time travel, table maintenance, partitioning, and versioned catalog workflows.
[HINT] SKILL.mdと関連ファイルを含む完全なスキルディレクトリをダウンロード
| name | Dremio Iceberg Operations |
| description | Teaches an AI agent how to create, manage, and maintain Apache Iceberg tables in Dremio — including DML, schema evolution, time travel, table maintenance, partitioning, and versioned catalog workflows. |
This skill is a deep guide for working with Apache Iceberg tables in Dremio. Iceberg is the core table format in Dremio's lakehouse architecture — enabling ACID transactions, schema evolution, time travel, and partition evolution. This skill covers the full lifecycle: create → populate → query → evolve → maintain.
Iceberg tables are stored in a lakehouse catalog source (Nessie, Arctic, AWS Glue, Iceberg REST, or Open Catalog).
-- Basic table
CREATE TABLE catalog.schema.customers (
id INT,
name VARCHAR,
email VARCHAR,
created_at TIMESTAMP
);
-- With partitioning
CREATE TABLE catalog.schema.events (
event_id BIGINT,
event_type VARCHAR,
user_id INT,
event_date DATE,
payload VARCHAR
) PARTITION BY (event_type, MONTH(event_date));
-- With sorting (for better data layout)
CREATE TABLE catalog.schema.orders (
order_id INT,
customer_id INT,
amount DOUBLE,
order_date DATE
) PARTITION BY (YEAR(order_date))
LOCALSORT BY (customer_id);
-- CTAS — create from a query
CREATE TABLE catalog.schema.summary AS
SELECT region, COUNT(*) AS cnt, SUM(amount) AS total
FROM catalog.schema.orders
GROUP BY region;
Partition transforms available: YEAR(), MONTH(), DAY(), HOUR(), BUCKET(n, col), TRUNCATE(n, col), or identity (just the column name).
For exact syntax: https://docs.dremio.com/current/reference/sql/commands/create-table
-- Insert values
INSERT INTO catalog.schema.customers
VALUES (1, 'Alice', 'alice@example.com', CURRENT_TIMESTAMP);
-- Insert from query
INSERT INTO catalog.schema.summary
SELECT region, COUNT(*), SUM(amount) FROM catalog.schema.orders GROUP BY region;
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/apache-iceberg-insert/
UPDATE catalog.schema.customers
SET email = 'alice.new@example.com'
WHERE id = 1;
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/apache-iceberg-update/
DELETE FROM catalog.schema.customers WHERE id = 1;
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/apache-iceberg-delete/
The most powerful DML command — handles INSERT, UPDATE, and DELETE in a single atomic operation.
MERGE INTO catalog.schema.customers AS target
USING catalog.schema.staging_customers AS source
ON target.id = source.id
WHEN MATCHED AND source.is_deleted = true THEN
DELETE
WHEN MATCHED THEN
UPDATE SET target.name = source.name, target.email = source.email
WHEN NOT MATCHED THEN
INSERT (id, name, email, created_at)
VALUES (source.id, source.name, source.email, CURRENT_TIMESTAMP);
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/apache-iceberg-merge/
Load data from object storage (S3, Azure, GCS) directly into Iceberg tables.
-- Parquet files
COPY INTO catalog.schema.events
FROM '@my_s3_source/raw/events/'
FILE_FORMAT 'parquet';
-- CSV with options
COPY INTO catalog.schema.events
FROM '@my_s3_source/raw/events.csv'
FILE_FORMAT 'csv'
(RECORD_DELIMITER '\n', FIELD_DELIMITER ',', SKIP_FIRST_LINE);
-- JSON
COPY INTO catalog.schema.events
FROM '@my_s3_source/raw/events/'
FILE_FORMAT 'json';
The @source_name prefix refers to an object storage source already configured in Dremio.
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/copy-into-table/
Modify table schemas without rewriting data files.
-- Add a column
ALTER TABLE catalog.schema.customers ADD COLUMNS (phone VARCHAR);
-- Drop a column
ALTER TABLE catalog.schema.customers DROP COLUMN phone;
-- Rename a column
ALTER TABLE catalog.schema.customers CHANGE COLUMN name full_name VARCHAR;
Docs: https://docs.dremio.com/current/reference/sql/commands/alter-table/
Change the partitioning strategy without rewriting existing data. New data uses the new scheme; old data keeps the old scheme. Iceberg handles this transparently.
-- Add a new partition field
ALTER TABLE catalog.schema.events ADD PARTITION FIELD DAY(event_date);
-- Drop a partition field
ALTER TABLE catalog.schema.events DROP PARTITION FIELD MONTH(event_date);
Query data as it existed at a previous point in time.
-- At a specific snapshot ID
SELECT * FROM catalog.schema.customers AT SNAPSHOT '1234567890123456789';
-- At a specific timestamp
SELECT * FROM catalog.schema.customers AT TIMESTAMP '2024-06-15 12:00:00';
-- Show table properties to see available snapshots
SHOW TBLPROPERTIES catalog.schema.customers;
When using a Nessie or Arctic catalog, Iceberg tables support Git-like branching and tagging.
-- Query at a specific branch
SELECT * FROM catalog.schema.customers AT BRANCH main;
SELECT * FROM catalog.schema.customers AT BRANCH dev;
-- Query at a tag
SELECT * FROM catalog.schema.customers AT TAG v1_release;
-- DML operations on a branch
-- (set context to the branch first)
INSERT INTO catalog.schema.customers AT BRANCH dev
VALUES (99, 'Test User', 'test@example.com', CURRENT_TIMESTAMP);
Small files degrade query performance. OPTIMIZE rewrites them into larger, more efficient files.
-- Compact all files
OPTIMIZE TABLE catalog.schema.events;
-- Compact with specific target file size (in bytes)
OPTIMIZE TABLE catalog.schema.events
FOR PARTITIONS event_type = 'click';
When to run: After many small INSERT operations, or on a regular schedule (daily/weekly).
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/optimize-table/
Clean up old snapshots and orphaned data files to reclaim storage.
-- Remove snapshots older than a specific timestamp
VACUUM TABLE catalog.schema.events
EXPIRE SNAPSHOTS OLDER_THAN '2024-01-01 00:00:00';
-- Remove orphan files (files not referenced by any snapshot)
VACUUM TABLE catalog.schema.events
REMOVE ORPHAN FILES OLDER_THAN '2024-01-01 00:00:00';
Warning: After VACUUM, time travel to removed snapshots is no longer possible.
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/vacuum-table/
Revert a table to a prior snapshot.
ROLLBACK TABLE catalog.schema.events TO SNAPSHOT 'snapshot_id_here';
Docs: https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/rollback-table/
Remove all rows from a table without dropping it.
TRUNCATE TABLE catalog.schema.events;
Accelerate queries with materialized reflections.
-- Raw reflection (full copy with selected columns)
ALTER TABLE catalog.schema.events
CREATE RAW REFLECTION events_raw
USING DISPLAY (event_id, event_type, user_id, event_date);
-- Aggregation reflection
ALTER TABLE catalog.schema.events
CREATE AGGREGATE REFLECTION events_agg
USING DIMENSIONS (event_type, event_date)
MEASURES (event_id (COUNT));
Docs: https://docs.dremio.com/current/reference/sql/commands/acceleration
MONTH(date) over DAY(date) for small tables.