dremio-python-libraries
Enables an AI agent to use dremio-simple-query (lightweight SQL/Arrow Flight) and DremioFrame (full dataframe builder with ingestion, modeling, and admin) to interact with Dremio from Python.
| Field | Value |
|---|---|
| name | Dremio Python Libraries |
| description | Enables an AI agent to use dremio-simple-query (lightweight SQL/Arrow Flight) and DremioFrame (full dataframe builder with ingestion, modeling, and admin) to interact with Dremio from Python. |
This skill covers two Python libraries for working with Dremio. Choose the right tool for the job:
| Need | Use |
|---|---|
| Run SQL and get results as Arrow, Pandas, Polars, or DuckDB | dremio-simple-query |
| Data ingestion, CRUD, Iceberg management, admin, modeling, charting, orchestration | DremioFrame |
dremio-simple-query is a lightweight library that uses Apache Arrow Flight for high-performance SQL queries against Dremio.
```bash
pip install dremio-simple-query
```
Both dremio-simple-query and DremioFrame share the same profile file at ~/.dremio/profiles.yaml. Create it with the following structure:
```yaml
profiles:
  # Dremio Cloud with PAT
  my_cloud:
    type: cloud
    base_url: https://api.dremio.cloud
    auth:
      type: pat
      token: MY_PAT_TOKEN

  # Software with PAT
  my_software_pat:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: pat
      token: MY_PAT_TOKEN

  # Software with Username/Password
  my_software_basic:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: username_password
      username: my_user
      password: my_password

  # Software with OAuth Client Credentials
  my_software_oauth:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: oauth
      client_id: MY_CLIENT_ID
      client_secret: MY_CLIENT_SECRET
```
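It can help to sanity-check a profile before passing it to either library. The sketch below uses a hypothetical helper, `validate_profile`, which is not part of dremio-simple-query or DremioFrame. It assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and checks only the fields that appear in the example above.

```python
# Hypothetical helper (not part of either library): checks one parsed profile
# against the fields used in the profiles.yaml example.
REQUIRED_AUTH_FIELDS = {
    "pat": ("token",),
    "username_password": ("username", "password"),
    "oauth": ("client_id", "client_secret"),
}


def validate_profile(name: str, profile: dict) -> list:
    """Return a list of problems found in one profile (empty if it looks complete)."""
    problems = []
    if profile.get("type") not in ("cloud", "software"):
        problems.append(f"{name}: type must be 'cloud' or 'software'")
    if not profile.get("base_url"):
        problems.append(f"{name}: missing base_url")
    auth = profile.get("auth") or {}
    auth_type = auth.get("type")
    if auth_type not in REQUIRED_AUTH_FIELDS:
        problems.append(f"{name}: unknown auth type {auth_type!r}")
    else:
        # Each auth type needs its own credential fields to be non-empty.
        for field in REQUIRED_AUTH_FIELDS[auth_type]:
            if not auth.get(field):
                problems.append(f"{name}: auth.{field} is required for {auth_type}")
    return problems


# Example: a dict shaped like the parsed my_software_basic profile, missing its password.
profile = {
    "type": "software",
    "base_url": "https://dremio.company.com",
    "auth": {"type": "username_password", "username": "my_user"},
}
print(validate_profile("my_software_basic", profile))
```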
Then connect using a profile name:
```python
from dremio_simple_query.connectv2 import DremioConnection

dremio = DremioConnection(profile="my_cloud")
```
You can also pass connection details directly. This example reads them from environment variables loaded with python-dotenv:

```python
from os import getenv

from dotenv import load_dotenv
from dremio_simple_query.connectv2 import DremioConnection

load_dotenv()

# Dremio Cloud: PAT auth
dremio = DremioConnection(
    location=getenv("ARROW_ENDPOINT"),  # e.g. grpc+tls://data.dremio.cloud:443
    token=getenv("DREMIO_TOKEN"),
    project_id=getenv("DREMIO_PROJECT_ID")  # Optional for Cloud
)

# Dremio Software: Username/Password auth
dremio = DremioConnection(
    location="grpc+tls://dremio.company.com:32010",
    username="my_user",
    password="my_password"
)
```
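The `location` argument is a gRPC URI: the Cloud endpoint and the Software port `32010` both come from the examples above. The sketch below builds that string with a hypothetical helper, `flight_location`. It also assumes that a Software deployment without TLS accepts the plaintext `grpc+tcp://` scheme used by PyArrow Flight.

```python
from typing import Optional

# Dremio Cloud Arrow Flight endpoint, as used in the example above.
CLOUD_FLIGHT_LOCATION = "grpc+tls://data.dremio.cloud:443"


def flight_location(host: Optional[str] = None, port: int = 32010, tls: bool = True) -> str:
    """Return a gRPC location URI; with no host, fall back to the Dremio Cloud endpoint."""
    if host is None:
        return CLOUD_FLIGHT_LOCATION
    # Assumption: grpc+tcp is the plaintext (no TLS) scheme.
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


print(flight_location())
print(flight_location("dremio.company.com"))
print(flight_location("localhost", tls=False))
```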
Agent prompt: "I need to connect to Dremio from Python. Are you using Dremio Cloud or Dremio Software? Do you have a PAT (Personal Access Token), or should we use username/password? Do you already have a ~/.dremio/profiles.yaml configured?"
Always use the V2 client (dremio_simple_query.connectv2).
```python
from dremio_simple_query.connectv2 import DremioConnection

dremio = DremioConnection(profile="my_profile")

# Arrow FlightStreamReader (raw, most performant)
stream = dremio.toArrow("SELECT * FROM my_table")
arrow_table = stream.read_all()  # materialize the full result as an Arrow Table

# Or stream record batches instead of materializing everything.
# A reader can only be consumed once, so start a new query for this.
batch_reader = dremio.toArrow("SELECT * FROM my_table").to_reader()  # RecordBatchReader

# Pandas DataFrame
df = dremio.toPandas("SELECT * FROM my_table")

# Polars DataFrame
df = dremio.toPolars("SELECT * FROM my_table")

# DuckDB Relation; query() exposes the relation under the name "my_table"
duck_rel = dremio.toDuckDB("SELECT * FROM my_table")
result = duck_rel.query("my_table", "SELECT * FROM my_table").fetchall()
```
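You can also hand an Arrow table to DuckDB yourself:

```python
import duckdb

stream = dremio.toArrow("SELECT * FROM my_table")
my_table = stream.read_all()

con = duckdb.connect()
# Register the Arrow table explicitly so SQL can refer to it as my_table
con.register("my_table", my_table)
results = con.execute("SELECT * FROM my_table").fetchall()
```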
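The methods above all take plain SQL strings. Dremio table paths whose parts contain spaces, hyphens, or reserved words should have each part quoted as a double-quoted identifier. The sketch below uses a hypothetical helper, `quote_path`, and assumes standard SQL escaping, where an embedded `"` is doubled. It splits naively on dots, so it does not handle names that themselves contain a dot.

```python
def quote_path(dotted_path: str) -> str:
    """Quote each dot-separated part of a table path as a SQL identifier."""
    parts = dotted_path.split(".")
    # Double any embedded double quotes, then wrap each part in double quotes.
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


table = quote_path("finance.bronze.transactions")
sql = f"SELECT * FROM {table} LIMIT 10"
print(sql)
```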
DremioFrame is a comprehensive Python library that provides a dataframe builder interface for Dremio, with CRUD, ingestion, admin, modeling, charting, orchestration, and AI features. It is currently in alpha.
```bash
pip install dremioframe

# With optional dependencies (e.g., for chart image export)
pip install "dremioframe[image_export]"
```
Optional dependency groups are documented at: https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe/blob/main/docs/getting_started/dependencies.md
Create ~/.dremio/profiles.yaml (same format as above), then:
```python
from dremioframe.client import DremioClient

# Uses the default profile
client = DremioClient()

# Or specify a profile
client = DremioClient(profile="my_profile")
```
You can generate the profiles file using dremio-cli (from the dremio-cli-skill) or create it manually.
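If you create the file manually from a script, keep in mind that it holds tokens and passwords. The sketch below uses two hypothetical helpers, `render_profile` and `write_profiles`, to write the same indentation as the example above and restrict the file to the current user (mode `0o600`). It assumes every value is a simple string with no YAML-special characters. If PyYAML is available, `yaml.safe_dump` is the more robust choice.

```python
import os
from pathlib import Path


def render_profile(name: str, profile: dict) -> str:
    """Render one profile as YAML lines, using 2-space indentation like the example."""
    lines = [f"  {name}:"]
    for key, value in profile.items():
        if isinstance(value, dict):  # e.g. the nested auth block
            lines.append(f"    {key}:")
            lines.extend(f"      {k}: {v}" for k, v in value.items())
        else:
            lines.append(f"    {key}: {value}")
    return "\n".join(lines)


def write_profiles(path: Path, profiles: dict) -> None:
    """Write profiles.yaml so that only the current user can read it."""
    body = "profiles:\n" + "\n".join(render_profile(n, p) for n, p in profiles.items()) + "\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0o600; chmod again in case it already existed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(body)
    os.chmod(path, 0o600)


# Usage (writes to the real location):
# write_profiles(Path.home() / ".dremio" / "profiles.yaml",
#                {"my_cloud": {"type": "cloud", "base_url": "https://api.dremio.cloud",
#                              "auth": {"type": "pat", "token": "MY_PAT_TOKEN"}}})
```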
Alternatively, configure credentials through environment variables. Create a .env file in your project:
```bash
# Dremio Cloud
DREMIO_PAT=your_dremio_cloud_pat_here
DREMIO_PROJECT_ID=your_dremio_project_id_here
# DREMIO_URL=data.dremio.cloud # Optional, defaults to data.dremio.cloud

# Dremio Software (v26+)
# DREMIO_SOFTWARE_PAT=your_software_pat_here
# DREMIO_SOFTWARE_HOST=dremio.example.com
# DREMIO_SOFTWARE_PORT=32010
# DREMIO_SOFTWARE_TLS=false

# Dremio Software (v25 / username-password)
# DREMIO_SOFTWARE_USER=your_username
# DREMIO_SOFTWARE_PASSWORD=your_password
```
```python
from dremioframe.client import DremioClient

# Dremio Cloud (assumes env vars DREMIO_PAT and DREMIO_PROJECT_ID are set)
client = DremioClient()

# Dremio Software v26+
client = DremioClient(
    hostname="dremio.example.com",
    pat="your_pat_here",
    tls=True,
    mode="v26"
)

# Dremio Software v25
client = DremioClient(
    hostname="localhost",
    username="admin",
    password="password123",
    tls=False,
    mode="v25"
)
```
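The `mode` argument follows the credential type: `"v26"` pairs with a Software PAT and `"v25"` with username/password. The sketch below uses a hypothetical helper, `dremioframe_settings`, to map the `.env` variables above onto the keyword arguments shown in these examples. DremioFrame may already read some of these variables itself. The `localhost` default host is an assumption, and `DREMIO_SOFTWARE_PORT` is not mapped because the examples show no port argument.

```python
import os
from typing import Mapping, Optional


def dremioframe_settings(env: Mapping[str, str]) -> Optional[dict]:
    """Return DremioClient keyword arguments inferred from env vars, or None if none match."""
    if env.get("DREMIO_PAT") and env.get("DREMIO_PROJECT_ID"):
        return {}  # Cloud: DremioClient() with no arguments, as in the example above
    host = env.get("DREMIO_SOFTWARE_HOST", "localhost")  # assumed default
    tls = env.get("DREMIO_SOFTWARE_TLS", "false").lower() == "true"
    if env.get("DREMIO_SOFTWARE_PAT"):
        # Software v26+ with a PAT
        return {"hostname": host, "pat": env["DREMIO_SOFTWARE_PAT"], "tls": tls, "mode": "v26"}
    if env.get("DREMIO_SOFTWARE_USER") and env.get("DREMIO_SOFTWARE_PASSWORD"):
        # Older Software with username/password
        return {"hostname": host, "username": env["DREMIO_SOFTWARE_USER"],
                "password": env["DREMIO_SOFTWARE_PASSWORD"], "tls": tls, "mode": "v25"}
    return None


settings = dremioframe_settings(os.environ)
# If settings is not None: client = DremioClient(**settings)
```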
```python
from dremioframe.client import DremioClient

client = DremioClient()

# Fluent builder pattern
df = (client.table('finance.bronze.transactions')
      .select("transaction_id", "amount", "customer_id")
      .filter("amount > 1000")
      .limit(5)
      .collect())

# Raw SQL
df = client.query("SELECT * FROM finance.silver.customers")

# Aggregation
(client.table('finance.bronze.transactions')
 .group_by("customer_id")
 .agg(total_spent="SUM(amount)")
 .show())

# Joins
customers = client.table('finance.silver.customers')
(client.table('finance.bronze.transactions')
 .join(customers, on="transactions.customer_id = customers.customer_id")
 .show())

# The following examples operate on a table builder, not on collected results
df = client.table('finance.bronze.transactions')

# Calculated columns
df.mutate(amount_with_tax="amount * 1.08").show()

# Iceberg Time Travel
df.at_snapshot("123456789").show()
```
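In the fluent examples above, each call refines the query, and a terminal call such as `.collect()` or `.show()` produces results. The toy `QueryBuilder` below is a hypothetical illustration, not DremioFrame's implementation. It shows how a chain of methods that each return `self` can accumulate clauses and render a single SQL string at the end.

```python
from typing import Optional


class QueryBuilder:
    """Toy fluent builder: each method records a clause and returns self."""

    def __init__(self, table: str):
        self.table = table
        self.columns: list = ["*"]
        self.filters: list = []
        self.row_limit: Optional[int] = None

    def select(self, *columns: str) -> "QueryBuilder":
        self.columns = list(columns)
        return self

    def filter(self, condition: str) -> "QueryBuilder":
        self.filters.append(condition)  # multiple filters are combined with AND
        return self

    def limit(self, n: int) -> "QueryBuilder":
        self.row_limit = n
        return self

    def to_sql(self) -> str:
        """Render the accumulated clauses as one SQL statement."""
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


print(QueryBuilder("finance.bronze.transactions")
      .select("transaction_id", "amount", "customer_id")
      .filter("amount > 1000")
      .limit(5)
      .to_sql())
```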
```python
from dremioframe import F

(client.table("finance.silver.sales")
 .select(
     F.col("dept"),
     F.sum("amount").alias("total_sales"),
     F.rank().over(F.Window.order_by("amount")).alias("rank")
 )
 .show())
```
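Judging by its name, `F.rank()` presumably maps to SQL's `RANK()` window function. The plain-Python sketch below (the hypothetical `sql_rank`) shows how `RANK()` handles ties over an ascending `ORDER BY`: tied values share a rank and the following rank skips ahead, unlike `DENSE_RANK()`. NULL ordering is not modeled.

```python
def sql_rank(values: list) -> list:
    """Return RANK() for each value under ascending order: ties share a rank, then gaps follow."""
    ordered = sorted(values)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)  # first 1-based slot for each distinct value
    return [first_position[v] for v in values]


print(sql_rank([300, 100, 200, 200]))
```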
```python
import pandas as pd

# API Ingestion
client.ingest_api(
    url="https://api.example.com/users",
    table_name="finance.bronze.users",
    mode="merge",
    pk="id"
)

# Insert from Pandas DataFrame
data = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
client.table("finance.bronze.raw_data").insert(
    "finance.bronze.raw_data", data=data, batch_size=1000
)

# Merge (Upsert): the source data must contain every column referenced below
updates = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["A", "B"],
    "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})
client.table("finance.silver.customers").merge(
    target_table="finance.silver.customers",
    on="customer_id",
    matched_update={"name": "source.name", "updated_at": "source.updated_at"},
    not_matched_insert={"customer_id": "source.customer_id", "name": "source.name"},
    data=updates
)
```
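The `merge` call follows upsert semantics. Target rows whose key matches a source row get the `matched_update` columns, and unmatched source rows are inserted with the `not_matched_insert` columns. The pure-Python sketch below models that behavior on lists of dicts using a hypothetical `upsert` helper. It simplifies each `"source.<column>"` reference to a plain column lookup. Unlike SQL `MERGE`, which typically rejects duplicate source keys, it lets the later source row win. An inserted row only gets the columns listed in `not_matched_insert`, so the new customer has no `updated_at`.

```python
def upsert(target: list, source: list, on: str,
           matched_update: dict, not_matched_insert: dict) -> list:
    """Return a new target list after merging source rows keyed on `on`."""
    def value(row: dict, ref: str):
        return row[ref.removeprefix("source.")]  # "source.name" -> row["name"]

    result = [dict(row) for row in target]    # copy so the input is not modified
    index = {row[on]: row for row in result}  # key -> existing target row
    for src in source:
        existing = index.get(src[on])
        if existing is not None:               # WHEN MATCHED THEN UPDATE
            for col, ref in matched_update.items():
                existing[col] = value(src, ref)
        else:                                  # WHEN NOT MATCHED THEN INSERT
            new_row = {col: value(src, ref) for col, ref in not_matched_insert.items()}
            result.append(new_row)
            index[src[on]] = new_row
    return result


target = [{"customer_id": 1, "name": "Ann", "updated_at": "2024-01-01"}]
source = [
    {"customer_id": 1, "name": "Anne", "updated_at": "2024-02-01"},
    {"customer_id": 2, "name": "Bob", "updated_at": "2024-02-01"},
]
merged = upsert(
    target, source, on="customer_id",
    matched_update={"name": "source.name", "updated_at": "source.updated_at"},
    not_matched_insert={"customer_id": "source.customer_id", "name": "source.name"},
)
print(merged)
```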
```python
# Export results
df.to_csv("transactions.csv")
df.to_parquet("transactions.parquet")
```
```python
# Charting (saving images may require the image_export extra)
(client.table('finance.gold.sales_summary')
 .chart(kind="bar", x="category", y="total_sales", save_to="sales.png"))
```
```python
# List catalog
print(client.catalog.list_catalog())

# Reflections
client.admin.create_reflection(
    dataset_id="...", name="my_ref", type="RAW", display_fields=["col1"]
)

# Explain query
print(df.explain())
```
```python
# Data quality checks
df.quality.expect_not_null("customer_id")
df.quality.expect_row_count("amount > 10000", 5, "ge")
```
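Judging by its arguments, `expect_row_count("amount > 10000", 5, "ge")` appears to assert that at least 5 rows satisfy `amount > 10000`; this reading is an assumption. DremioFrame evaluates the condition in Dremio. The sketch below substitutes a Python predicate over in-memory rows to show the same count-and-compare logic. The `COMPARISONS` names other than `"ge"` are hypothetical.

```python
import operator

# Hypothetical mapping from comparison names to operators.
COMPARISONS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
               "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def expect_row_count(rows, predicate, expected: int, comparison: str = "eq") -> bool:
    """Count rows matching predicate and compare that count with `expected`."""
    count = sum(1 for row in rows if predicate(row))
    return COMPARISONS[comparison](count, expected)


rows = [{"amount": a} for a in (500, 12000, 15000, 20000, 11000, 30000)]
print(expect_row_count(rows, lambda r: r["amount"] > 10000, 5, "ge"))
```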
When you need more detail on a specific feature, read the relevant doc page from the repos below.
| Topic | URL |
|---|---|
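| dremio-simple-query full documentation | https://github.com/developer-advocacy-dremio/dremio_simple_query/blob/main/docs/dremio_simple_query_docs.md |
| DremioFrame repository (documentation under docs/) | https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe |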
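- Pick the right library: dremio-simple-query for running SQL and getting results as Arrow, Pandas, Polars, or DuckDB; dremioframe for ingestion, CRUD, Iceberg management, admin, modeling, charting, and orchestration.
- If the user does not have ~/.dremio/profiles.yaml or environment variables set up, guide them through configuration.
- Install with pip install dremio-simple-query or pip install dremioframe.
- Both libraries share the ~/.dremio/profiles.yaml format: configure once, use everywhere.
- With dremio-simple-query, always use the V2 client (dremio_simple_query.connectv2), not the legacy V1.
- The Dremio Cloud Arrow Flight endpoint is grpc+tls://data.dremio.cloud:443.
- Dremio Software typically serves Arrow Flight on port 32010 (e.g., grpc+tls://dremio.company.com:32010).
- In DremioFrame, the mode parameter matters: use "v26" for Software v26+ with PAT, "v25" for older username/password auth.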