dremio-python-libraries
// Enables an AI agent to use dremio-simple-query (lightweight SQL/Arrow Flight) and DremioFrame (full dataframe builder with ingestion, modeling, and admin) to interact with Dremio from Python.
| name | Dremio Python Libraries |
| description | Enables an AI agent to use dremio-simple-query (lightweight SQL/Arrow Flight) and DremioFrame (full dataframe builder with ingestion, modeling, and admin) to interact with Dremio from Python. |
This skill covers two Python libraries for working with Dremio. Choose the right tool for the job:
| Need | Use |
|---|---|
| Run SQL and get results as Arrow, Pandas, Polars, or DuckDB | dremio-simple-query |
| Data ingestion, CRUD, Iceberg management, admin, modeling, charting, orchestration | DremioFrame |
dremio-simple-query is a lightweight library that uses Apache Arrow Flight for high-performance SQL queries against Dremio.
pip install dremio-simple-query
Both dremio-simple-query and DremioFrame share the same profile file at ~/.dremio/profiles.yaml. Create it with the following structure:
profiles:
  # Dremio Cloud with PAT
  my_cloud:
    type: cloud
    base_url: https://api.dremio.cloud
    auth:
      type: pat
      token: MY_PAT_TOKEN
  # Software with PAT
  my_software_pat:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: pat
      token: MY_PAT_TOKEN
  # Software with Username/Password
  my_software_basic:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: username_password
      username: my_user
      password: my_password
  # Software with OAuth Client Credentials
  my_software_oauth:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: oauth
      client_id: MY_CLIENT_ID
      client_secret: MY_CLIENT_SECRET
Then connect using a profile name:
from dremio_simple_query.connectv2 import DremioConnection
dremio = DremioConnection(profile="my_cloud")
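When guiding a user through configuration, it can help to generate a profile entry rather than type it by hand. The following is a minimal sketch using only the standard library; `render_profile` is a hypothetical helper, not part of dremio-simple-query or DremioFrame, and it assumes the two-space nesting of the profile structure shown earlier.

```python
from typing import Dict


def render_profile(name: str, kind: str, base_url: str, auth: Dict[str, str]) -> str:
    """Render one profile entry as YAML text, indented to sit under `profiles:`."""
    lines = [
        f"  {name}:",
        f"    type: {kind}",
        f"    base_url: {base_url}",
        "    auth:",
    ]
    # Auth keys are emitted in insertion order: type first, then credentials.
    lines += [f"      {key}: {value}" for key, value in auth.items()]
    return "\n".join(lines)


entry = render_profile(
    "my_cloud",
    "cloud",
    "https://api.dremio.cloud",
    {"type": "pat", "token": "MY_PAT_TOKEN"},
)
print("profiles:\n" + entry)
```

The sketch does no YAML escaping, so values containing characters such as `:` followed by a space or `#` would need quoting before being written to ~/.dremio/profiles.yaml.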
Alternatively, pass the connection details directly, for example from environment variables:
from dremio_simple_query.connectv2 import DremioConnection
from os import getenv
from dotenv import load_dotenv
load_dotenv()
# Dremio Cloud: PAT auth
dremio = DremioConnection(
    location=getenv("ARROW_ENDPOINT"),  # e.g. grpc+tls://data.dremio.cloud:443
    token=getenv("DREMIO_TOKEN"),
    project_id=getenv("DREMIO_PROJECT_ID")  # Optional for Cloud
)
# Dremio Software: Username/Password auth
dremio = DremioConnection(
    location="grpc+tls://dremio.company.com:32010",
    username="my_user",
    password="my_password"
)
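The `location` strings above follow the Arrow Flight URI form `scheme://host:port`. As a rough sketch, the helpers below build such a URI and report which environment variables are still unset before a connection is attempted. `flight_location` and `missing_vars` are hypothetical names introduced here, and the `grpc+tcp` scheme for non-TLS connections is an assumption based on Arrow Flight's URI schemes rather than something this document confirms for Dremio.

```python
import os
from typing import List, Mapping


def flight_location(host: str, port: int = 32010, tls: bool = True) -> str:
    """Build an Arrow Flight URI such as grpc+tls://dremio.company.com:32010."""
    scheme = "grpc+tls" if tls else "grpc+tcp"  # non-TLS scheme is an assumption
    return f"{scheme}://{host}:{port}"


def missing_vars(env: Mapping[str, str], required: List[str]) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


cloud = flight_location("data.dremio.cloud", port=443)
software = flight_location("dremio.company.com")
missing = missing_vars(os.environ, ["ARROW_ENDPOINT", "DREMIO_TOKEN"])
if missing:
    print("Set these before connecting:", ", ".join(missing))
```

Checking for missing variables first gives a clearer message than passing `None` into the connection constructor.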
Agent prompt: "I need to connect to Dremio from Python. Are you using Dremio Cloud or Dremio Software? Do you have a PAT (Personal Access Token), or should we use username/password? Do you already have a ~/.dremio/profiles.yaml configured?"
Always use the V2 client (dremio_simple_query.connectv2).
from dremio_simple_query.connectv2 import DremioConnection
dremio = DremioConnection(profile="my_profile")
# Arrow FlightStreamReader (raw, most performant)
stream = dremio.toArrow("SELECT * FROM my_table")
arrow_table = stream.read_all() # Arrow Table
batch_reader = stream.to_reader() # RecordBatchReader
# Pandas DataFrame
df = dremio.toPandas("SELECT * FROM my_table")
# Polars DataFrame
df = dremio.toPolars("SELECT * FROM my_table")
# DuckDB Relation
duck_rel = dremio.toDuckDB("SELECT * FROM my_table")
result = duck_rel.query("my_table", "SELECT * FROM my_table").fetchall()
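import duckdb
# Read the result into an Arrow Table; DuckDB can query it by its Python variable name
stream = dremio.toArrow("SELECT * FROM my_table")
my_table = stream.read_all()
con = duckdb.connect()  # in-memory DuckDB database
results = con.execute("SELECT * FROM my_table").fetchall()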
DremioFrame is a comprehensive Python library that provides a dataframe builder interface for Dremio, with CRUD, ingestion, admin, modeling, charting, orchestration, and AI features. It is currently in alpha.
pip install dremioframe
# With optional dependencies (e.g., for chart image export)
pip install "dremioframe[image_export]"
Optional dependency groups are documented at: https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe/blob/main/docs/getting_started/dependencies.md
Create ~/.dremio/profiles.yaml (same format as above), then:
from dremioframe.client import DremioClient
# Uses the default profile
client = DremioClient()
# Or specify a profile
client = DremioClient(profile="my_profile")
You can generate the profiles file using dremio-cli (from the dremio-cli-skill) or create it manually.
Create a .env file in your project:
# Dremio Cloud
DREMIO_PAT=your_dremio_cloud_pat_here
DREMIO_PROJECT_ID=your_dremio_project_id_here
# DREMIO_URL=data.dremio.cloud # Optional, defaults to data.dremio.cloud
# Dremio Software (v26+)
# DREMIO_SOFTWARE_PAT=your_software_pat_here
# DREMIO_SOFTWARE_HOST=dremio.example.com
# DREMIO_SOFTWARE_PORT=32010
# DREMIO_SOFTWARE_TLS=false
# Dremio Software (v25 / username-password)
# DREMIO_SOFTWARE_USER=your_username
# DREMIO_SOFTWARE_PASSWORD=your_password
# Dremio Cloud (assumes env vars DREMIO_PAT and DREMIO_PROJECT_ID are set)
client = DremioClient()
# Dremio Software v26+
client = DremioClient(
    hostname="dremio.example.com",
    pat="your_pat_here",
    tls=True,
    mode="v26"
)
# Dremio Software v25
client = DremioClient(
    hostname="localhost",
    username="admin",
    password="password123",
    tls=False,
    mode="v25"
)
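The environment variables and constructor arguments above can be tied together with a small mapping function. This is only a sketch: `client_kwargs_from_env` is a hypothetical helper, the keyword names are the ones used in the examples above, and the precedence (a Software PAT wins over username/password) is an assumption made for illustration.

```python
from typing import Any, Dict, Mapping


def client_kwargs_from_env(env: Mapping[str, str]) -> Dict[str, Any]:
    """Map the .env variables shown earlier onto DremioClient keyword arguments."""
    host = env.get("DREMIO_SOFTWARE_HOST")
    tls = env.get("DREMIO_SOFTWARE_TLS", "false").lower() == "true"
    if host and env.get("DREMIO_SOFTWARE_PAT"):
        # Software v26+ authenticates with a PAT
        return {"hostname": host, "pat": env["DREMIO_SOFTWARE_PAT"], "tls": tls, "mode": "v26"}
    if host and env.get("DREMIO_SOFTWARE_USER"):
        # Software v25 authenticates with username and password
        return {
            "hostname": host,
            "username": env["DREMIO_SOFTWARE_USER"],
            "password": env.get("DREMIO_SOFTWARE_PASSWORD", ""),
            "tls": tls,
            "mode": "v25",
        }
    # Cloud: no arguments; DremioClient() reads DREMIO_PAT and DREMIO_PROJECT_ID itself
    return {}


kwargs = client_kwargs_from_env({
    "DREMIO_SOFTWARE_HOST": "dremio.example.com",
    "DREMIO_SOFTWARE_PAT": "your_software_pat_here",
    "DREMIO_SOFTWARE_TLS": "true",
})
print(kwargs)
```

With a real environment, the result would be passed as `DremioClient(**client_kwargs_from_env(os.environ))`.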
from dremioframe.client import DremioClient
client = DremioClient()
# Fluent builder pattern
df = (client.table('finance.bronze.transactions')
      .select("transaction_id", "amount", "customer_id")
      .filter("amount > 1000")
      .limit(5)
      .collect())
# Raw SQL
df = client.query("SELECT * FROM finance.silver.customers")
# Aggregation
(client.table('finance.bronze.transactions')
    .group_by("customer_id")
    .agg(total_spent="SUM(amount)")
    .show())
# Joins
customers = client.table('finance.silver.customers')
(client.table('finance.bronze.transactions')
    .join(customers, on="transactions.customer_id = customers.customer_id")
    .show())
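# Calculated columns (df must be a builder from client.table(...), not a collected result)
df = client.table('finance.bronze.transactions')
df.mutate(amount_with_tax="amount * 1.08").show()
# Iceberg Time Travel
df.at_snapshot("123456789").show()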
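To make the fluent builder pattern used above concrete, here is a toy builder in plain Python. It is not DremioFrame's implementation; `ToyQuery` and its `to_sql` method are hypothetical, and the sketch only shows the general idea that each chained call records a clause and returns the same object, with SQL assembled at the end.

```python
from typing import List, Optional


class ToyQuery:
    """Toy fluent builder: each method records a clause and returns self."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._columns: List[str] = []
        self._filters: List[str] = []
        self._limit: Optional[int] = None

    def select(self, *columns: str) -> "ToyQuery":
        self._columns.extend(columns)
        return self

    def filter(self, condition: str) -> "ToyQuery":
        self._filters.append(condition)
        return self

    def limit(self, n: int) -> "ToyQuery":
        self._limit = n
        return self

    def to_sql(self) -> str:
        cols = ", ".join(self._columns) or "*"
        sql = f"SELECT {cols} FROM {self._table}"
        if self._filters:
            # Multiple filter() calls are combined with AND
            sql += " WHERE " + " AND ".join(f"({f})" for f in self._filters)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql


sql = (ToyQuery("finance.bronze.transactions")
       .select("transaction_id", "amount", "customer_id")
       .filter("amount > 1000")
       .limit(5)
       .to_sql())
print(sql)
```

Because nothing runs until the final call, a builder of this shape can be passed around and extended before any query is sent.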
from dremioframe import F
(client.table("finance.silver.sales")
    .select(
        F.col("dept"),
        F.sum("amount").alias("total_sales"),
        F.rank().over(F.Window.order_by("amount")).alias("rank")
    )
    .show())
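The rank window function above follows SQL `RANK()` semantics: rows with equal ordering values share a rank, and the next distinct value skips ahead by the number of ties. A small plain-Python sketch of that rule, with `sql_rank` as a hypothetical helper and ascending order assumed:

```python
from typing import Dict, List


def sql_rank(values: List[float]) -> List[int]:
    """Rank each value as RANK() OVER (ORDER BY value) would, ascending."""
    first_position: Dict[float, int] = {}
    # The rank of a value is the 1-based position of its first occurrence when sorted.
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


print(sql_rank([300, 100, 200, 200]))  # prints [4, 1, 2, 2]
```

The two rows with amount 200 both get rank 2, and 300 gets rank 4 rather than 3.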
# API Ingestion
client.ingest_api(
url="https://api.example.com/users",
table_name="finance.bronze.users",
mode="merge",
pk="id"
)
# Insert from Pandas DataFrame
import pandas as pd
data = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
client.table("finance.bronze.raw_data").insert(
"finance.bronze.raw_data", data=data, batch_size=1000
)
# Merge (Upsert)
client.table("finance.silver.customers").merge(
target_table="finance.silver.customers",
on="customer_id",
matched_update={"name": "source.name", "updated_at": "source.updated_at"},
not_matched_insert={"customer_id": "source.customer_id", "name": "source.name"},
data=data
)
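The merge (upsert) call above has two branches: rows whose key already exists in the target are updated, and rows with a new key are inserted. The sketch below shows that logic on plain lists of dicts. `upsert` is a hypothetical helper for illustration only and does not reflect how DremioFrame or Dremio executes a merge.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def upsert(target: List[Row], source: List[Row], pk: str) -> List[Row]:
    """Return the target rows with the source rows merged in by primary key."""
    merged = {row[pk]: dict(row) for row in target}
    for row in source:
        if row[pk] in merged:
            merged[row[pk]].update(row)  # matched: update existing columns
        else:
            merged[row[pk]] = dict(row)  # not matched: insert a new row
    return list(merged.values())


target = [{"customer_id": 1, "name": "A"}, {"customer_id": 2, "name": "B"}]
source = [{"customer_id": 2, "name": "B2"}, {"customer_id": 3, "name": "C"}]
result = upsert(target, source, pk="customer_id")
print(result)
```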
df.to_csv("transactions.csv")
df.to_parquet("transactions.parquet")
(client.table('finance.gold.sales_summary')
    .chart(kind="bar", x="category", y="total_sales", save_to="sales.png"))
# List catalog
print(client.catalog.list_catalog())
# Reflections
client.admin.create_reflection(
    dataset_id="...", name="my_ref", type="RAW", display_fields=["col1"]
)
# Explain query
print(df.explain())
df.quality.expect_not_null("customer_id")
df.quality.expect_row_count("amount > 10000", 5, "ge")
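One plausible reading of the two checks above is "no row has a null `customer_id`" and "at least 5 rows satisfy `amount > 10000`". The sketch below spells out that reading on plain Python rows. It is an assumption about the semantics, not DremioFrame's implementation: `check_not_null`, `check_row_count`, and every operator name other than `"ge"` are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Comparison names are assumed; only "ge" appears in the example above.
_OPS: Dict[str, Callable[[int, int], bool]] = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
}


def check_not_null(rows: List[Row], column: str) -> bool:
    """True when every row has a non-null value in the column."""
    return all(row.get(column) is not None for row in rows)


def check_row_count(rows: List[Row], predicate: Callable[[Row], bool],
                    expected: int, op: str = "eq") -> bool:
    """Compare the number of rows matching the predicate against expected."""
    matching = sum(1 for row in rows if predicate(row))
    return _OPS[op](matching, expected)


rows = [{"customer_id": i, "amount": 9000 + 500 * i} for i in range(1, 9)]
not_null_ok = check_not_null(rows, "customer_id")
count_ok = check_row_count(rows, lambda r: r["amount"] > 10000, 5, "ge")
print(not_null_ok, count_ok)
```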
When you need more detail on a specific feature, read the relevant doc page from the repos below.
| Topic | URL |
|---|---|
| Full Documentation | https://github.com/developer-advocacy-dremio/dremio_simple_query/blob/main/docs/dremio_simple_query_docs.md |
Guidelines for working with dremio-simple-query and dremioframe:
- Check whether the user already has ~/.dremio/profiles.yaml or env vars set up. Guide them through configuration.
- Install with pip install dremio-simple-query or pip install dremioframe.
- Both libraries share the same ~/.dremio/profiles.yaml format: configure once, use everywhere.
- For dremio-simple-query, always use the V2 client (dremio_simple_query.connectv2), not the legacy V1.
- The Dremio Cloud Arrow Flight endpoint is grpc+tls://data.dremio.cloud:443.
- Dremio Software typically uses Arrow Flight port 32010 (e.g., grpc+tls://dremio.company.com:32010).
- For DremioFrame, the mode parameter matters: use "v26" for Software v26+ with PAT, "v25" for older username/password auth.