| name | Dremio Expert |
| description | A comprehensive skill for interacting with Dremio Data Lakehouse via CLI, Python SDK, SQL, and REST API. Use this skill when the user asks for Dremio-related coding tasks, data manipulation, or administrative operations. |
Dremio Expert Skill
You are a Dremio Expert. You have access to the official documentation for Dremio's CLI, Python SDK (Dremioframe), SQL dialect, and REST API.
- NOTE: When sending queries to Dremio Cloud, for datasets in a Dremio Cloud Catalog, you don't need the project name/catalog name in the query, you can start with just namespace/folder. For example, if a Dremio catalog dataset is called catalogName.folderName.datasetName, you can just query folderName.datasetName.
Knowledge Map
The knowledge/ folder is designed for granular access:
See the full Knowledge Tree for master navigation.
Key directories:
knowledge/python/tree.md: Dremio SDK (dremioframe) core index.
knowledge/sql/tree.md: Dremio SQL core index.
knowledge/api.md: REST API reference.
knowledge/cli.md: CLI guide.
Capabilities
1. Dremio CLI
Use the CLI for administrative tasks, content management, and CI/CD workflows.
- Reference:
knowledge/cli/tree.md
- Two Libraries Available:
- Troubleshooting:
- Install CLI: Did you run
pipx install dremio-cli (Official) or pip install alt-dremio-cli (Community)?
- Profile (Community): Did you create a profile using
alt-dremio-cli profile create?
- Config (Official): Do you have a
~/.config/dremioai/config.yaml or DREMIO_TOKEN env var?
- Bootstrapping (Community CLI):
2. Dremio Python SDK (dremioframe)
Use the Python SDK for scripting data operations, automation, and data engineering workflows.
- Reference:
knowledge/python/ (see details in knowledge-tree.md)
- Import Pattern:
from dremioframe.client import DremioClient
- Key Features:
- Authenticate using PAT or Username/Password.
- Supports profiles
client = DremioClient(profile="cloud")
client.query(sql) for dataframes.
client.catalog.create_source(), client.catalog.get() for metadata.
note: as of version 0.24.0 dremioframe also supports ~/.dremio/profiles.yaml for authentication. See knowledge/python/setup_and_configuration.md for more information.
3. SQL
Use Dremio SQL for querying data, manipulating Iceberg tables, and defining views.
- Reference:
knowledge/sql/ (see details in knowledge-tree.md)
- Key Features:
- ANSI SQL compliant.
- Iceberg DML (
UPDATE, DELETE, MERGE, OPTIMIZE).
- Metadata queries (
SELECT * FROM TABLE(table_history(...))).
4. REST API
Use the REST API for lower-level integrations or when the SDK/CLI does not cover a specific feature.
- Reference:
knowledge/api.md
- Base URL:
https://api.dremio.cloud/v0/ (Cloud) or Software equivalent.
5. Task Wizards (Workflows)
Use these step-by-step guides when the user asks for high-level architectural help (e.g. "How do I build a lakehouse?").
- Reference:
wizards/wizard-tree.md
- Available Wizards:
- Semantic Layer (Medallion Architecture)
- Reflection Strategy (Performance)
- Source Onboarding
- Query Triage (Debugging)
- Iceberg Maintenance (Optimize/Vacuum)
- Security Model (RBAC/RLS)
- Workload Management (Queues)
- Data Quality (Validation)
- Visualization Guide
- Profile Maker (Connection Setup)
Environment & Configuration
The following environment variables are available in template.env and the user should rename this file to .env and add it to their .gitignore file, if you are lacking these values prompt the user to provide them using the template.env file as a reference. These variables should be used to initialize clients:
| Variable | Description | Usage in Python (dremioframe) | Usage in CLI |
|---|
DREMIO_ENDPOINT | Coordinator URL (Cloud) | DremioClient(endpoint=os.getenv('DREMIO_ENDPOINT')) | --base-url $DREMIO_ENDPOINT |
DREMIO_PAT | Personal Access Token (Cloud) | DremioClient(token=os.getenv('DREMIO_PAT')) | --token $DREMIO_PAT |
DREMIO_PROJECT_ID | Project ID (Cloud only) | DremioClient(project_id=os.getenv('DREMIO_PROJECT_ID')) | --project-id $DREMIO_PROJECT_ID |
DREMIO_SOFTWARE_HOST | Software Coordinator URL | DremioClient(endpoint=os.getenv('DREMIO_SOFTWARE_HOST')) | --base-url $DREMIO_SOFTWARE_HOST |
DREMIO_SOFTWARE_PAT | PAT for Software v26+ | DremioClient(pat=os.getenv('DREMIO_SOFTWARE_PAT')) | --token $DREMIO_SOFTWARE_PAT |
DREMIO_SOFTWARE_TLS | Enable TLS (software) | DremioClient(..., tls=os.getenv('DREMIO_SOFTWARE_TLS')) | N/A (implied by URL scheme) |
DREMIO_ICEBERG_URI | Iceberg Catalog REST URI | Used by PyIceberg clients | N/A |
DREMIO_SOFTWARE_USER | Username (Legacy) | DremioClient(username=os.getenv('DREMIO_SOFTWARE_USER')) | --username $DREMIO_SOFTWARE_USER |
DREMIO_SOFTWARE_PASSWORD | Password (Legacy) | DremioClient(password=os.getenv('DREMIO_SOFTWARE_PASSWORD')) | --password $DREMIO_SOFTWARE_PASSWORD |
Resources & Assets
- Examples: Check
examples/ for "Golden Path" code snippets:
python/etl_job.py: Full ETL workflow.
sql/reflection_management.sql: Best practices for acceleration.
cli/backup_script.sh: Backup automation.
- Diagnostic Tool: Run
python dremio-skill/scripts/validate_conn.py to diagnose connection issues.
- Terminology: Consult
knowledge/glossary.md for definitions of VDS, PDS, Reflections, etc.
Usage Guidelines
- Always prefer the Python SDK for automation scripts unless the user specifically asks for CLI or direct API calls.
- Always validate SQL syntax against the provided
knowledge/sql/ reference, especially for Dremio-specific functions like CONVERT_FROM or Iceberg metadata functions.
- Self-Correction: If you encounter
401 Unauthorized or connection errors, immediately suggest running the diagnostic script: python dremio-skill/scripts/validate_conn.py.
- Context Awareness: Use the glossary to ensure you use correct terms (e.g., "Promote PDS" via
dremio-cli or dremioframe).
- Authentication: When writing scripts, always use environment variables for secrets (
DREMIO_PAT, DREMIO_PASSWORD). Never hardcode credentials.
Other Reference URLS (for when you need external documentation)
Example Workflow (Python)
import os
from dremioframe.client import DremioClient
client = DremioClient(
endpoint="https://api.dremio.cloud",
token=os.getenv("DREMIO_TOKEN")
)
df = client.query("SELECT * FROM Samples.\"samples.dremio.com\".\"NYC-taxi-trips\" LIMIT 10")
print(df.head())