dremio-connector-guide
// Guides an AI agent through adding data sources to Dremio by asking the user the right questions, recommending connection settings, and linking to the exact documentation for each connector.
| name | Dremio Connector Guide |
| description | Guides an AI agent through adding data sources to Dremio by asking the user the right questions, recommending connection settings, and linking to the exact documentation for each connector. |
This skill helps you guide a user through adding a data source to Dremio — whether they're using Dremio Software or Dremio Cloud. Your job is to:

- Ask the user the right questions to identify their deployment and source type.
- Recommend the connection settings each connector needs.
- Link to the exact documentation page for that connector.

When a user wants to add a source, follow this decision tree by asking questions one step at a time.
First, ask: "Are you using Dremio Software (self-hosted) or Dremio Cloud?"

This matters because:

- Connector availability differs: several connectors (HDFS, NAS, MongoDB, Elasticsearch, Teradata, and others listed later in this guide) are available in Dremio Software only.
- The documentation URLs differ: Software pages live under `/current/`, Cloud pages under `/cloud/`.
Next, ask: "What type of data source do you want to connect? Choose a category:
- Lakehouse Catalog — AWS Glue, Nessie, Iceberg REST, Snowflake Open Catalog, Unity Catalog, OneLake, Hive
- Object Storage — Amazon S3, Azure Storage, Google Cloud Storage, HDFS, NAS
- Relational Database — PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift, etc.
- Other — Dremio Cluster, Arctic, Elasticsearch, MongoDB, OpenSearch"
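The two routing questions above can be sketched as a lookup table. This is a minimal Python sketch using the category and source names from the lists above; the function name `category_for` is hypothetical, and the real conversation happens in natural language:

```python
from typing import Optional

# Categories and example sources, copied from the decision tree above.
CATEGORIES = {
    "Lakehouse Catalog": ["AWS Glue", "Nessie", "Iceberg REST", "Snowflake Open Catalog",
                          "Unity Catalog", "OneLake", "Hive"],
    "Object Storage": ["Amazon S3", "Azure Storage", "Google Cloud Storage", "HDFS", "NAS"],
    "Relational Database": ["PostgreSQL", "MySQL", "SQL Server", "Oracle", "Snowflake",
                            "BigQuery", "Redshift"],
    "Other": ["Dremio Cluster", "Arctic", "Elasticsearch", "MongoDB", "OpenSearch"],
}

def category_for(source: str) -> Optional[str]:
    """Map a source name back to its category, or None if unknown."""
    for category, sources in CATEGORIES.items():
        if source in sources:
            return category
    return None
```

Once the category is known, jump straight to the matching parameter table below rather than asking about every category.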
Once you know the source type, ask the user for the required connection parameters. Use the tables below to know what to ask. Always read the connector's documentation page (from the index below) for the full list of configuration options.
Lakehouse catalog sources:

| Source | Key Questions |
|---|---|
| AWS Glue | AWS region? AWS Access Key ID and Secret (or IAM role)? Default S3 bucket path? |
| Nessie | Nessie server URL? Auth type (Bearer token or None)? Storage backend (S3, ADLS, GCS)? Bucket/container name? Storage credentials? |
| Iceberg REST Catalog | Catalog URI? Auth type (Bearer, OAuth2, None)? Warehouse location? Storage credentials? |
| Snowflake Open Catalog | Account URL? Client ID and Secret? Warehouse? |
| Unity Catalog | Databricks workspace URL? Personal access token? Catalog name? Storage credentials for underlying S3/ADLS? |
| Hive | Hive Metastore host and port? HDFS or S3 storage backend? Kerberos auth? |
| OneLake | Azure Tenant ID? Client ID and Secret? Lakehouse name? |
| Open Catalog (External) | Catalog URI? OAuth client ID and Secret? Warehouse? |
Object storage sources:

| Source | Key Questions |
|---|---|
| Amazon S3 | Bucket name? AWS region? Auth method (Access Key, IAM Role, EC2 Metadata, None)? If Access Key: AWS Access Key ID and Secret? Root path (optional)? Enable Iceberg support? |
| Azure Storage | Container name? Account name? Auth method (Access Key, Shared Access Signature, Azure AD)? Credentials based on method? |
| Google Cloud Storage | Bucket name? Project ID? Service account JSON key file? Root path? |
| HDFS (Software only) | NameNode host and port? Impersonation enabled? Kerberos auth? Root path? |
| NAS (Software only) | Mount path on Dremio server? |
For all database connectors, always ask:

- Hostname and port (defaults in the table below)
- Database name
- Username and password (or the connector's other supported credentials)
- Whether SSL/TLS is required
| Source | Default Port | Extra Questions |
|---|---|---|
| PostgreSQL | 5432 | Schema name? |
| MySQL | 3306 | — |
| Microsoft SQL Server | 1433 | Windows auth or SQL auth? Instance name? |
| Oracle | 1521 | SID or Service Name? |
| Snowflake | 443 | Account identifier? Warehouse? Database? Schema? |
| Google BigQuery | — | Project ID? Service account JSON key? Dataset? |
| Amazon Redshift | 5439 | Cluster identifier or Serverless workgroup? |
| IBM Db2 | 50000 | — |
| SAP HANA | 30015 | Instance number? |
| Azure Synapse | 1433 | Server name? Database? |
| Apache Druid | 8888 | Broker host? |
| Teradata (Software only) | 1025 | — |
| Vertica | 5433 | — |
| MongoDB (Software only) | 27017 | Authentication database? Replica set? |
| Elasticsearch (Software only) | 9200 | Cluster name? Use scripts? |
| OpenSearch (Software only) | 9200 | AWS region (for managed)? |
| Azure Data Explorer (Software only) | — | Cluster URL? Application ID? |
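The default ports in the table above can be captured in a small lookup helper. This is a hypothetical sketch; `None` marks connectors such as BigQuery and Azure Data Explorer that have no conventional TCP port:

```python
from typing import Optional

# Default ports copied from the table above; None means the connector
# has no fixed port (e.g. BigQuery is reached via the Google Cloud API).
DEFAULT_PORTS = {
    "PostgreSQL": 5432, "MySQL": 3306, "Microsoft SQL Server": 1433,
    "Oracle": 1521, "Snowflake": 443, "Google BigQuery": None,
    "Amazon Redshift": 5439, "IBM Db2": 50000, "SAP HANA": 30015,
    "Azure Synapse": 1433, "Apache Druid": 8888, "Teradata": 1025,
    "Vertica": 5433, "MongoDB": 27017, "Elasticsearch": 9200,
    "OpenSearch": 9200,
}

def port_to_suggest(source: str, user_port: Optional[int] = None) -> Optional[int]:
    """Prefer the port the user gave; otherwise fall back to the default."""
    return user_port if user_port is not None else DEFAULT_PORTS.get(source)
```

Only suggest the default when the user does not know the port; many deployments run on non-standard ports.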
Other sources:

| Source | Key Questions |
|---|---|
| Dremio Cluster | Dremio host URL? Auth token or username/password? |
| Arctic (Cloud only) | Arctic catalog name? (Created in Dremio Cloud UI) |
When you need the exact configuration parameters, required fields, or troubleshooting steps for a specific connector, read the relevant documentation page using your URL-reading tools.
Dremio Software documentation:

| Topic | URL |
|---|---|
| Manage Sources (Overview) | https://docs.dremio.com/current/sonar/data-sources/ |
| Open Catalog | https://docs.dremio.com/current/data-sources/open-catalog/ |
| Formatting Data to a Table | https://docs.dremio.com/current/developer/data-formats/table/ |
| Upload Files | https://docs.dremio.com/current/data-sources/file-upload |
| External Queries | https://docs.dremio.com/current/help-support/advanced-topics/external-queries/ |
| Runtime Filtering | https://docs.dremio.com/current/help-support/advanced-topics/runtime-filtering/ |
Dremio Cloud documentation:

| Topic | URL |
|---|---|
| Connecting to Your Data (Overview) | https://docs.dremio.com/cloud/sonar/data-sources/ |
| Arctic Catalog | https://docs.dremio.com/cloud/sonar/data-sources/arctic/ |
| AWS Glue Catalog | https://docs.dremio.com/cloud/sonar/data-sources/metastores/aws-glue |
| Snowflake Open Catalog | https://docs.dremio.com/cloud/sonar/data-sources/metastores/snowflake-open |
| Unity Catalog | https://docs.dremio.com/cloud/sonar/data-sources/metastores/unity |
| Amazon S3 | https://docs.dremio.com/cloud/sonar/data-sources/amazon-s3 |
| Azure Storage | https://docs.dremio.com/cloud/sonar/data-sources/azure-storage |
| Dremio Enterprise Cluster | https://docs.dremio.com/cloud/sonar/data-sources/dremio |
When a connection fails, walk the user through this checklist:
1. Ask: "Can the Dremio instance reach the data source host?" Test basic TCP reachability with `telnet <host> <port>` or `nc -zv <host> <port>` from the Dremio server.
2. Ask: "What error message are you seeing?" Match it against the common patterns in the table below.
3. Say: "Let me verify the connection parameters step by step." Walk through each parameter the connector requires (host, port, credentials, database or bucket name) against its documentation page.

Common patterns:

| Issue | Resolution |
|---|---|
| S3 "Access Denied" | Check IAM permissions: s3:GetObject, s3:ListBucket, s3:GetBucketLocation. For Iceberg, also need s3:PutObject, s3:DeleteObject. |
| Azure "AuthorizationFailed" | Verify storage account access keys, SAS token expiry, or Azure AD app registration. |
| GCS "Forbidden" | Verify service account has storage.objects.get and storage.objects.list permissions. |
| Snowflake "Warehouse suspended" | The Snowflake warehouse may need to be resumed, or auto-suspend is too aggressive. |
| BigQuery "Not found" | Verify the project ID and dataset name are correct. The service account needs bigquery.datasets.get at minimum. |
| PostgreSQL/MySQL "Connection refused" | Check pg_hba.conf (Postgres) or bind-address (MySQL) allows connections from the Dremio server's IP. |
| Oracle "ORA-12541: TNS: no listener" | Wrong host, port, or the Oracle listener service is not running. |
| MongoDB "Authentication failed" | Check the authSource database (usually admin). Ensure the user has readAnyDatabase role. |
| Metadata refresh very slow | Large catalogs with many tables can cause slow refreshes. Use metadata filters, limit schemas, or increase refresh intervals. |
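The reachability check in step 1 can also be scripted when `telnet` or `nc` is unavailable. This is a minimal sketch using only the Python standard library; the helper name `can_reach` is hypothetical:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles DNS resolution and the TCP handshake;
        # any failure (NXDOMAIN, refused, timed out) raises an OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run this on the Dremio coordinator host, not the user's laptop; the two may sit in different networks and produce different results.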
For Cloud-specific connectivity issues, refer to the Dremio Cloud documentation pages linked above.
Ask: "Is your data source publicly accessible, or is it in a private VPC/subnet? If private, have you set up VPC peering or PrivateLink?"
Some connectors are only available in Dremio Software:
| Connector | Available In |
|---|---|
| HDFS | Software only |
| NAS | Software only |
| MongoDB | Software only |
| Elasticsearch | Software only |
| Amazon OpenSearch | Software only |
| Azure Data Explorer | Software only |
| Teradata | Software only |
| Hive Metastore | Software only |
| Nessie | Software only |
If a user on Dremio Cloud asks for one of these, let them know it's not available in Cloud and suggest alternatives (e.g., "For MongoDB, consider using a federated database approach or data replication to an S3-based Iceberg table").
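The availability table above reduces to a simple set check. A sketch, using the connector names exactly as spelled in the table (the function name is hypothetical):

```python
# Connectors the table above lists as Dremio Software only.
SOFTWARE_ONLY = {
    "HDFS", "NAS", "MongoDB", "Elasticsearch", "Amazon OpenSearch",
    "Azure Data Explorer", "Teradata", "Hive Metastore", "Nessie",
}

def available_in_cloud(connector: str) -> bool:
    """True unless the connector is restricted to Dremio Software."""
    return connector not in SOFTWARE_ONLY
```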
Sources can also be added programmatically via the Dremio REST API. See the Source API docs:
| Edition | URL |
|---|---|
| Software | https://docs.dremio.com/current/reference/api/source/ |
| Cloud | https://docs.dremio.com/cloud/reference/api/source/ |
Read the relevant API doc to get the JSON payload structure for creating a source via POST /api/v3/source.
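As a rough illustration of what such a request body might look like: the sketch below models a PostgreSQL source, but the field names, connector type codes, and config keys are assumptions and must be verified against the Source API docs before use. All host, database, and account values are made-up examples.

```python
import json

# Hypothetical example payload for POST /api/v3/source; verify the exact
# schema (entityType, type codes, config keys) against the Source API docs.
payload = {
    "entityType": "source",
    "name": "my_postgres",             # display name in Dremio (example)
    "type": "POSTGRES",                # connector type code (verify in docs)
    "config": {
        "hostname": "db.example.com",  # example host
        "port": "5432",
        "databaseName": "analytics",
        "username": "dremio_svc",
        # Credentials are typically supplied separately; never hard-code
        # passwords in scripts checked into version control.
    },
}

body = json.dumps(payload)
# The request itself would be sent with an HTTP client, e.g.:
#   requests.post(f"{dremio_url}/api/v3/source",
#                 headers={"Authorization": f"Bearer {token}"},
#                 data=body)
```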
Note: for Microsoft SQL Server sources, some deployments use a binary collation such as LATIN1_GENERAL_BIN2 for consistent pushdown results; verify this recommendation against the SQL Server connector documentation.