databricks-isv-connector-structure
How to structure a Databricks connector (REST or Python SDK): config, connect, operations, validation. Use when designing or building a new connector.
Use this skill when designing or building a Databricks connector (REST-based or Python SDK-based). It defines a consistent shape so auth, telemetry, and operations stay correct.
Support all three auth types: PAT, OAuth M2M, and OAuth U2M.
| Auth type | Use case | Recommended |
|---|---|---|
| PAT | Testing, development, simple automation | Yes – optional for production but required for many workflows. |
| OAuth M2M | Service-to-service, production backends, automated jobs | Yes – recommended for production. |
| OAuth U2M | Interactive user sign-in, desktop/UI apps, user context | Yes – app-implemented browser flow or token pass-through. |
Design your config and connect() so the user can choose one of these per connection. Do not mix them in a single connection (one connection = one auth type).
User-Agent (required, connector-level). User-Agent is required and is coded at the connector level: the end user does not provide or configure it. The connector developer sets it in code (e.g. a constant or build-time value) in the format <isv-name>_<product-name>/<product-version> (e.g. AcmePartner_DataConnector/2.1.0). The connector sends this header on every API or driver call so that usage is attributed correctly in Databricks (audit logs, query history). Do not expose User-Agent (or product/product_version) as a user-configurable connection parameter.
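The following is a minimal sketch of the "connector-level constant" idea for a REST connector. The constant values reuse the example above. The helper name base_headers and the regex are hypothetical: the pattern is only a loose check of the documented format and assumes names contain no underscores.

```python
import re

# Hypothetical connector-level constants: set in code (or at build time), never from user config.
ISV_NAME = "AcmePartner"
PRODUCT_NAME = "DataConnector"
PRODUCT_VERSION = "2.1.0"

# Format: <isv-name>_<product-name>/<product-version>
USER_AGENT = f"{ISV_NAME}_{PRODUCT_NAME}/{PRODUCT_VERSION}"

# Loose sanity check of the documented format; the allowed character set is an assumption.
_UA_PATTERN = re.compile(r"^[A-Za-z0-9.-]+_[A-Za-z0-9.-]+/[A-Za-z0-9.+-]+$")


def base_headers(token: str) -> dict[str, str]:
    """Headers every REST call sends; the User-Agent always comes from the constants above."""
    if not _UA_PATTERN.match(USER_AGENT):
        raise ValueError(f"User-Agent {USER_AGENT!r} does not match <isv>_<product>/<version>")
    return {
        "Authorization": f"Bearer {token}",
        "User-Agent": USER_AGENT,
    }
```

With the Python SDK, the same values would presumably go into product (e.g. f"{ISV_NAME}_{PRODUCT_NAME}") and product_version rather than into a raw header. Verify that mapping against the SDK version you use.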
Related auth-isolation topics, documented separately: the DATABRICKS_AUTH_TYPE trap (breaks Python SDK, Go SDK, Java SDK), ~/.databrickscfg DEFAULT profile leakage, and the env -i isolation pattern for tests.

Use a single config object (or connection params) that includes:
| Parameter | Required | Description |
|---|---|---|
| host | Yes | Workspace URL (e.g. https://myworkspace.cloud.databricks.com). End user provides. |
| auth_type | Yes | One of: pat, oauth_m2m, oauth_u2m, or token (pass-through). Recommend supporting all three auth types: PAT, M2M, U2M. |
| Credentials by auth_type | Yes | For PAT: token. For M2M: client_id, client_secret. For U2M: access_token (or run the browser flow once and pass the token). End user provides (except when the connector runs the browser flow). |
| warehouse_id | If using SQL | Required for the Statement Execution API. End user provides, or the connector can extract it from http_path (e.g. /sql/1.0/warehouses/abc123 → abc123). |
| http_path | If using SQL connector | SQL Warehouse HTTP path (e.g. /sql/1.0/warehouses/abc123). End user provides. The connector can derive warehouse_id from this path if needed for the Statement Execution API. |
| redirect_uri | Optional (U2M localhost) | Default: http://localhost:8080/callback. End user may override. |
| Databricks Connect only | | |
| serverless_compute_id | For Connect serverless | Set to "auto" (recommended when supported). End user may enable. |
| cluster_id or classic_compute_http_path | For Connect classic | Cluster ID or HTTP path (e.g. sql/protocolv1/o/<workspace_id>/<cluster_id>). End user provides. Use either serverless or classic per connection, not both. |
User-Agent: Not a user parameter. The connector sets it in code (connector-level); see "User-Agent (required, connector-level)" above.
Rule: One connection = one auth type. Do not accept or mix multiple auth methods in a single config (e.g. do not set both PAT and M2M on the same connection).
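One way to enforce "one connection = one auth type" is to validate the config at construction time. The sketch below assumes a plain dataclass named ConnectorConfig (a hypothetical name). For simplicity it models oauth_u2m only as token pass-through (Option C below). A browser-flow variant would also need to accept client_id and redirect_uri.

```python
from dataclasses import dataclass
from typing import Optional

# Credential fields each auth_type requires (U2M modeled as access_token pass-through).
_REQUIRED = {
    "pat": ("token",),
    "oauth_m2m": ("client_id", "client_secret"),
    "oauth_u2m": ("access_token",),
}
_ALL_CREDENTIALS = {"token", "client_id", "client_secret", "access_token"}


@dataclass(frozen=True)
class ConnectorConfig:
    host: str
    auth_type: str
    token: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    access_token: Optional[str] = None
    warehouse_id: Optional[str] = None
    http_path: Optional[str] = None

    def __post_init__(self) -> None:
        if self.auth_type not in _REQUIRED:
            raise ValueError(f"auth_type must be one of {sorted(_REQUIRED)}")
        required = set(_REQUIRED[self.auth_type])
        missing = [f for f in sorted(required) if not getattr(self, f)]
        if missing:
            raise ValueError(f"{self.auth_type} requires: {', '.join(missing)}")
        # One connection = one auth type: reject credentials that belong to another type.
        extra = [f for f in sorted(_ALL_CREDENTIALS - required) if getattr(self, f)]
        if extra:
            raise ValueError(f"{self.auth_type} connection must not set: {', '.join(extra)}")
```

Failing fast here keeps mixed credentials from ever reaching connect().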
All options the end user must (or may) enter for each auth type. User-Agent is not a user input – it is coded at the connector level (see "User-Agent (required, connector-level)" above). If the connector runs SQL or Statement Execution, the user may need to provide warehouse_id and/or http_path. Note: The connector can extract warehouse_id from http_path (e.g. /sql/1.0/warehouses/abc123 → abc123), so the user need not provide both if they already supply http_path.
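The warehouse_id extraction described above can be sketched with a single regex. The function name resolve_warehouse_id is hypothetical. The sketch only handles the /sql/1.0/warehouses/<id> form shown in this document and returns None for anything else, such as a Databricks Connect cluster path.

```python
import re
from typing import Optional

# Matches SQL warehouse HTTP paths like /sql/1.0/warehouses/abc123 (leading slash optional).
_WAREHOUSE_PATH = re.compile(r"^/?sql/1\.0/warehouses/([^/?#]+)/?$")


def resolve_warehouse_id(warehouse_id: Optional[str], http_path: Optional[str]) -> Optional[str]:
    """Prefer an explicit warehouse_id; otherwise derive it from http_path."""
    if warehouse_id:
        return warehouse_id
    if http_path:
        match = _WAREHOUSE_PATH.match(http_path.strip())
        if match:
            return match.group(1)
    return None
```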
PAT

| Option | Required | Description |
|---|---|---|
| host | Yes | Databricks workspace URL (e.g. https://myworkspace.cloud.databricks.com). |
| token | Yes | Personal access token (starts with dapi...). User creates it in the workspace: Settings → Developer → Access tokens. |
| warehouse_id | If using SQL / Statement Execution | SQL Warehouse ID. Can be extracted from http_path if the user provides that instead (e.g. /sql/1.0/warehouses/abc123 → abc123). |
| http_path | If using SQL connector | SQL Warehouse HTTP path (e.g. /sql/1.0/warehouses/abc123). Connector can derive warehouse_id from this. |
OAuth M2M

| Option | Required | Description |
|---|---|---|
| host | Yes | Databricks workspace URL. |
| client_id | Yes | Service principal application (client) ID (UUID). Created in the workspace: Settings → Identity and access → Service principals. |
| client_secret | Yes | OAuth client secret for that service principal (generated in the Service principals UI). |
| warehouse_id | If using SQL / Statement Execution | SQL Warehouse ID. Can be extracted from http_path if provided. |
| http_path | If using SQL connector | SQL Warehouse HTTP path. Connector can derive warehouse_id from this. |
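The PAT and M2M tables suggest a few cheap pre-flight checks before any network call. The sketch below uses a hypothetical check_inputs helper. Its rules (https host, dapi prefix, UUID client_id) come from the table descriptions above. Because they are heuristics, the function returns warnings instead of raising.

```python
import re
from urllib.parse import urlparse

_UUID = re.compile(r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")


def check_inputs(auth_type: str, host: str, **creds: str) -> list[str]:
    """Return human-readable problems; an empty list means the inputs look plausible."""
    problems: list[str] = []
    parsed = urlparse(host)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("host should be a full https:// workspace URL")
    if auth_type == "pat":
        token = creds.get("token", "")
        if not token:
            problems.append("token is required")
        elif not token.startswith("dapi"):
            problems.append("token does not look like a personal access token (dapi...)")
    elif auth_type == "oauth_m2m":
        if not _UUID.match(creds.get("client_id", "")):
            problems.append("client_id should be the service principal's application ID (UUID)")
        if not creds.get("client_secret"):
            problems.append("client_secret is required")
    return problems
```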
U2M can be implemented in three ways. Collect only the options for the flow you support.
Option A – External browser (no custom OAuth app)
User signs in via browser; connector uses Databricks built-in OAuth app.
| Option | Required | Description |
|---|---|---|
| host | Yes | Databricks workspace URL. |
| warehouse_id | If using SQL / Statement Execution | SQL Warehouse ID. Can be extracted from http_path if provided. |
| http_path | If using SQL connector | SQL Warehouse HTTP path. Connector can derive warehouse_id from this. |
Do not ask for or use client_id / client_secret for this flow (M2M service principal client_id is not valid for browser OAuth).
Option B – Localhost (custom OAuth app)
Admin creates an OAuth app in the account with a localhost redirect URI; user signs in and is redirected back to the connector.
| Option | Required | Description |
|---|---|---|
| host | Yes | Databricks workspace URL. |
| client_id | Yes | OAuth application client ID from the custom app (Settings → Developer → App connections). Not the M2M service principal client_id. |
| redirect_uri | Optional | Redirect URI registered for the app. Default: http://localhost:8080/callback. |
| client_secret | If app is confidential | OAuth app secret, if the app has one. |
| warehouse_id | If using SQL / Statement Execution | SQL Warehouse ID. Can be extracted from http_path if provided. |
| http_path | If using SQL connector | SQL Warehouse HTTP path. Connector can derive warehouse_id from this. |
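For Option B, the connector typically generates a PKCE pair and sends the user to an authorization URL. The sketch below is an illustration, not a confirmed Databricks recipe. It assumes the workspace OIDC authorize endpoint is <host>/oidc/v1/authorize and that the scopes are "all-apis offline_access". The helper names are hypothetical. The PKCE computation itself follows RFC 7636 (S256).

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method from RFC 7636."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(host: str, client_id: str, redirect_uri: str, challenge: str, state: str) -> str:
    # Assumption: workspace-level OIDC authorize endpoint and "all-apis offline_access" scopes.
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": "all-apis offline_access",
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{host.rstrip('/')}/oidc/v1/authorize?{urlencode(params)}"
```

After the browser redirects to redirect_uri with a code, the connector would exchange that code plus the verifier at the token endpoint (grant_type=authorization_code). It then keeps the resulting access_token, and optionally the refresh_token, as described for Option C.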
Option C – Token pass-through (pre-obtained token)
User provides an access token already obtained (e.g. from a hosted callback or refresh).
| Option | Required | Description |
|---|---|---|
| host | Yes | Databricks workspace URL. |
| access_token | Yes | OAuth access token (from browser flow or refresh). |
| warehouse_id | If using SQL / Statement Execution | SQL Warehouse ID. Can be extracted from http_path if provided. |
| http_path | If using SQL connector | SQL Warehouse HTTP path. Connector can derive warehouse_id from this. |
Expose one way to create a connected client:
- REST: connect(config) -> Client, where the client holds the resolved token (or M2M token fetcher), host, and User-Agent. All subsequent API calls use this client so headers are consistent.
- Python SDK: connect(config) -> WorkspaceClient (or DatabricksSession for Databricks Connect). Build Config once from config with the correct auth_type (e.g. auth_type="oauth-m2m" for M2M). Set User-Agent (product/product_version) from connector-level constants, not from user config, and return WorkspaceClient(config=...).

Inside connect():
- Branch on auth_type.
- PAT: use token as Bearer (REST) or WorkspaceClient(host=..., token=..., product=..., product_version=...) (SDK). Take product/product_version for User-Agent from connector-level constants, not from user config.
- OAuth M2M: REST: request an access token from <host>/oidc/v1/token with the client-credentials grant and cache it with its expiry. SDK: Config(host=..., client_id=..., client_secret=..., auth_type="oauth-m2m", product=..., product_version=...). Use connector-level product/product_version.
- OAuth U2M: use access_token as Bearer (REST) or WorkspaceClient(host=..., token=access_token, ...) (SDK). If the connector runs the browser flow, do it once, then store the access_token (and optionally refresh_token) in config or session. User-Agent comes from the connector level.

Rule: Do not mix auth methods in one process. If the app supports multiple connection types, each connection instance should use only one auth type (see auth isolation in the main cursor rule).
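The REST branch of connect() can be sketched as a single dispatch on auth_type. The result is a client that produces consistent headers for every call. RestClient, connect, and the injected fetch_m2m_token callable are hypothetical names. In a real connector the fetcher would call <host>/oidc/v1/token and cache the result. Here it is a parameter so the sketch stays network-free.

```python
from dataclasses import dataclass
from typing import Callable

USER_AGENT = "AcmePartner_DataConnector/2.1.0"  # hypothetical connector-level constant


@dataclass
class RestClient:
    host: str
    get_token: Callable[[], str]  # returns a valid bearer token on each call

    def headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self.get_token()}", "User-Agent": USER_AGENT}


def connect(config: dict, fetch_m2m_token: Callable[[str, str, str], str]) -> RestClient:
    """Branch once on auth_type; every later call reuses the same client."""
    host = config["host"].rstrip("/")
    auth_type = config["auth_type"]
    if auth_type == "pat":
        token = config["token"]
        return RestClient(host, lambda: token)
    if auth_type == "oauth_m2m":
        cid, secret = config["client_id"], config["client_secret"]
        return RestClient(host, lambda: fetch_m2m_token(host, cid, secret))
    if auth_type == "oauth_u2m":
        access_token = config["access_token"]
        return RestClient(host, lambda: access_token)
    raise ValueError(f"unsupported auth_type: {auth_type!r}")
```

Because the token source is captured once per client, two connections with different auth types never share credentials.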
All API or SDK operations go through the client returned by connect():
- REST: every request carries Authorization: Bearer <token>, User-Agent, and Content-Type where needed. For M2M, the client refreshes the token when it expires (same pattern as in rest-api-authentication.md).
- Python SDK: call client.tables.get(), client.jobs.list(), etc. No extra auth or User-Agent wiring is needed per call.
- Databricks Connect: DatabricksSession.builder.sdkConfig(config).getOrCreate() with a single Config built from host, auth (token, or client_id/client_secret with auth_type="oauth-m2m", or a U2M token), and compute (serverless_compute_id="auto" or cluster_id). Set product/product_version on Config. Do not set both serverless and classic compute in the same Config.

This keeps telemetry and auth in one place and avoids mistakes (e.g. a missing User-Agent or the wrong token).
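The M2M "refresh when expired" behavior inside the client can be sketched as a small cache in front of a token fetcher. M2MTokenCache and build_m2m_token_request are hypothetical names. The request shape is a standard OAuth client-credentials call: HTTP Basic auth with client_id:client_secret, form body grant_type=client_credentials and scope=all-apis. Treat the exact scope as an assumption to confirm against the Databricks M2M documentation. The 60-second refresh skew is an arbitrary choice.

```python
import base64
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple

# A fetcher returns (access_token, expires_in_seconds).
Fetcher = Callable[[], Tuple[str, int]]


def build_m2m_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Client-credentials request to the workspace token endpoint (scope 'all-apis' assumed)."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"{host.rstrip('/')}/oidc/v1/token",
        data=body,
        headers={"Authorization": f"Basic {basic}", "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


class M2MTokenCache:
    """Caches the access token and refreshes it shortly before it expires."""

    def __init__(self, fetch: Fetcher, skew_seconds: int = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when there is no token yet or it is within skew_seconds of expiry.
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

A REST client would call cache.get() when building each request's Authorization header. Callers never see the refresh.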
To verify credentials after connect, run the two recommended validation tests:
1. Unity Catalog table get: GET /api/2.1/unity-catalog/tables/<catalog>.<schema>.<table> (no warehouse needed), or with the SDK: client.tables.get(full_name=...).
2. Statement Execution: POST /api/2.0/sql/statements with warehouse_id and a simple SQL statement (e.g. SELECT 1 or DESCRIBE EXTENDED catalog.schema.table AS JSON). Requires warehouse_id in config.

If either fails with 401/403, the connection is invalid or lacks permissions. Optionally expose this as client.validate_connection() or a standalone helper.
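A standalone validation helper for a REST connector might build the two requests and map the HTTP status to a verdict. All function names are hypothetical. The request bodies use the documented warehouse_id and statement fields plus a wait_timeout, which is an assumed choice. A 200 from the Statement Execution API can still carry a failed statement state in the body, so a fuller check would inspect that too.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

USER_AGENT = "AcmePartner_DataConnector/2.1.0"  # hypothetical connector-level constant


def _headers(token: str) -> dict[str, str]:
    return {"Authorization": f"Bearer {token}", "User-Agent": USER_AGENT, "Content-Type": "application/json"}


def uc_table_get_request(host: str, token: str, full_name: str) -> urllib.request.Request:
    # Test 1: Unity Catalog table metadata (no warehouse needed).
    path = "/api/2.1/unity-catalog/tables/" + urllib.parse.quote(full_name, safe=".")
    return urllib.request.Request(host.rstrip("/") + path, headers=_headers(token), method="GET")


def statement_request(host: str, token: str, warehouse_id: str, sql: str = "SELECT 1") -> urllib.request.Request:
    # Test 2: Statement Execution API on the configured SQL warehouse.
    body = json.dumps({"warehouse_id": warehouse_id, "statement": sql, "wait_timeout": "30s"}).encode()
    return urllib.request.Request(host.rstrip("/") + "/api/2.0/sql/statements",
                                  data=body, headers=_headers(token), method="POST")


def interpret_status(status: int) -> str:
    if status in (401, 403):
        return "invalid credentials or missing permissions"
    return "ok" if 200 <= status < 300 else f"unexpected HTTP {status}"


def validate_connection(host: str, token: str, full_name: str, warehouse_id: str) -> dict[str, str]:
    """Send both checks and report a verdict per check (performs real network calls)."""
    results: dict[str, str] = {}
    checks = (("uc_table_get", uc_table_get_request(host, token, full_name)),
              ("statement_execution", statement_request(host, token, warehouse_id)))
    for name, req in checks:
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                results[name] = interpret_status(resp.status)
        except urllib.error.HTTPError as err:
            results[name] = interpret_status(err.code)
    return results
```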
For U2M, when the access_token expires, call the token endpoint with the refresh_token (grant_type=refresh_token) to get a new access_token. Do not send the refresh_token on every API request.

| Layer | Responsibility |
|---|---|
| User-Agent | Required, connector-level. Set on every request; format <isv>_<product>/<version>. Coded in the connector (not a user-configurable option). |
| Config | host, auth_type (recommend PAT + M2M + U2M), credentials for that type (see “User input by auth type” above), optional warehouse_id/http_path. No user-supplied User-Agent. |
| connect(config) | Single entry point; branch on auth_type; return REST client or WorkspaceClient with correct auth and User-Agent. |
| Client | All operations; consistent headers (REST) or SDK client; M2M token refresh inside client. |
| validate_connection() | Optional; run UC table get + Statement Execution to verify. |
| Token refresh | M2M: refresh before expiry; U2M: refresh_token → new access_token. Never log tokens. |
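The U2M part of the Token refresh row can be sketched as one form-encoded request that is sent only when refreshing. The function names are hypothetical. The sketch assumes the same <host>/oidc/v1/token endpoint used for M2M. It sends client_secret only for a confidential custom app, per Option B. redact() is an illustrative helper that keeps tokens out of logs.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_refresh_request(host: str, client_id: str, refresh_token: str,
                          client_secret: Optional[str] = None) -> urllib.request.Request:
    """OAuth refresh_token grant against the workspace token endpoint (sent only when refreshing)."""
    form = {"grant_type": "refresh_token", "refresh_token": refresh_token, "client_id": client_id}
    if client_secret:  # only for confidential custom OAuth apps
        form["client_secret"] = client_secret
    return urllib.request.Request(
        f"{host.rstrip('/')}/oidc/v1/token",
        data=urllib.parse.urlencode(form).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def redact(token: str) -> str:
    """Log-safe placeholder: never log the token value itself."""
    return "<redacted>" if token else "<empty>"
```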
Recommendation: Support all three auth types (PAT, OAuth M2M, OAuth U2M) so users can choose the right one per connection. User-Agent is required but is set at the connector level (in code), not by the end user; the end user provides only host, credentials for the chosen auth type, and optional warehouse_id/http_path. Following this structure keeps auth isolation, telemetry, and validation aligned with the REST and Python auth rules and the integration checklist.