| name | databricks-isv-python-sql-connector |
| description | PWAF-compliant Python SQL Connector (databricks-sql-connector): PAT, OAuth M2M, OAuth U2M (custom OAuth app PKCE + token-env), credentials_provider patterns, error handling, retry logic. Use when building Python integrations that run SQL queries via a Databricks SQL warehouse. |
Python SQL Connector Authentication (ISV)
Use this skill when implementing or testing Python SQL Connector (databricks-sql-connector) integrations for PWAF-compliant SQL query execution via a Databricks SQL warehouse.
PWAF Documentation Links
Requirements
- Package:
databricks-sql-connector 2.9+
- Python: 3.8+
- SQL Warehouse: Required (
DATABRICKS_HTTP_PATH)
- Install:
pip install databricks-sql-connector databricks-sdk
Authentication Decision Guide
┌─────────────────────────────────────────────────────────────┐
│ Which auth method should I use? │
└─────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────┐
│ Is this a production/ │
│ automated workload? │
└───────────────────────────────┘
│ │
Yes No
│ │
▼ ▼
┌────────────────┐ ┌────────────────────────┐
│ OAuth M2M │ │ Is user interaction │
│ (Recommended) │ │ available? │
└────────────────┘ └────────────────────────┘
│ │
Yes No
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ OAuth U2M │ │ U2M Token-Env │
│ (Custom OAuth) │ │ (pre-obtained) │
└─────────────────┘ └─────────────────┘
Authentication Comparison
| Method | PWAF Status | Auto Token Refresh | Browser Required | Use Case |
|---|
| PAT | ⚠️ Limited | No | No | Testing only |
| OAuth M2M | ✅ Recommended | Yes (via SDK Config) | No | Production/automated |
| U2M Custom OAuth | ✅ Recommended (ISV) | No | Yes | Interactive (custom app) |
| U2M Token-Env | ✅ Supported | No | No | Headless/CI |
ISV Note: For user-interactive flows, use U2M Custom OAuth with your registered OAuth app (Account Console → App connections). This provides custom branding, audit trails, and scoped permissions.
CLIENT_ID Distinction (CRITICAL)
| Variable | Purpose | Where to Find |
|---|
DATABRICKS_CLIENT_ID | Service Principal UUID for M2M | Workspace → Settings → Identity and access → Service principals |
DATABRICKS_U2M_CLIENT_ID | Custom OAuth App client ID for U2M | Account Console → Settings → App connections |
Never mix these - they serve different authentication flows and will cause auth failures if confused.
Token Lifetime Summary
| Auth Type | Access Token | Refresh Token | Auto-Refresh |
|---|
| PAT | Configurable (default 90 days) | N/A | No |
| OAuth M2M | 1 hour | N/A (uses client credentials) | Yes (SDK Config) |
| OAuth U2M | 1 hour | Up to 90 days | No (manual) |
User-Agent (Required for ISV)
Format: <ISV-Name>_<Product-Name>/<Version>
from databricks import sql
conn = sql.connect(
server_hostname="myworkspace.cloud.databricks.com",
http_path="/sql/1.0/warehouses/abc123",
access_token="...",
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
Note: Use user_agent_entry (no underscore) for connector versions 2.9+. The legacy _user_agent_entry (with underscore) is deprecated and will trigger a deprecation warning.
Environment Variables Reference
| Variable | Required For | Description |
|---|
DATABRICKS_HOST | All | Workspace URL (e.g., https://myworkspace.cloud.databricks.com) |
DATABRICKS_HTTP_PATH | All | SQL warehouse HTTP path (e.g., /sql/1.0/warehouses/abc123) |
DATABRICKS_TOKEN | PAT | Personal access token |
DATABRICKS_CLIENT_ID | OAuth M2M | Service principal UUID |
DATABRICKS_CLIENT_SECRET | OAuth M2M | Service principal OAuth secret |
DATABRICKS_U2M_CLIENT_ID | U2M Custom OAuth | Custom OAuth app client ID from App connections |
DATABRICKS_U2M_CLIENT_SECRET | U2M Custom OAuth (optional) | Custom OAuth app client secret |
DATABRICKS_REDIRECT_URI | U2M Custom OAuth (optional) | Custom redirect URI (default: http://localhost:8080/callback) |
DATABRICKS_ACCESS_TOKEN | U2M Token-Env | Pre-obtained OAuth access token |
APP_AUTH_TYPE | Multi-auth | Auth type selector: pat, oauth_m2m, oauth_u2m |
Important: Do not mix M2M and U2M environment variables. Use env -i for clean test environments.
Connection Parameters Reference
| Parameter | Type | Required | Description |
|---|
server_hostname | str | Yes | Workspace hostname (no https://) |
http_path | str | Yes | SQL warehouse HTTP path |
access_token | str | PAT, U2M | Token for authentication |
credentials_provider | callable | OAuth M2M | Callable that returns callable that returns headers |
user_agent_entry | str | Recommended | ISV partner telemetry string |
Host Normalization Helper
The connector's server_hostname expects a bare hostname (no https://):
def server_hostname(host: str) -> str:
"""Extract bare hostname from workspace URL."""
host = host.replace("https://", "").replace("http://", "")
if "/" in host:
host = host.split("/")[0]
return host
server = server_hostname(os.environ.get("DATABRICKS_HOST", ""))
Complete Examples
PAT Authentication (Testing Only)
import os
from databricks import sql
def connect_with_pat():
host = os.environ.get("DATABRICKS_HOST", "").strip()
http_path = os.environ.get("DATABRICKS_HTTP_PATH", "").strip()
token = os.environ.get("DATABRICKS_TOKEN", "").strip()
server = host.replace("https://", "").split("/")[0]
conn = sql.connect(
server_hostname=server,
http_path=http_path,
access_token=token,
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
cursor = conn.cursor()
cursor.execute("SELECT current_user(), current_catalog()")
print(cursor.fetchall())
cursor.close()
conn.close()
Env vars: DATABRICKS_HOST, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN
OAuth M2M Authentication (Production Recommended)
Uses SDK Config with auth_type="oauth-m2m" for automatic token management:
import os
from databricks import sql
from databricks.sdk.config import Config
def connect_with_oauth_m2m():
host = os.environ.get("DATABRICKS_HOST", "").strip()
http_path = os.environ.get("DATABRICKS_HTTP_PATH", "").strip()
host_url = host if host.startswith("https://") else f"https://{host}"
server = host_url.replace("https://", "").split("/")[0]
config = Config(
host=host_url,
client_id=os.environ.get("DATABRICKS_CLIENT_ID", "").strip(),
client_secret=os.environ.get("DATABRICKS_CLIENT_SECRET", "").strip(),
auth_type="oauth-m2m",
)
def credentials_provider():
return lambda: config.authenticate()
conn = sql.connect(
server_hostname=server,
http_path=http_path,
credentials_provider=credentials_provider,
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
cursor = conn.cursor()
cursor.execute("SELECT current_user(), current_catalog()")
print(cursor.fetchall())
cursor.close()
conn.close()
Env vars: DATABRICKS_HOST, DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET
Setup: Create service principal in Account Console → Settings → Service principals. Generate OAuth secret. Grant CAN_USE on the SQL warehouse.
OAuth U2M Custom OAuth App (Interactive)
Uses custom OAuth app with PKCE for user-interactive flows:
import os
import base64
import hashlib
import secrets
import string
import urllib.parse
import webbrowser
from http.server import HTTPServer, BaseHTTPRequestHandler
from threading import Thread
from urllib.parse import parse_qs, urlparse
import requests
from databricks import sql
DEFAULT_REDIRECT_URI = "http://localhost:8080/callback"
def pkce_verifier_and_challenge():
"""Generate PKCE code_verifier and code_challenge per RFC 7636."""
allowed = string.ascii_letters + string.digits + "-._~"
code_verifier = "".join(secrets.choice(allowed) for _ in range(64))
digest = hashlib.sha256(code_verifier.encode()).digest()
code_challenge = base64.urlsafe_b64encode(digest).decode().rstrip("=")
return code_verifier, code_challenge
def run_u2m_flow(host, client_id, redirect_uri=None, client_secret=None, scope="all-apis"):
"""Run OAuth 2.0 authorization code flow with PKCE."""
redirect_uri = redirect_uri or DEFAULT_REDIRECT_URI
host = host.rstrip("/")
if not host.startswith("https://"):
host = "https://" + host
code_verifier, code_challenge = pkce_verifier_and_challenge()
state = secrets.token_urlsafe(32)
auth_url = f"{host}/oidc/v1/authorize?" + urllib.parse.urlencode({
"response_type": "code",
"client_id": client_id,
"redirect_uri": redirect_uri,
"scope": scope,
"code_challenge": code_challenge,
"code_challenge_method": "S256",
"state": state,
})
code_holder = []
parsed = urlparse(redirect_uri)
port = parsed.port or 8080
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
q = parse_qs(urlparse(self.path).query)
code_holder.append(q.get("code", [None])[0])
self.send_response(200)
self.send_header("Content-type", "text/html")
self.end_headers()
self.wfile.write(b"<html><body><p>Authentication successful. Close this window.</p></body></html>")
def log_message(self, *args): pass
server = HTTPServer(("localhost", port), Handler)
Thread(target=server.handle_request, daemon=True).start()
webbrowser.open(auth_url)
server.handle_request()
if not code_holder[0]:
raise RuntimeError("No authorization code received")
token_url = f"{host}/oidc/v1/token"
data = {
"grant_type": "authorization_code",
"code": code_holder[0],
"redirect_uri": redirect_uri,
"client_id": client_id,
"code_verifier": code_verifier,
}
if client_secret:
data["client_secret"] = client_secret
resp = requests.post(token_url, data=data)
resp.raise_for_status()
return resp.json()
def connect_with_u2m():
host = os.environ.get("DATABRICKS_HOST", "").strip()
http_path = os.environ.get("DATABRICKS_HTTP_PATH", "").strip()
client_id = os.environ.get("DATABRICKS_U2M_CLIENT_ID", "").strip()
client_secret = os.environ.get("DATABRICKS_U2M_CLIENT_SECRET", "").strip() or None
redirect_uri = os.environ.get("DATABRICKS_REDIRECT_URI", "").strip() or DEFAULT_REDIRECT_URI
tokens = run_u2m_flow(host, client_id, redirect_uri, client_secret)
access_token = tokens.get("access_token")
server = host.replace("https://", "").split("/")[0]
conn = sql.connect(
server_hostname=server,
http_path=http_path,
access_token=access_token,
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
cursor = conn.cursor()
cursor.execute("SELECT current_user(), current_catalog()")
print(cursor.fetchall())
cursor.close()
conn.close()
Env vars: DATABRICKS_HOST, DATABRICKS_HTTP_PATH, DATABRICKS_U2M_CLIENT_ID, optional DATABRICKS_U2M_CLIENT_SECRET, DATABRICKS_REDIRECT_URI
Setup: Create OAuth app in Account Console → Settings → App connections. Add redirect URI http://localhost:8080/callback.
OAuth U2M Token-Env (Headless/CI)
Pass a pre-obtained OAuth access token:
import os
from databricks import sql
def connect_with_token_env():
host = os.environ.get("DATABRICKS_HOST", "").strip()
http_path = os.environ.get("DATABRICKS_HTTP_PATH", "").strip()
access_token = os.environ.get("DATABRICKS_ACCESS_TOKEN", "").strip()
server = host.replace("https://", "").split("/")[0]
conn = sql.connect(
server_hostname=server,
http_path=http_path,
access_token=access_token,
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
cursor = conn.cursor()
cursor.execute("SELECT current_user(), current_catalog()")
print(cursor.fetchall())
cursor.close()
conn.close()
Env vars: DATABRICKS_HOST, DATABRICKS_HTTP_PATH, DATABRICKS_ACCESS_TOKEN
Error Handling Patterns
Error Classification
def classify_error(error: Exception) -> tuple[str, bool]:
"""Classify error into category and retryable flag."""
error_str = str(error).lower()
if "401" in error_str or "unauthorized" in error_str:
return "authentication", False
if "invalid_client" in error_str or "invalid_grant" in error_str:
return "authentication", False
if "403" in error_str or "permission_denied" in error_str:
return "permission", False
if "analysisexception" in error_str or "parseexception" in error_str:
return "syntax", False
if "table_or_view_not_found" in error_str or "catalog_not_found" in error_str:
return "not_found", False
if "429" in error_str or "rate limit" in error_str:
return "rate_limit", True
if "503" in error_str or "temporarily_unavailable" in error_str:
return "warehouse", True
if "connection" in error_str and ("reset" in error_str or "refused" in error_str):
return "network", True
return "unknown", False
Retry with Exponential Backoff
import time
import random
def retry_query(cursor, query: str, max_retries: int = 3, initial_backoff: float = 2.0):
"""Execute query with exponential backoff retry."""
last_error = None
backoff = initial_backoff
for attempt in range(max_retries + 1):
try:
cursor.execute(query)
return cursor.fetchall()
except Exception as e:
last_error = e
category, retryable = classify_error(e)
if not retryable or attempt >= max_retries:
raise
jitter = backoff * 0.25 * random.random()
sleep_time = backoff + jitter
print(f"[Retry {attempt + 1}/{max_retries}] Waiting {sleep_time:.1f}s...")
time.sleep(sleep_time)
backoff = min(backoff * 2, 60.0)
raise last_error
Multi-Auth Pattern
Support multiple auth types in a single script using APP_AUTH_TYPE:
import os
from databricks import sql
from databricks.sdk.config import Config
def get_connection():
auth_type = os.environ.get("APP_AUTH_TYPE", "oauth_m2m").lower()
host = os.environ.get("DATABRICKS_HOST", "").strip()
http_path = os.environ.get("DATABRICKS_HTTP_PATH", "").strip()
server = host.replace("https://", "").split("/")[0]
if auth_type == "pat":
return sql.connect(
server_hostname=server,
http_path=http_path,
access_token=os.environ.get("DATABRICKS_TOKEN", ""),
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
elif auth_type in ("oauth_m2m", "m2m"):
host_url = host if host.startswith("https://") else f"https://{host}"
config = Config(
host=host_url,
client_id=os.environ.get("DATABRICKS_CLIENT_ID", ""),
client_secret=os.environ.get("DATABRICKS_CLIENT_SECRET", ""),
auth_type="oauth-m2m",
)
def credentials_provider():
return lambda: config.authenticate()
return sql.connect(
server_hostname=server,
http_path=http_path,
credentials_provider=credentials_provider,
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
elif auth_type in ("oauth_u2m", "u2m"):
tokens = run_u2m_flow(
host=host,
client_id=os.environ.get("DATABRICKS_U2M_CLIENT_ID", ""),
client_secret=os.environ.get("DATABRICKS_U2M_CLIENT_SECRET", "") or None,
)
return sql.connect(
server_hostname=server,
http_path=http_path,
access_token=tokens["access_token"],
user_agent_entry="YourCompany_YourProduct/1.0.0",
)
raise ValueError(f"Unknown auth type: {auth_type}")
APP_AUTH_TYPE | Auth flow | Env vars required |
|---|
pat | Personal Access Token | DATABRICKS_TOKEN |
oauth_m2m or m2m | OAuth M2M (client credentials) | DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET |
oauth_u2m or u2m | OAuth U2M (custom OAuth app) | DATABRICKS_U2M_CLIENT_ID, optional DATABRICKS_U2M_CLIENT_SECRET |
Note: Use APP_AUTH_TYPE (not DATABRICKS_AUTH_TYPE) to avoid conflicts with SDK's internal auth_type parameter.
Auth Isolation
Run tests with a clean environment using env -i:
env -i PATH=$PATH HOME=$HOME \
DATABRICKS_HOST=$DATABRICKS_HOST \
DATABRICKS_HTTP_PATH=$DATABRICKS_HTTP_PATH \
DATABRICKS_TOKEN=$DATABRICKS_TOKEN \
python your_script.py
env -i PATH=$PATH HOME=$HOME \
DATABRICKS_HOST=$DATABRICKS_HOST \
DATABRICKS_HTTP_PATH=$DATABRICKS_HTTP_PATH \
DATABRICKS_CLIENT_ID=$DATABRICKS_CLIENT_ID \
DATABRICKS_CLIENT_SECRET=$DATABRICKS_CLIENT_SECRET \
python your_script.py
env -i PATH=$PATH HOME=$HOME USER=$USER DISPLAY=$DISPLAY \
DATABRICKS_HOST=$DATABRICKS_HOST \
DATABRICKS_HTTP_PATH=$DATABRICKS_HTTP_PATH \
DATABRICKS_U2M_CLIENT_ID=$DATABRICKS_U2M_CLIENT_ID \
python your_script.py
Redirect URI Reference
| Environment | Recommended URI | Notes |
|---|
| Local development | http://localhost:8080/callback | Register in App connections |
| Docker/container | http://host.docker.internal:8080/callback | For Docker Desktop |
| Production web app | https://your-app.com/oauth/callback | HTTPS required |
Validation Query
Verify User-Agent telemetry is being recorded correctly:
SELECT event_time, user_agent, action_name, request_params
FROM system.access.audit
WHERE event_time > current_timestamp() - INTERVAL 1 HOUR
AND lower(user_agent) LIKE '%yourcompany%'
ORDER BY event_time DESC
LIMIT 10;
Troubleshooting
credentials_provider Errors
| Error | Cause | Solution |
|---|
'dict' object is not callable | credentials_provider passed incorrectly | Use lambda: config.authenticate (callable that returns callable) |
Cannot configure default credentials | M2M Config without auth_type | Pass auth_type="oauth-m2m" to Config |
More than one authorization method | PAT and M2M env vars both set | Run with clean env (env -i) for one auth type only |
Authentication Errors
| Error | Cause | Solution |
|---|
401 Unauthorized | Invalid credentials | Check token/secret validity and expiration |
invalid_client | Wrong client ID | Verify CLIENT_ID matches registered app |
invalid_grant | Auth code expired/reused | Auth codes are single-use; restart flow |
Permission Errors
| Error | Cause | Solution |
|---|
403 PERMISSION_DENIED | Missing access | Grant CAN_USE on warehouse, USE_CATALOG/SCHEMA |
TABLE_OR_VIEW_NOT_FOUND | Missing table permission | Grant SELECT on table |
Warehouse Errors
| Error | Cause | Solution |
|---|
503 Service Unavailable | Warehouse starting | Wait and retry with backoff |
TEMPORARILY_UNAVAILABLE | Warehouse busy | Retry with exponential backoff |
U2M Browser Issues
| Symptom | Cause | Solution |
|---|
| Browser doesn't open | Missing env vars | Pass HOME, USER, DISPLAY via env -i |
| Port in use | Another process | Use different port in redirect URI |
| Redirect error | URI mismatch | Verify redirect URI matches exactly in App connections |
Implementation Learnings
-
credentials_provider signature: Must be Callable[[], Callable[[], Dict[str, str]]] - a callable that returns a callable that returns headers. Use lambda: config.authenticate pattern.
-
auth_type for M2M: Always pass auth_type="oauth-m2m" to SDK Config when using client_id/client_secret to prevent default credential resolution.
-
CLIENT_ID confusion: Use separate env vars (DATABRICKS_CLIENT_ID for M2M, DATABRICKS_U2M_CLIENT_ID for U2M) to avoid mixing up service principal UUIDs with OAuth app client IDs.
-
APP_AUTH_TYPE vs DATABRICKS_AUTH_TYPE: Use APP_AUTH_TYPE for your app's auth selector to avoid conflicts with SDK's internal auth_type parameter.
-
Host normalization: SQL connector expects bare hostname without https://. Always strip scheme before passing to server_hostname.
-
User-Agent parameter: Use user_agent_entry (no underscore) for connector versions 2.9+. The legacy _user_agent_entry is deprecated.
PKCE Flow Implementation Details
When implementing custom OAuth U2M PKCE flows, follow these patterns for reliable browser callback handling:
Use threading.Event for callback signaling:
from threading import Thread, Event
callback_received = Event()
CALLBACK_TIMEOUT = 120
def serve_until_callback():
while not callback_received.is_set():
server.handle_request()
thread = Thread(target=serve_until_callback, daemon=True)
thread.start()
Handle multiple HTTP requests (browsers send favicon, prefetch, etc.):
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
req_path = urlparse(self.path).path
if req_path == "/callback":
code = parse_qs(urlparse(self.path).query).get("code", [None])[0]
if code:
callback_received.set()
self.send_success_response()
else:
self.send_response(404)
self.end_headers()
Use output flushing for immediate feedback:
print("Waiting for browser authentication...", flush=True)
Wait with timeout:
if not callback_received.wait(timeout=CALLBACK_TIMEOUT):
raise RuntimeError(f"Authentication timed out after {CALLBACK_TIMEOUT}s")
Key insight: Using server.handle_request() only once can be consumed by browser prefetch requests (like favicon), causing the actual OAuth callback to be missed. The Event-based loop pattern ensures all requests are handled until the actual callback arrives.
Key Differences from Databricks SDK for Python
| Aspect | Python SQL Connector | Databricks SDK for Python |
|---|
| Purpose | SQL queries via warehouse | REST APIs (UC, Jobs, Clusters) |
| SQL Warehouse | Required (http_path) | Not needed |
| Interface | DBAPI 2.0 cursor | Typed Python methods |
| Auth Config | sql.connect() parameters | Config() object |
| User-Agent | user_agent_entry (per-connection, v2.9+) | useragent.with_partner() (global) |
| M2M Token Refresh | Via SDK Config's credentials_provider | Automatic |
| U2M Browser | Not built-in (use PKCE helper) | Config with auth_type="external-browser" |