一键在 Manus 中运行任何 Skill

$pwd:

02-experiment-tracing-and-uc-storage

Name: 02 Experiment Tracing And Uc Storage
Author: databricks-solutions

// Use when setting up MLflow experiments, tracing, or UC OTEL trace storage for a GenAI agent. Covers structured experiment paths, tracing decorators, manual spans, tags, connection pooling, and Unity Catalog OTEL storage for SQL-queryable trace retention. Foundation Step 2. Consumes MLflow environment from Step 1.

在 Manus 中运行

$ git log --oneline --stat

stars:4

forks:4

updated:2026年4月30日 20:02

文件资源管理器

6 个文件

SKILL.md

readonly

related-skills.json

同仓库

project-planning.md

from "databricks-solutions/vibe-coding-workshop-template"

Create multi-phase project plans for Databricks data platform solutions with Agent Domain Framework and Agent Layer Architecture. Includes interactive Quick Start with key decisions, industry-specific domain patterns, complete phase document templates (Use Cases, Agents, Frontend), Genie Space integration patterns, deployment order requirements, and worked examples. Default acceleration mode plans on top of a completed Gold layer. Workshop mode can also plan from the best available layer (deployed Gold, Gold design YAML, deployed Silver, deployed Bronze, or source schema CSV) and produces a workshop-draft contract for downstream stages. Use when planning any Databricks solution after Gold layer is complete, or in workshop mode after Bronze, Silver, or Gold-design is available.

2026-05-064

semantic-layer-setup.md

from "databricks-solutions/vibe-coding-workshop-template"

End-to-end orchestrator for building the Databricks semantic layer including Metric Views, Table-Valued Functions (TVFs), and Genie Spaces. Guides users through metric view creation, TVF development, Genie Space setup, and API-driven deployment. Orchestrates mandatory dependencies on semantic-layer skills (metric-views-patterns, databricks-table-valued-functions, genie-space-patterns, genie-space-export-import-api) and common skills (databricks-asset-bundles, databricks-expert-agent, databricks-python-imports). Use when building the semantic layer end-to-end, creating Metric Views and TVFs for Genie, or setting up Genie Spaces. For Genie optimization, use genie-optimization-orchestrator directly.

2026-05-064

metric-views-patterns.md

from "databricks-solutions/vibe-coding-workshop-template"

Standard patterns for creating Databricks Metric Views with semantic metadata for Genie and AI/BI. Use when creating metric views, troubleshooting metric view creation errors, validating schema references before deployment, implementing joins (including snowflake schema patterns), or optimizing metric views for Genie natural language queries.

2026-05-064

databricks-table-valued-functions.md

from "databricks-solutions/vibe-coding-workshop-template"

End-to-end guide for planning, creating, deploying, and validating Table-Valued Functions (TVFs) in Databricks optimized for Genie Space natural language queries. Use when creating TVFs for Genie Spaces, planning TVF requirements from business questions, troubleshooting TVF compilation errors, or ensuring Genie compatibility. Includes requirements gathering templates, schema validation patterns, SQL requirements (STRING parameters, parameter ordering, LIMIT workarounds), v3.0 bullet-point comment format, null safety, SCD2 handling, cartesian product prevention, 5 complete domain-adaptable examples, Asset Bundle deployment patterns, and post-deployment validation queries.

2026-05-064

genie-space-patterns.md

from "databricks-solutions/vibe-coding-workshop-template"

Patterns for setting up Databricks Genie Spaces with comprehensive agent instructions, data assets, SQL expressions, and benchmark questions. Use when creating Genie Spaces, configuring agent behavior, selecting data assets, defining SQL expressions (measures, filters, dimensions), or validating benchmark questions. Includes mandatory 8-section deliverable structure, General Instructions (≤20 lines), data asset organization (Metric Views → TVFs → Tables), SQL expressions (sql_snippets) for structured KPI/filter/dimension definitions, benchmark questions with exact SQL, Serverless warehouse mandate, table/column comment requirements for Genie SQL quality, pre-creation table inspection, Conversation API programmatic validation, follow-up vs new conversation patterns, deployment checklists, post-deployment configuration audit for drift detection, cross-consumer design considerations (Genie + dashboards), and benchmark regression testing patterns.

2026-05-064

genie-space-export-import-api.md

from "databricks-solutions/vibe-coding-workshop-template"

Comprehensive patterns for Databricks Genie Space Export/Import API - JSON schema, serialization format, and programmatic deployment. Use when programmatically creating, exporting, or importing Genie Spaces via REST API, troubleshooting API deployment errors, or implementing CI/CD for Genie Spaces. Includes complete GenieSpaceExport schema, API endpoints (List, Get, Create, Update, Delete), JSON format requirements, ID generation, variable substitution, inventory-driven generation patterns, and production deployment checklists.

2026-05-064

package.json

"author": "databricks-solutions"

"repository": "databricks-solutions/vibe-coding-workshop-template"

打开 GitHub 仓库查看创作者相关仓库

$ install --global

$ download --local

在 Manus 中运行

$ useful --forSOC

数据科学家计算机与数学类职业15-2051L4

name	02-experiment-tracing-and-uc-storage
description	Use when setting up MLflow experiments, tracing, or UC OTEL trace storage for a GenAI agent. Covers structured experiment paths, tracing decorators, manual spans, tags, connection pooling, and Unity Catalog OTEL storage for SQL-queryable trace retention. Foundation Step 2. Consumes MLflow environment from Step 1.
license	Apache-2.0
metadata	{"last_verified":"2026-04-15","volatility":"high","upstream_sources":[],"author":"prashanth-subrahmanyam","version":"3.6.0","domain":"genai-agents","pipeline_position":"F2","consumes":"mlflow_environment","produces":"experiment_paths, tracing_config, connection_pool, f2_grants_complete, otel_table_prefix, mlflow_tracing_sql_warehouse_id, app_service_principal_grants","grounded_in":"docs.databricks.com/aws/en/mlflow3/genai/tracing/trace-unity-catalog, docs.databricks.com/aws/en/mlflow3/genai, docs.databricks.com/aws/en/mlflow3/genai/tracing/app-instrumentation, docs.databricks.com/aws/en/mlflow3/genai/tracing/app-instrumentation/automatic, docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing, docs.databricks.com/aws/en/mlflow3/genai/tracing/add-context-to-traces"}

Experiment tracing setup

When to Use

Use this skill when you need to:

Organize MLflow experiments so runs are discoverable by space, domain, and lifecycle stage
Add tracing to GenAI agents (decorators, nested spans, inputs/outputs)
Configure MLflow for multi-stage pipelines (development, evaluation, deployment) with consistent paths and UC prompt registry visibility
Tune HTTP client behavior before high-throughput tracing or evaluation workloads

Prerequisite: complete Foundation Step 1 (MLflow foundation) so tracking URI and authentication are already correct. See MLflow GenAI Foundation (Foundation Step 1).

TypeScript / Node agents: this skill is the Python instrumentation reference. For the official mlflow-tracing + mlflow-openai npm path (Node-native mlflow.init, tracedOpenAI, mlflow.trace, withSpan, session grouping), see the sibling skill 02b-typescript-tracing. Use OTLP (via custom OpenTelemetry instrumentation when the TypeScript SDK does not fit) only as a fallback when you need vendor-neutral spans or already run an OpenTelemetry collector.

Production deployment: the env-var matrix for deployed agents (ENABLE_MLFLOW_TRACING, MLFLOW_EXPERIMENT_ID, SP CAN_EDIT on the experiment, the Git-folder caveat, Production Monitoring → Delta) lives in references/prod-tracing-deployment.md. Track A and Track C deployment skills link there.

User / session / environment context: the canonical reference for attributing traces to a user (mlflow.trace.user), grouping multi-turn conversations (mlflow.trace.session), and overriding mlflow.source.type from APP_ENVIRONMENT lives in 02c-trace-context-and-environments. The "Trace tags and metadata" section below shows the call-site shape; F2c is the long form (tags vs metadata, auto-populated fields, search examples, deployment overrides).

Which approach: automatic vs manual vs combined

Before writing tracing code, pick the right approach. Source: Add traces to applications (overview).

Scenario	Recommended approach
You use one GenAI library (LangChain, LlamaIndex, DSPy, …)	Automatic tracing only — `mlflow.<library>.autolog()`.
You call an LLM SDK directly (OpenAI, Anthropic, Mistral, …)	Automatic for the SDK + a thin `@mlflow.trace` wrapper around your `run()` / orchestration function so all calls roll up into one trace.
You use multiple frameworks / SDKs in one workflow	Enable `autolog()` for each framework + use `@mlflow.trace` to combine them into a single root trace.
All other scenarios (custom logic, tool routing, complex retry/fallback, framework-less)	Manual with `@mlflow.trace` decorators first; drop down to `mlflow.start_span` only when you need finer-grained control.

Start with automatic. It's the fastest way to get traces working. Add manual tracing later if you need more control. Both approaches feed the same trace tree — @mlflow.trace parent spans naturally nest auto-traced child spans.

For the full 20+ supported autolog integrations (LLM SDKs, orchestrators, agent frameworks, embedding libraries) plus the multi-framework combine pattern and the serverless-compute caveat, see references/autolog-integrations.md.

Experiment path organization

CRITICAL: consume the experiment path from state — do not invent one

The workshop pins MLflow experiment paths to the same user-and-use-case identity that backs APP_NAME (e.g. prashanth-s-stayfinder) so concurrent attendees on a shared workspace cannot collide on a single experiment, and so the leaf in the MLflow UI is never a generic word like Tracing, traces, Default, or my-agent.

The canonical derivation lives in vibecoding-state migrate_canonical and is captured in state at the prompt that first resolves $APP_NAME / $AGENT_NAME:

State field	Derivation	Example
`mlflow_experiment_path`	`/Users/<user_email>/mlflow/<APP_NAME or AGENT_NAME>-agent`	`/Users/jane.doe@example.com/mlflow/jane-d-stayfinder-agent`
`mlflow_feedback_experiment_path`	`/Users/<user_email>/mlflow/<APP_NAME>-feedback`	`/Users/jane.doe@example.com/mlflow/jane-d-stayfinder-feedback`

This skill consumes those values from state://Resources.mlflow_experiment_path rather than constructing its own. If state shows <pending> for the path, halt and route back to vibecoding-state migrate_canonical — do not paper over it with a hand-rolled /Shared/... default.

Path template (for projects that do not run on top of `vibecoding-state`)

If your project does not use the vibecoding-state skill, define a template that still pins identity onto the leaf:

EXPERIMENT_PATH_TEMPLATE = "/Users/{{ user_email }}/mlflow/{{ app_name }}-{{ stage }}"

Where app_name is the user-prefixed, use-case-suffixed identity (e.g. prashanth-s-stayfinder) and stage ∈ {agent, eval, feedback, deploy}.

Three-experiment lifecycle pattern

For multi-stage pipelines, use separate experiments (one leaf per stage under the same app_name):

Stage	Leaf	Purpose
agent / dev	`<app_name>-agent`	Interactive debugging, short runs, permissive logging — the default tracing destination
eval	`<app_name>-eval`	Benchmarks, `mlflow.genai.evaluate`, regression gates
feedback	`<app_name>-feedback`	End-user thumbs / human assessments persisted from the AppKit feedback skill
deploy	`<app_name>-deploy`	Production or promotion runs, stricter tags and retention

The leaf must always carry <app_name> so that browsing MLflow experiments lists prashanth-s-stayfinder-agent, prashanth-s-stayfinder-eval, etc. — never a bare agent / eval / Tracing.

Setting the experiment

When running inside the workshop, read the path from state:

import mlflow

# state://Resources.mlflow_experiment_path is already pinned to
# /Users/<user_email>/mlflow/<APP_NAME>-agent by vibecoding-state.migrate_canonical.
experiment_path = state["Resources"]["mlflow_experiment_path"]
mlflow.set_experiment(experiment_path)

Stand-alone projects build the path from the same identity inputs:

import mlflow

user_email = "jane.doe@example.com"
app_name = "jane-d-stayfinder"  # ${FIRSTNAME}-${LASTINITIAL}-${use_case_slug}
experiment_path = f"/Users/{user_email}/mlflow/{app_name}-agent"
mlflow.set_experiment(experiment_path)

Set the experiment early in your entrypoint — before enabling autolog and making any LLM calls. Never use a literal leaf like traces, Tracing, or my-agent; the leaf is the only thing surfacing in the MLflow UI search column and a generic value defeats per-attendee isolation.

For complete experiment organization patterns including ExperimentManager, search, cleanup, and decision tables, see: references/experiment-organization.md.

CRITICAL: Prompt registry linkage

Prompts registered in Unity Catalog must be linked to the experiment or they will not surface correctly in the Experiment UI for prompt-aware workflows.

After set_experiment, set the experiment tag:

mlflow.set_experiment_tags({
    "mlflow.promptRegistryLocation": f"{catalog}.{schema}",
})

Use your UC catalog and schema where prompts are registered. Without mlflow.promptRegistryLocation, UC-registered prompts may not appear as expected in the UI.

Tracing with decorators

Use @mlflow.trace for automatic span creation around functions. Pick a name and span_type that match how you want traces grouped in the UI.

import mlflow


@mlflow.trace(name="classify_intent", span_type="AGENT")
def classify_intent(query: str) -> dict:
    ...


@mlflow.trace(name="call_llm", span_type="LLM")
def call_llm(prompt: str) -> str:
    ...


@mlflow.trace(name="evaluate_response", span_type="JUDGE")
def evaluate_response(response: str) -> float:
    ...

Common span_type values: AGENT, TOOL, LLM, RETRIEVER, JUDGE, EMBEDDING. Align names with your team's conventions so traces stay searchable across services.

For complete decorator and async tracing examples, see: references/tracing-patterns.md.

For the 20+ mlflow.<library>.autolog() integrations (OpenAI, Anthropic, Mistral, LangChain, LangGraph, LlamaIndex, DSPy, LiteLLM, etc.), the multi-framework combine snippet, and the serverless-compute caveat (autolog is not auto-enabled), see references/autolog-integrations.md.

Manual span creation

For fine-grained control (nested work units, partial inputs/outputs, retries), use mlflow.start_span. This pattern matches how the optimizer wraps LLM calls.

For complex tracing, open a span with span_type=SpanType.CHAIN, set inputs before the call, record token usage, and set outputs on success or failure — including retry events via SpanEvent.

Illustrative nested pattern (same structural idea: parent span, child LLM span, explicit inputs/outputs):

import mlflow


def run_optimization_step(query, context):
    with mlflow.start_span(name="optimization_step") as span:
        span.set_inputs({"query": query})

        with mlflow.start_span(name="strategist_call", span_type="LLM") as llm_span:
            llm_span.set_inputs({"prompt": formatted_prompt})
            result = call_llm(formatted_prompt)
            llm_span.set_outputs({"response": result})

        span.set_outputs({"result": result})
        return result

In production code you may prefer from mlflow.entities import SpanType and types such as SpanType.CHAIN for LLM orchestration spans, consistent with _traced_llm_call.

For the full _traced_llm_call implementation, error handling, token logging, and a multi-step agent example with nested AGENT/LLM/TOOL/JUDGE spans, see: references/tracing-patterns.md.

Trace tags and metadata

Enrich the current trace with session, user, and deployment context so runs are filterable and attributable. Reserved identity fields belong under metadata= (immutable, MLflow-recognized for UI filter / group); mutable routing dimensions belong under tags=.

import os

mlflow.update_current_trace(
    metadata={
        "mlflow.trace.user":    user_id,
        "mlflow.trace.session": session_id,
        "mlflow.source.type":   os.getenv("APP_ENVIRONMENT", "development"),
        "agent_version":        "1.2.0",
        "space_id":             space_id,
    },
    tags={
        "domain":   domain,
        "sla_tier": "gold",
    },
)

Call this from code that runs inside an active trace (for example after mlflow.start_run / autolog / @mlflow.trace has established trace context). Setting mlflow.trace.user / mlflow.trace.session under tags= still works for read-back but loses the immutability guarantee and the UI's first-class user / session facets — prefer metadata.

For the full tag taxonomy, metadata patterns, trace search queries, and monitoring dashboard integration, see: references/trace-context-patterns.md. For the canonical reference on user / session / environment context (auto-populated metadata, APP_ENVIRONMENT override, search by metadata), see 02c-trace-context-and-environments.

Connection pool configuration

Reduce flaky failures under load by setting MLflow HTTP client defaults before heavy tracing or evaluation traffic:

import os

os.environ.setdefault("MLFLOW_HTTP_REQUEST_MAX_RETRIES", "5")
os.environ.setdefault("MLFLOW_HTTP_REQUEST_TIMEOUT", "120")

Set these as early as possible in the job or app entrypoint (alongside other MLflow env vars from Foundation Step 1). Adjust retries and timeout for your workspace network and batch sizes.

For connection pool tuning in high-throughput serving scenarios and async tracing performance tips, see: references/tracing-patterns.md § 8.

DO / DON'T examples

Experiment organization

DO — Pin the experiment leaf to the user-and-use-case identity, and prefer reading from vibecoding-state:

# In a workshop-managed project, read the pre-derived path from state.
experiment_path = state["Resources"]["mlflow_experiment_path"]
# e.g. "/Users/jane.doe@example.com/mlflow/jane-d-stayfinder-agent"
mlflow.set_experiment(experiment_path)

# Stand-alone project — build the path from the same identity inputs.
user_email = "jane.doe@example.com"
app_name = "jane-d-stayfinder"  # ${FIRSTNAME}-${LASTINITIAL}-${use_case_slug}
experiment_path = f"/Users/{user_email}/mlflow/{app_name}-agent"
mlflow.set_experiment(experiment_path)

DON'T — Use a generic leaf, a hand-rolled /Shared/... default, or a hard-coded workspace path. The leaf is what shows up in the MLflow UI experiment list, and traces / Tracing / my-agent give every attendee on a shared workspace the same name:

# WRONG: generic leaf — collides across attendees, useless in the UI
mlflow.set_experiment("/Shared/my-agent/traces")

# WRONG: hard-coded workspace path that won't work across workspaces
mlflow.set_experiment("/Shared/my-specific-workspace-path/eval")

Tracing inputs and outputs

DO — Set inputs before the work and outputs after, including on failure:

with mlflow.start_span(name="llm_call", span_type=SpanType.CHAIN) as span:
    span.set_inputs({"prompt_chars": len(prompt), "model": model_name})
    try:
        result = call_llm(prompt)
        span.set_outputs({"response_chars": len(result), "status": "ok"})
    except Exception as exc:
        span.set_outputs({"error": str(exc)[:500], "status": "error"})
        raise

DON'T — Skip inputs/outputs or only record on success:

# WRONG: no inputs recorded, no outputs on failure path
with mlflow.start_span(name="llm_call") as span:
    result = call_llm(prompt)
    span.set_outputs({"result": result})  # never reached if call_llm raises

Trace context tags

DO — Put reserved identity fields (mlflow.trace.user / mlflow.trace.session) under metadata, mutable routing dimensions under tags:

mlflow.update_current_trace(
    metadata={
        "mlflow.trace.user":    user_id,
        "mlflow.trace.session": session_id,
        "space_id":             space_id,
        "agent_version":        "1.2.0",
    },
    tags={
        "domain":     domain,
        "sla_tier":   "gold",
    },
)

DON'T — Put mlflow.trace.user / mlflow.trace.session under tags, or skip context entirely:

# WRONG: reserved identity fields under tags — loses immutability + UI facets
mlflow.update_current_trace(
    tags={"mlflow.trace.user": user_id, "mlflow.trace.session": session_id},
)

# WRONG: no context at all — traces become impossible to attribute or group
# (just calling the function without update_current_trace)

Connection pool timing

DO — Set HTTP env vars at the top of your entrypoint, before any MLflow call:

import os
os.environ.setdefault("MLFLOW_HTTP_REQUEST_MAX_RETRIES", "5")
os.environ.setdefault("MLFLOW_HTTP_REQUEST_TIMEOUT", "120")

import mlflow  # env vars are read at import time

DON'T — Set env vars after MLflow is imported or mid-pipeline:

import mlflow  # already imported — env vars may be cached

# WRONG: setting after import may not take effect
os.environ["MLFLOW_HTTP_REQUEST_MAX_RETRIES"] = "5"

Unity Catalog OTEL trace storage (MLflow 3.11+)

Store MLflow traces in Unity Catalog Delta tables using an OpenTelemetry-compatible format. This enables SQL-queryable, long-term trace retention with UC access control, unlike the default experiment-scoped storage which is limited in retention and query flexibility.

When to use UC OTEL storage

Scenario	Default Experiment Storage	UC OTEL Storage
Development debugging	✓ Sufficient	Optional
Production monitoring	Limited retention	✓ Recommended
Compliance / audit trails	Not durable	✓ Required
Cross-experiment analysis	Difficult	✓ SQL joins across tables
Dashboard SQL queries	Not supported	✓ Native SQL access
Role-based access control	Experiment-level only	✓ UC table-level ACLs

Enable UC OTEL trace storage

Bind an experiment to a Unity Catalog location so traces flow into Delta tables:

import os
import mlflow
from mlflow.entities.trace_location import UnityCatalog

mlflow.set_tracking_uri("databricks")

# Required: SQL warehouse for writing traces to Delta tables
os.environ["MLFLOW_TRACING_SQL_WAREHOUSE_ID"] = "<SQL_WAREHOUSE_ID>"

experiment = mlflow.set_experiment(
    # Read from state — pinned to /Users/<user_email>/mlflow/<APP_NAME>-agent.
    experiment_name=state["Resources"]["mlflow_experiment_path"],
    trace_location=UnityCatalog(
        catalog_name="main",
        schema_name="agent_traces",
        # The prefix MUST mirror APP_NAME (underscored for table-name safety),
        # e.g. "jane_d_stayfinder" — never a generic "my_agent".
        table_prefix="my_agent",
    ),
)

This creates four Delta tables in the specified UC schema (with <table_prefix> bound to the underscored APP_NAME):

Table	Content
`my_agent_otel_annotations`	Trace-level annotations, tags, and feedback
`my_agent_otel_logs`	Structured log events within spans
`my_agent_otel_metrics`	Numeric metrics (token usage, latency, scores)
`my_agent_otel_spans`	Span hierarchy with inputs, outputs, timing, status

CRITICAL: Table permissions

UC OTEL tables require explicit MODIFY + SELECT grants (not ALL_PRIVILEGES) on each table for the service principal and any readers:

-- Grant write access to the app's service principal
GRANT MODIFY, SELECT ON TABLE main.agent_traces.my_agent_otel_annotations TO `<app-sp>`;
GRANT MODIFY, SELECT ON TABLE main.agent_traces.my_agent_otel_logs TO `<app-sp>`;
GRANT MODIFY, SELECT ON TABLE main.agent_traces.my_agent_otel_metrics TO `<app-sp>`;
GRANT MODIFY, SELECT ON TABLE main.agent_traces.my_agent_otel_spans TO `<app-sp>`;

-- Grant read access to analysts / dashboards
GRANT SELECT ON TABLE main.agent_traces.my_agent_otel_spans TO `analysts`;
GRANT SELECT ON TABLE main.agent_traces.my_agent_otel_metrics TO `analysts`;

Enable monitoring with UC OTEL

For production monitoring scorers to write results back to UC OTEL tables, bind the SQL warehouse ID:

from mlflow.tracing import set_databricks_monitoring_sql_warehouse_id

set_databricks_monitoring_sql_warehouse_id(
    sql_warehouse_id="<SQL_WAREHOUSE_ID>",
    experiment_id=experiment.experiment_id,
)

Call this at application startup, alongside set_experiment. Without it, registered scorers (SDLC Step 7) cannot persist results to UC OTEL tables.

Query UC OTEL traces with SQL

Once traces flow into UC Delta tables, query them directly:

-- Recent traces with latency
SELECT
    trace_id,
    span_name,
    start_time,
    end_time,
    TIMESTAMPDIFF(MILLISECOND, start_time, end_time) AS duration_ms,
    status_code
FROM main.agent_traces.my_agent_otel_spans
WHERE start_time > DATEADD(HOUR, -24, CURRENT_TIMESTAMP())
ORDER BY start_time DESC
LIMIT 100;

-- Token usage by model
SELECT
    JSON_EXTRACT_SCALAR(attributes, '$.llm.model') AS model,
    SUM(CAST(JSON_EXTRACT_SCALAR(attributes, '$.llm.token_count.prompt') AS INT)) AS prompt_tokens,
    SUM(CAST(JSON_EXTRACT_SCALAR(attributes, '$.llm.token_count.completion') AS INT)) AS completion_tokens
FROM main.agent_traces.my_agent_otel_spans
WHERE span_kind = 'LLM'
  AND start_time > DATEADD(DAY, -7, CURRENT_TIMESTAMP())
GROUP BY 1;

DO — Set warehouse ID before creating the experiment

import os
os.environ["MLFLOW_TRACING_SQL_WAREHOUSE_ID"] = "<WAREHOUSE_ID>"

import mlflow
from mlflow.entities.trace_location import UnityCatalog

experiment = mlflow.set_experiment(
    # /Users/<user_email>/mlflow/<APP_NAME>-agent — read from state.
    experiment_name=state["Resources"]["mlflow_experiment_path"],
    trace_location=UnityCatalog(
        catalog_name="main",
        schema_name="agent_traces",
        table_prefix="my_agent",  # MUST mirror underscored APP_NAME in production
    ),
)

DON'T — Create the experiment without the warehouse env var

import mlflow
from mlflow.entities.trace_location import UnityCatalog

# WRONG: MLFLOW_TRACING_SQL_WAREHOUSE_ID not set — tables can't be written
experiment = mlflow.set_experiment(
    experiment_name=state["Resources"]["mlflow_experiment_path"],
    trace_location=UnityCatalog(
        catalog_name="main",
        schema_name="agent_traces",
        table_prefix="my_agent",
    ),
)

DON'T — Use ALL_PRIVILEGES instead of explicit grants

-- WRONG: ALL_PRIVILEGES does not always include MODIFY for OTEL writes
GRANT ALL_PRIVILEGES ON TABLE main.agent_traces.my_agent_otel_spans TO `<sp>`;

-- DO: Explicit MODIFY + SELECT
GRANT MODIFY, SELECT ON TABLE main.agent_traces.my_agent_otel_spans TO `<sp>`;

F2 owns OTel grants and warehouse env (state capture)

F2 is the single owner of the OTel infrastructure contract — the Delta table prefix, the SQL warehouse env var, and the explicit per-table grants applied to the app service principal. Downstream skills (Track A 07 deploy, SDLC 06 deployment, SDLC 07 monitoring) do not re-derive any of these; they read them from state. The f2_grants_complete flag is the single gate read by preflight_check_registry.f2_grants_complete and by deferred_actions[] to unblock downstream prompts.

Capture these fields in state once F2 finishes provisioning:

# state://Foundation.f2_tracing
f2_grants_complete: true               # bool — set true only after every grant in app_service_principal_grants[] succeeds
otel_table_prefix: "my_agent"          # string — value passed to UnityCatalog(table_prefix=...); MUST match the literal string used; do NOT add a trailing underscore (MLflow appends `_otel_*`)
mlflow_tracing_sql_warehouse_id: "<warehouse-id>"  # canonical env var MLFLOW_TRACING_SQL_WAREHOUSE_ID; preflight_check_registry.mlflow_tracing_sql_warehouse_id_present reads this
app_service_principal_grants:          # one entry per (principal, object) tuple actually applied
  - principal: "<app-sp-application-id>"   # Databricks Apps SP application id (UUID), not display name
    object: "main.agent_traces.my_agent_otel_annotations"
    privileges: [MODIFY, SELECT]
  - principal: "<app-sp-application-id>"
    object: "main.agent_traces.my_agent_otel_logs"
    privileges: [MODIFY, SELECT]
  - principal: "<app-sp-application-id>"
    object: "main.agent_traces.my_agent_otel_metrics"
    privileges: [MODIFY, SELECT]
  - principal: "<app-sp-application-id>"
    object: "main.agent_traces.my_agent_otel_spans"
    privileges: [MODIFY, SELECT]

Rules:

otel_table_prefix is the literal string passed to UnityCatalog(table_prefix=...) — no trailing underscore. MLflow appends _otel_annotations / _otel_logs / _otel_metrics / _otel_spans. Passing my_agent_ produces my_agent__otel_* (double underscore) and breaks downstream queries; this is a recurring retrospective failure. The Track A and SDLC deploy skills read this field rather than re-deriving the prefix from the experiment name.
mlflow_tracing_sql_warehouse_id is the canonical name from canonical_names.env_vars. preflight_check_registry.mlflow_tracing_sql_warehouse_id_present fails closed if it is missing or empty — apps deployed without it silently drop UC OTel writes.
app_service_principal_grants[] enumerates explicit MODIFY, SELECT grants on each of the four *_otel_* tables (annotations, logs, metrics, spans) for the agent's deployment SP. ALL_PRIVILEGES is not equivalent for OTel writes — capture the literal grant applied. Track A 07 / SDLC 06 inspect this list at deploy time.
f2_grants_complete: true is the single gate. Set it only after every entry in app_service_principal_grants[] has been verified (SHOW GRANTS ON TABLE ... TO `` returns the recorded privileges). Until it is true, every prompt role listed under preflight_check_registry.f2_grants_complete.blocks_prompt_roles[] halts on enter.

OTeL GenAI Semantic-Convention Attributes

MLflow's trace UI and search indexing recognize a specific set of OpenTelemetry GenAI semantic-convention attributes (gen_ai.*, session.id, user.id). Spans that set these attributes render richer in the UI (clean prompt/response panes, token counts, session grouping) and become searchable via the MLflow API. Spans that skip them still work but show up as plain generic spans.

This matters most when you write custom spans (manual mlflow.start_span) or when you wire a 3rd-party OTeL SDK (e.g. a home-grown agent framework) into MLflow tracing.

Core attributes

Attribute	Meaning	Where to set
`gen_ai.operation.name`	`chat`, `completion`, `embedding`, `tool_call`	Every LLM/tool span
`gen_ai.system`	`anthropic`, `openai`, `databricks`	LLM spans
`gen_ai.request.model`	Model id (e.g. `databricks-claude-sonnet-4-6`)	LLM spans
`gen_ai.input.messages`	JSON array of messages sent to the model	LLM spans
`gen_ai.output.messages`	JSON array of messages returned	LLM spans
`gen_ai.usage.input_tokens`	Prompt tokens	LLM spans
`gen_ai.usage.output_tokens`	Completion tokens	LLM spans
`gen_ai.tool.name`	Tool invoked	Tool spans
`gen_ai.tool.arguments`	Tool arguments (JSON)	Tool spans
`session.id`	Conversation / session correlation id	Root span of every turn
`user.id`	Authenticated user id	Root span of every turn

Setting attributes in manual spans

import mlflow, json

with mlflow.start_span(name="call_llm", span_type="LLM") as span:
    span.set_attributes({
        "gen_ai.operation.name":      "chat",
        "gen_ai.system":              "databricks",
        "gen_ai.request.model":       "databricks-claude-sonnet-4-6",
        "gen_ai.input.messages":      json.dumps(messages),
    })

    resp = client.chat.completions.create(...)

    span.set_attributes({
        "gen_ai.output.messages":     json.dumps([resp.choices[0].message.model_dump()]),
        "gen_ai.usage.input_tokens":  resp.usage.prompt_tokens,
        "gen_ai.usage.output_tokens": resp.usage.completion_tokens,
    })

For MLflow-native filter / group / cohort views, prefer the reserved metadata keys mlflow.trace.user / mlflow.trace.session over the OTeL dotted-attribute form:

mlflow.update_current_trace(metadata={
    "mlflow.trace.user":    user_id,
    "mlflow.trace.session": session_id,
})

The OTeL session.id / user.id form is the span-attribute equivalent for third-party OTeL integrations (set on individual spans via span.set_attributes(...)). The MLflow form (metadata on the trace root) is preferred for first-party MLflow tracing because it's immutable post-log and lights up the Trace UI's user / session facets. See 02c-trace-context-and-environments for the full pattern.

Searching traces by gen_ai attributes

import mlflow

traces = mlflow.search_traces(
    experiment_names=["/Shared/skyloyalty/agent"],
    filter_string="span_attributes['gen_ai.request.model'] = 'databricks-claude-sonnet-4-6'"
                  " AND tags['session.id'] = 'abc-123'",
    max_results=100,
)

Without these attributes, the best you can do is filter by trace name or timestamp — much coarser.

Third-party OTeL integration

If your agent uses a non-MLflow OTeL SDK (e.g. OpenTelemetry Python directly, or a framework's built-in tracer), configure the OTeL exporter to target MLflow's tracing endpoint and ensure your spans follow the gen_ai.* naming. The Databricks docs have the full list and any MLflow-specific extensions.

See Databricks: OTeL span attributes for 3rd-party integrations for the complete attribute reference.

Do / Don't

DO	DON'T
Set `gen_ai.operation.name` on every LLM/tool span.	Leave span attributes empty and expect rich UI rendering.
Store messages as JSON in `gen_ai.input.messages` / `gen_ai.output.messages`.	Store them as Python dicts — JSON-encode first.
Set `mlflow.trace.user` / `mlflow.trace.session` (metadata) on the trace root, not each span.	Repeat them on every span, or store them under tags — wastes storage and loses UI facets.
Use `span_attributes['gen_ai.*']` in `search_traces` filters.	Parse trace JSON by hand to filter offline.
Include `gen_ai.usage.*_tokens` when available.	Let cost dashboards estimate tokens from request length.

Validation checklist

References

Official documentation

MLflow tracing overview
Databricks: MLflow 3 and GenAI (tracing, evaluation, and workspace-specific behavior)
Trace tags and metadata (tag keys, update_current_trace)
Store MLflow traces in Unity Catalog (UC OTEL trace storage, table schema, permissions)
Enable production monitoring with UC traces (monitoring SQL warehouse binding)
Third-party OTeL span attributes for GenAI (gen_ai.* semantic-convention reference)
Add traces to applications: automatic and manual tracing (decision matrix for auto / manual / combined)
Automatic tracing (20+ supported libraries)
Trace agents deployed on Databricks (production env vars, Production Monitoring → Delta)
Instrument Node.js applications with MLflow Tracing (companion sibling skill: 02b-typescript-tracing)
Add context to traces (canonical reference for mlflow.trace.user / mlflow.trace.session / environment metadata — sibling skill: 02c-trace-context-and-environments)

Related skills

Foundation Step 1: MLflow GenAI Foundation — tracking URI, auth, environment detection
Foundation Step 2b: TypeScript tracing — Node sibling using the official mlflow-tracing npm SDK
Foundation Step 2c: Trace context and environments — canonical user / session / environment metadata + APP_ENVIRONMENT override

The patterns in this skill are demonstrated in the Genie Space Optimizer reference implementation. In your own project, apply them to your module structure.

Local reference files

Reference	Lines	Content
`references/experiment-organization.md`	~300	`ExperimentManager` class, path templates, tagging strategies, search & cleanup
`references/tracing-patterns.md`	~350	All span types, decorator/manual tracing, nested agents, error handling, perf tips
`references/trace-context-patterns.md`	~200	Tag taxonomy, metadata patterns, trace search, dashboard integration
`references/autolog-integrations.md`	~250	20+ `mlflow.<library>.autolog()` integrations, multi-framework combine, serverless caveat
`references/prod-tracing-deployment.md`	~250	Production deployment env-var matrix: Agent Framework auto-tracing, custom CPU serving (`ENABLE_MLFLOW_TRACING`, `MLFLOW_EXPERIMENT_ID`, SP `CAN_EDIT`), Git-folder caveat, Production Monitoring → Delta, AI Gateway alternative

Version history

Version	Date	Changes
3.6.0	2026-04-26	F2 now owns the OTel grants + warehouse env contract. Added "F2 owns OTel grants and warehouse env (state capture)" subsection capturing `f2_grants_complete`, `otel_table_prefix`, `mlflow_tracing_sql_warehouse_id`, and `app_service_principal_grants[]` so downstream skills (Track A 07, SDLC 06/07) read them from state instead of re-deriving. Documents the `gw`-style "no trailing underscore" trap (passing `my_agent_` produces `my_agent__otel_*`). Validation checklist gates the four fields. Closes the rollup "`UCSchemaLocation` vs `UnityCatalog(table_prefix=...)`" row.
3.5.0	2026-04-24	Modernized "Trace tags and metadata" + DO/DON'T examples to put `mlflow.trace.user` / `mlflow.trace.session` under `metadata=` (immutable, MLflow-recognized) instead of `tags=`. Updated OTeL section to prefer the metadata form over the `session.id` / `user.id` span-attribute form. Added F2c sibling-skill callout. Updated validation checklist + grounded_in metadata.
3.4.0	2026-04-24	Added auto-vs-manual-vs-combined decision matrix (sourced from app-instrumentation overview). Added TypeScript / Node sibling-skill callout (F2b). Added production-deployment callout pointing at `references/prod-tracing-deployment.md`. New references: `autolog-integrations.md` (20+ libraries), `prod-tracing-deployment.md` (env-var matrix). Updated grounded_in metadata.
3.3.0	2026-04-19	Added OTeL GenAI semantic-convention attributes section: `gen_ai.*` attributes, `session.id` / `user.id`, search filters, 3rd-party OTeL integration link. Extended validation checklist.
3.2.0	2026-04-10	Added Unity Catalog OTEL trace storage section (MLflow 3.11+): `trace_location=UnityCatalog(...)`, 4-table schema, MODIFY+SELECT grants, monitoring warehouse binding, SQL query examples, DO/DON'T pairs. Updated validation checklist and references.
3.1.0	2026-03-26	Added reference files, DO/DON'T examples, version history, connection pool reference pointer
3.0.0	2026-03-25	Initial structured skill with experiment organization, tracing, trace context, and connection pool

02-experiment-tracing-and-uc-storage

同仓库更多 Skills

Experiment tracing setup

When to Use

Which approach: automatic vs manual vs combined

Experiment path organization

CRITICAL: consume the experiment path from state — do not invent one

Path template (for projects that do not run on top of vibecoding-state)

Three-experiment lifecycle pattern

Setting the experiment

CRITICAL: Prompt registry linkage

Tracing with decorators

Manual span creation

Trace tags and metadata

Connection pool configuration

DO / DON'T examples

Experiment organization

Tracing inputs and outputs

Trace context tags

Connection pool timing

Unity Catalog OTEL trace storage (MLflow 3.11+)

When to use UC OTEL storage

Enable UC OTEL trace storage

CRITICAL: Table permissions

Enable monitoring with UC OTEL

Query UC OTEL traces with SQL

DO — Set warehouse ID before creating the experiment

DON'T — Create the experiment without the warehouse env var

DON'T — Use ALL_PRIVILEGES instead of explicit grants

F2 owns OTel grants and warehouse env (state capture)

OTeL GenAI Semantic-Convention Attributes

Core attributes

Setting attributes in manual spans

Searching traces by gen_ai attributes

Third-party OTeL integration

Do / Don't

Validation checklist

References

Official documentation

Related skills

Local reference files

Version history

Experiment tracing setup

When to Use

Which approach: automatic vs manual vs combined

Experiment path organization

CRITICAL: consume the experiment path from state — do not invent one

Path template (for projects that do not run on top of vibecoding-state)

Three-experiment lifecycle pattern

Setting the experiment

CRITICAL: Prompt registry linkage

Tracing with decorators

Manual span creation

Trace tags and metadata

Connection pool configuration

DO / DON'T examples

Experiment organization

Tracing inputs and outputs

Trace context tags

Connection pool timing

Unity Catalog OTEL trace storage (MLflow 3.11+)

When to use UC OTEL storage

Enable UC OTEL trace storage

CRITICAL: Table permissions

Enable monitoring with UC OTEL

Query UC OTEL traces with SQL

DO — Set warehouse ID before creating the experiment

DON'T — Create the experiment without the warehouse env var

DON'T — Use ALL_PRIVILEGES instead of explicit grants

F2 owns OTel grants and warehouse env (state capture)

OTeL GenAI Semantic-Convention Attributes

Core attributes

Setting attributes in manual spans

Searching traces by gen_ai attributes

Third-party OTeL integration

Do / Don't

Validation checklist

References

Official documentation

Related skills

Path template (for projects that do not run on top of `vibecoding-state`)

Path template (for projects that do not run on top of `vibecoding-state`)