| name | phoenix-cli |
| description | Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Use whenever the user is analyzing traces or spans, investigating LLM/agent failures, deciding what to do after instrumenting an app, building failure taxonomies, choosing what evals to write, or asking "what's going wrong", "what kinds of mistakes", or "where do I focus" — even without naming a technique. |
| license | Apache-2.0 |
| compatibility | Requires Node.js (for npx) or global install of @arizeai/phoenix-cli. Optionally requires jq for JSON processing. |
| metadata | {"author":"arize-ai","version":"3.3.0"} |
Phoenix CLI
Invocation
px <resource> <action>
npx @arizeai/phoenix-cli <resource> <action>
The CLI uses singular resource commands with subcommands like list and get:
px trace list
px trace get <trace-id>
px trace annotate <trace-id>
px trace add-note <trace-id>
px trace-annotations delete
px span list
px span annotate <span-id>
px span add-note <span-id>
px span-annotations delete
px session list
px session get <session-id>
px session annotate <session-id>
px session add-note <session-id>
px session-annotations delete
px dataset list
px dataset get <name>
px project list
px project get <name>
px annotation-config list
px auth status
px profile list
px profile show [name]
px profile create <name>
px profile use <name>
px profile edit <name>
px profile delete <name>
Setup
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key
Always use --format raw --no-progress when piping to jq.
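The rule above can be wrapped in a small shell function so the two flags are never forgotten. This is a minimal sketch; px_raw is a hypothetical helper name, not something the CLI ships.

```shell
# px_raw: hypothetical convenience wrapper (not part of phoenix-cli).
# Appends the two flags recommended for jq pipelines to any px command.
px_raw() {
  px "$@" --format raw --no-progress
}

# Example (needs a reachable Phoenix instance and jq):
#   px_raw trace list --limit 5 | jq '.[].traceId'
```

Because the flags are appended after the caller's arguments, any resource and action can be passed through unchanged.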
Quick Reference
| Task | Files |
|---|---|
| Look at sampled traces, spans, or sessions and write specific notes about what went wrong (no taxonomy yet) | references/open-coding |
| Group those notes into a structured failure taxonomy and quantify what matters | references/axial-coding |
Both stages tag every artifact with one shared coding annotation identifier (descriptive shape, e.g. coding-run:chatbot-context-loss-2026-05-06) so the run is queryable, reversible, and viewable as a unit. Pass --identifier <value> explicitly on every px call — shell inheritance is unreliable across agent harnesses. Open coding writes notes via px ... add-note and records a small local JSONL sidecar at .px/coding/<sanitized-identifier>.jsonl; axial coding reads that sidecar as the deterministic handoff and records labels in .px/coding/<sanitized-identifier>-axial.jsonl. Pick the identifier once per run (see references/open-coding.md), then share the Phoenix UI link from the wrap-up section. Revert is opt-in and runs three identifier-bound DELETEs only after explicit user confirmation.
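One way to produce an identifier in the descriptive shape shown above is sketched here. The helper name make_coding_id and the note text in the example are assumptions for illustration; references/open-coding.md remains the authority on how to pick the value.

```shell
# make_coding_id: hypothetical helper that builds an identifier in the
# shape coding-run:<slug>-<YYYY-MM-DD>.
# $1 = short descriptive slug, $2 = optional date (defaults to today).
make_coding_id() {
  slug=$1
  day=${2:-$(date +%Y-%m-%d)}
  printf 'coding-run:%s-%s\n' "$slug" "$day"
}

# Example: choose the identifier once, then pass it explicitly on every call.
#   CODING_ANNOTATION_IDENTIFIER=$(make_coding_id chatbot-context-loss)
#   px trace add-note <trace-id> --text "lost context after turn 3" \
#     --identifier "$CODING_ANNOTATION_IDENTIFIER"
```

Generating the value once and passing it with --identifier on each command avoids relying on shell inheritance.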
Workflow term vs. server annotation name. The skill prose calls this value the coding annotation identifier (shell-variable hint: CODING_ANNOTATION_IDENTIFIER). The server-side annotation NAME used for the UI filter is unchanged — coding_session_id — for data compatibility with rows already written by previous runs. Don't try to rename the server-side annotation; treat the asymmetry as load-bearing.
Workflows
"What do I do after instrumenting?" / "Where do I focus?" / "What's going wrong?"
open-coding → axial-coding → build evals for the top categories.
Reference Categories
| Prefix | Description |
|---|---|
| references/open-coding | Free-form notes against sampled traces, spans, or sessions. Reach for it whenever the user wants to make sense of LLM traffic but has no failure categories yet. Includes a unit-of-analysis diagnostic so the workflow runs at the level the failure modes actually live at (trace for stateless single-shot calls, session for multi-turn agents, span for mechanical/in-isolation failures). |
| references/axial-coding | Inductive grouping of notes into a MECE taxonomy with counts. Reach for it whenever the user has observations and needs categories or eval targets. |
Auth
px auth status
px auth status --endpoint http://other:6006
px auth status --profile staging
Profiles
Named profiles let you switch between multiple Phoenix instances (local, staging, cloud) without juggling environment variables. Profiles are stored in ~/.px/settings.json (or $XDG_CONFIG_HOME/px/settings.json).
Configuration priority (highest to lowest): CLI flags > env vars > active profile > built-in defaults.
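The precedence order can be illustrated with a small resolver. This is only a model of the documented rule, not the CLI's implementation; the function name, its arguments, the mapping of PHOENIX_HOST to the endpoint, and the fallback URL are assumptions.

```shell
# resolve_endpoint: illustrative model of the documented precedence
# (CLI flag > env var > active profile > built-in default).
# $1 = value passed via --endpoint (may be empty)
# $2 = endpoint stored in the active profile (may be empty)
# The fallback http://localhost:6006 is an assumed default.
resolve_endpoint() {
  flag_value=$1
  profile_value=$2
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${PHOENIX_HOST:-}" ]; then
    printf '%s\n' "$PHOENIX_HOST"
  elif [ -n "$profile_value" ]; then
    printf '%s\n' "$profile_value"
  else
    printf '%s\n' "http://localhost:6006"
  fi
}
```

In practice this means an exported PHOENIX_HOST overrides the active profile, so unset it before relying on px profile use.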
px profile list
px profile show
px profile show staging
px profile create prod --endpoint https://app.phoenix.arize.com --api-key <key> --activate
px profile create local --endpoint http://localhost:6006 --project my-app
px profile use prod
px profile edit prod
px profile delete prod --yes
Use --profile <name> on any command to target a specific profile without changing the active one:
px trace list --profile staging --limit 10 --format raw --no-progress | jq .
px auth status --profile prod
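Since --profile leaves the active profile untouched, several instances can be checked in one pass. The sketch below assumes px auth status exits non-zero when authentication or connectivity fails, which this document does not confirm; check_profiles is a hypothetical helper name.

```shell
# check_profiles: hypothetical helper that runs `px auth status` against
# each named profile without changing the active one.
# Assumes a non-zero exit status signals a failed check.
check_profiles() {
  failed=0
  for name in "$@"; do
    if px auth status --profile "$name" >/dev/null 2>&1; then
      echo "$name: ok"
    else
      echo "$name: failed"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_profiles local staging prod
```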
px profile create options: --endpoint <url>, --project <name>, --api-key <key>, --header <key=value> (repeatable), --activate.
Projects
px project list
px project list --format raw --no-progress | jq '.[].name'
px project get my-project --format raw --no-progress
px project get my-project --format raw --no-progress | jq -r '.id'
px project get exits with ExitCode.FAILURE (1) when no project matches the name. With --format json or --format raw it also writes a StructuredError {error, code: "FAILURE", hint} to stderr.
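The exit code makes it easy to guard a script against a mistyped project name. A minimal sketch, with require_project as a hypothetical helper; it treats any non-zero exit as a miss, so a connection failure would be reported the same way.

```shell
# require_project: hypothetical guard built on the documented behaviour
# that `px project get` exits non-zero when the name does not match.
# Output and the StructuredError on stderr are discarded here; drop the
# redirections to see them.
require_project() {
  if px project get "$1" --format raw --no-progress >/dev/null 2>&1; then
    return 0
  fi
  echo "project not found: $1" >&2
  return 1
}

# Example:
#   require_project my-project || exit 1
```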
Traces
px trace list --limit 20 --format raw --no-progress | jq .
px trace list --last-n-minutes 60 --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
px trace list --since 2025-01-15T00:00:00Z --limit 50 --format raw --no-progress | jq .
px trace list --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
px trace list --include-notes --format raw --no-progress | jq '.[].notes'
px trace get <trace-id> --format raw | jq .
px trace get <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
px trace get <trace-id> --include-notes --format raw | jq '.notes'
px trace annotate <trace-id> --name reviewer --label pass
px trace annotate <trace-id> --name reviewer --score 0.9 --format raw --no-progress
px trace annotate <trace-id> --name reviewer --label pass --identifier "<coding-annotation-id>"
px trace add-note <trace-id> --text "needs follow-up"
px trace add-note <trace-id> --text "needs follow-up" --identifier "<coding-annotation-id>"
px trace-annotations delete --identifier "<coding-annotation-id>" --all -y
px <entity>-annotations delete requires either --all or both --start-time and --end-time. On success it emits {deleted: true, target, filter}.
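A revert that removes everything tagged with one identifier could be sketched as below: one delete per entity type, gated on an explicit confirmation. The helper name revert_coding_run and the prompt wording are assumptions; the delete commands themselves are the ones listed in this document.

```shell
# revert_coding_run: hypothetical helper. Deletes trace, span, and session
# annotations tagged with one identifier, but only after the user types "yes".
revert_coding_run() {
  id=$1
  printf 'Delete all annotations tagged "%s"? Type yes to continue: ' "$id" >&2
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "aborted; nothing deleted" >&2
    return 1
  fi
  for entity in trace span session; do
    px "${entity}-annotations" delete --identifier "$id" --all -y || return 1
  done
}

# Example:
#   revert_coding_run "coding-run:chatbot-context-loss-2026-05-06"
```

The -y flag is only passed after the interactive check, so the confirmation stays in the script rather than being skipped.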
Trace JSON shape
Trace
traceId, status ("OK"|"ERROR"), duration (ms), startTime, endTime
annotations[] (with --include-annotations; entries named "note" are excluded)
name, result { score, label, explanation }
notes[] (with --include-notes)
name="note", result { explanation }
rootSpan — top-level span (parent_id: null)
spans[]
name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
status_code ("OK"|"ERROR"|"UNSET"), parent_id, context.span_id
notes[] (with --include-notes)
name="note", result { explanation }
attributes
input.value, output.value — raw input/output
llm.model_name, llm.provider
llm.token_count.prompt/completion/total
llm.token_count.prompt_details.cache_read
llm.token_count.completion_details.reasoning
llm.input_messages.{N}.message.role/content
llm.output_messages.{N}.message.role/content
llm.invocation_parameters — JSON string (temperature, etc.)
exception.message — set if span errored
Spans
px span list --limit 20
px span list --last-n-minutes 60 --limit 50
px span list --since 2025-01-15T00:00:00Z --limit 50
px span list --span-kind LLM --limit 10
px span list --status-code ERROR --limit 20
px span list --name chat_completion --limit 10
px span list --trace-id <id> --format raw --no-progress | jq .
px span list --parent-id null --limit 10
px span list --parent-id <span-id> --limit 10
px span list --include-annotations --limit 10
px span list --include-notes --limit 10
px span list --attribute llm.model_name:gpt-4 --limit 10
px span list --attribute llm.token_count.total:500 --limit 10
px span list --attribute 'user.id:"12345"' --limit 10
px span list --attribute session.id:sess:abc:123 --limit 20
px span list --attribute llm.model_name:gpt-4 --attribute session.id:abc --limit 10
px span list output.json --limit 100
px span list --format raw --no-progress | jq '.[] | select(.status_code == "ERROR")'
px span annotate <span-id> --name reviewer --label pass
px span annotate <span-id> --name checker --score 1 --annotator-kind CODE
px span annotate <span-id> --name reviewer --label pass --identifier "<coding-annotation-id>"
px span add-note <span-id> --text "verified by agent"
px span add-note <span-id> --text "verified by agent" --identifier "<coding-annotation-id>"
px span-annotations delete --identifier "<coding-annotation-id>" --all -y
Span JSON shape
Span
name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
status_code ("OK"|"ERROR"|"UNSET"), status_message
context.span_id, context.trace_id, parent_id
start_time, end_time
attributes
input.value, output.value — raw input/output
llm.model_name, llm.provider
llm.token_count.prompt/completion/total
llm.input_messages.{N}.message.role/content
llm.output_messages.{N}.message.role/content
llm.invocation_parameters — JSON string (temperature, etc.)
exception.message — set if span errored
annotations[] (with --include-annotations; entries named "note" are excluded)
name, result { score, label, explanation }
notes[] (with --include-notes)
name="note", result { explanation }
Sessions
px session list --limit 10 --format raw --no-progress | jq .
px session list --order asc --format raw --no-progress | jq '.[].session_id'
px session list --include-annotations --include-notes --format raw --no-progress | jq '.[].notes'
px session get <session-id> --format raw | jq .
px session get <session-id> --include-annotations --format raw | jq '.session.annotations'
px session get <session-id> --include-notes --format raw | jq '.session.notes'
px session annotate <session-id> --name reviewer --label pass
px session annotate <session-id> --name reviewer --score 0.9 --format raw --no-progress
px session annotate <session-id> --name reviewer --label pass --identifier "<coding-annotation-id>"
px session add-note <session-id> --text "verified by agent"
px session add-note <session-id> --text "verified by agent" --identifier "<coding-annotation-id>"
px session-annotations delete --identifier "<coding-annotation-id>" --all -y
Session JSON shape
SessionData
id, session_id, project_id
start_time, end_time
token_count_prompt, token_count_completion, token_count_total — cumulative across all LLM spans in the session (int, default 0)
annotations[] (with --include-annotations; entries named "note" are excluded)
name, result { score, label, explanation }
notes[] (with --include-notes)
name="note", result { explanation }
traces[]
id, trace_id, start_time, end_time
Datasets / Experiments / Prompts
px dataset list --format raw --no-progress | jq '.[].name'
px dataset get <name> --format raw | jq '.examples[] | {input, output: .expected_output}'
px dataset get <name> --split train --format raw | jq .
px dataset get <name> --version <version-id> --format raw | jq .
px experiment list --dataset <name> --format raw --no-progress | jq '.[] | {id, name, failed_run_count}'
px experiment get <id> --format raw --no-progress | jq '.[] | select(.error != null) | {input, error}'
px prompt list --format raw --no-progress | jq '.[].name'
px prompt get <name> --format text --no-progress
Annotation Configs
px annotation-config list
px annotation-config list --format raw --no-progress | jq '.[].name'
GraphQL
Use px api graphql for ad-hoc queries not covered by the commands above. Output has the shape {"data": {...}}.
px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | jq '.data.projects.edges[].node'
px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | jq '.data.datasets.edges[].node'
px api graphql '{ evaluators { edges { node { name kind } } } }' | jq '.data.evaluators.edges[].node'
px api graphql '{ __type(name: "Project") { fields { name type { name } } } }' | jq '.data.__type.fields[]'
Key root fields: projects, datasets, prompts, evaluators, projectCount, datasetCount, promptCount, evaluatorCount, viewer.
Docs
Download Phoenix documentation markdown for local use by coding agents.
px docs fetch
px docs fetch --workflow tracing
px docs fetch --workflow tracing --workflow evaluation
px docs fetch --dry-run
px docs fetch --refresh
px docs fetch --output-dir ./my-docs
Key options: --workflow (repeatable, values: tracing, evaluation, datasets, prompts, integrations, sdk, self-hosting, all), --dry-run, --refresh, --output-dir (default .px/docs), --workers (default 10).
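Because --workflow is repeatable, a list of workflow names can be expanded into flags programmatically. A small sketch; fetch_docs_for is a hypothetical helper, not a CLI command.

```shell
# fetch_docs_for: hypothetical helper that expands a list of workflow
# names into repeated --workflow flags for `px docs fetch`.
fetch_docs_for() {
  n=$#
  for wf in "$@"; do
    # Append a flag pair for each original argument.
    set -- "$@" --workflow "$wf"
  done
  # Drop the original bare names, leaving only the flag pairs.
  shift "$n"
  px docs fetch "$@"
}

# Example:
#   fetch_docs_for tracing evaluation
#   # runs: px docs fetch --workflow tracing --workflow evaluation
```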