playground
| name | playground |
| description | Author, edit, or iterate on prompts in the Phoenix prompt playground. Load before any playground tool call, including single-shot prompt rewrites. |
The prompt playground is a tool for authoring and optimizing prompts. It supports two different ways of working: fast manual prompt iteration without a dataset, and dataset-backed prompt experimentation with evaluators and experiments. Choose the workflow that matches the user's current goal and the UI context they have mounted.
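The choice between the two workflows can be sketched as a small decision function. This is only an illustration: `PlaygroundContext`, its fields, and `choose_workflow` are hypothetical names, and the rule that both a mounted dataset and a request for evidence are required is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Workflow(Enum):
    MANUAL_ITERATION = "manual_iteration"
    DATASET_EXPERIMENTATION = "dataset_experimentation"


@dataclass
class PlaygroundContext:
    # Hypothetical fields; the real mounted UI context may expose different data.
    dataset_mounted: bool
    wants_evaluator_evidence: bool


def choose_workflow(ctx: PlaygroundContext) -> Workflow:
    """Pick the dataset-backed workflow only when a dataset is mounted
    and the user wants evaluator-backed evidence (assumed rule)."""
    if ctx.dataset_mounted and ctx.wants_evaluator_evidence:
        return Workflow.DATASET_EXPERIMENTATION
    # Everything else falls back to fast manual iteration.
    return Workflow.MANUAL_ITERATION
```

Under these assumptions, a user who only wants a prompt rewritten lands in manual iteration even when a dataset happens to be mounted.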
Use the manual iteration workflow when the user wants to draft, rewrite, or manually improve a prompt and no dataset-backed evaluation loop is in scope.
- Use `read_prompt_instance` before proposing changes so you have the current messages, message IDs, labels, and revision.
- Use `edit_prompt_instance` for changes to the mounted prompt so the user can review the diff before accepting it.
- Use `clone_prompt_instance` when comparing alternatives would help the user choose between prompt variants. Discuss variants by their alphabetic labels, but pass numeric instance IDs to tools.
- Use `set_variable_values` when the user provides manual values for prompt template variables.
- Use `run_playground` only when the user asks to run, try, test, or compare the current prompt. Treat the output as qualitative feedback rather than dataset-backed evidence.
- Use `read_playground_output` to inspect raw output and get the `traceId` for trace analysis when needed.

Use the dataset-backed workflow when the user wants evidence that a prompt is improving across a dataset, or when they are comparing prompt variants using evaluator results.
- Use `bash` with `phoenix-gql` to inspect dataset-backed experiment results when needed; `read_playground_output` only reads manual playground runs. Separate model randomness from prompt issues when possible.
- Use `edit_prompt_instance` or `clone_prompt_instance` to create the next candidate.
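The manual workflow asks you to discuss variants by alphabetic label while passing numeric instance IDs to tools. A minimal sketch of that translation follows. The `instances` shape (a list of dicts with `id` and `label` keys) and the helper name `instance_id_for_label` are hypothetical; the actual structure returned by `read_prompt_instance` may differ.

```python
def instance_id_for_label(label: str, instances: list[dict]) -> int:
    """Map a user-facing variant label (e.g. "B") to the numeric instance ID
    that tool calls expect. The instance dict shape is assumed for illustration."""
    wanted = label.strip().upper()
    for instance in instances:
        if str(instance["label"]).upper() == wanted:
            return int(instance["id"])
    raise KeyError(f"No prompt instance with label {label!r}")


# Example with made-up IDs: the conversation says "variant B",
# while the tool call would receive the numeric ID.
instances = [{"id": 101, "label": "A"}, {"id": 102, "label": "B"}]
variant_b_id = instance_id_for_label("b", instances)
```

Raising on an unknown label, instead of guessing, keeps an edit or clone from being applied to the wrong variant.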