Name: Playground
Author: Arize-ai

// Author, edit, or iterate on prompts in the Phoenix prompt playground. Load before any playground tool call, including single-shot prompt rewrites.

stars: 9,927

forks: 905

updated: May 28, 2026, 15:23

SKILL.md

name	playground
description	Author, edit, or iterate on prompts in the Phoenix prompt playground. Load before any playground tool call, including single-shot prompt rewrites.

Prompt Playground

The prompt playground is a tool for authoring and optimizing prompts. It supports two different ways of working: fast manual prompt iteration without a dataset, and dataset-backed prompt experimentation with evaluators and experiments. Choose the workflow that matches the user's current goal and the UI context they have mounted.
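
As a rough illustration of that choice, the sketch below picks a workflow from two signals taken from the workflow descriptions. The `Goal` fields and the returned labels are hypothetical names invented for this example; they are not part of the Phoenix tool surface.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    # Hypothetical flags summarizing what the user asked for.
    wants_dataset_evidence: bool = False    # evidence of improvement across a dataset
    compares_with_evaluators: bool = False  # comparing variants using evaluator results


def choose_workflow(goal: Goal) -> str:
    """Return which playground workflow fits the user's current goal."""
    if goal.wants_dataset_evidence or goal.compares_with_evaluators:
        return "dataset-backed experimentation"
    # Drafting, rewriting, or manual improvement with no evaluation loop in scope.
    return "manual iteration"
```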

Workflow: Create And Iterate Without A Dataset

Use this workflow when the user wants to draft, rewrite, or manually improve a prompt and no dataset-backed evaluation loop is in scope.

1. Clarify the task the prompt must perform: input variables, expected output shape, audience, constraints, and examples of good or bad behavior when available.
2. If a playground prompt already exists, call read_prompt_instance before proposing changes so you have the current messages, message IDs, labels, and revision.
3. Draft or revise the prompt so it clearly states the task, required context, output contract, and success criteria. Keep the prompt directly tied to the user's stated goal.
4. Use edit_prompt_instance for changes to the mounted prompt so the user can review the diff before accepting it.
5. Use clone_prompt_instance when comparing alternatives would help the user choose between prompt variants. Discuss variants by their alphabetic labels, but pass numeric instance IDs to tools.
6. Use set_variable_values when the user provides manual values for prompt template variables.
7. Call run_playground only when the user asks to run, try, test, or compare the current prompt. Treat the output as qualitative feedback rather than dataset-backed evidence.
8. After the run finishes, call read_playground_output to inspect raw output and get the traceId for trace analysis when needed.
9. Inspect the output with the user, identify the next concrete improvement, and repeat the edit or comparison loop until the prompt is useful for the task.
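
The ordering of tool calls in the steps above can be sketched as a small planner. This is a minimal sketch under stated assumptions: the tool names come from this skill, but the `Request` fields, the `plan_tool_calls` and `resolve_instance_id` helpers, and the `{"label", "id"}` instance shape are hypothetical and do not reflect the real tool signatures.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical summary of the user's request and the mounted UI context.
    prompt_exists: bool = False
    wants_variants: bool = False
    variable_values: dict = field(default_factory=dict)
    wants_run: bool = False


def plan_tool_calls(req: Request) -> list[str]:
    """List tool names in the order the manual workflow suggests."""
    calls = []
    if req.prompt_exists:
        calls.append("read_prompt_instance")   # read before proposing changes
    calls.append("edit_prompt_instance")       # user reviews the diff
    if req.wants_variants:
        calls.append("clone_prompt_instance")  # compare alternatives
    if req.variable_values:
        calls.append("set_variable_values")
    if req.wants_run:                          # run only on explicit request
        calls += ["run_playground", "read_playground_output"]
    return calls


def resolve_instance_id(label: str, instances: list[dict]) -> int:
    """Map an alphabetic label ('A', 'B', ...) to the numeric ID passed to tools.

    Assumes each instance is a dict with 'label' and 'id' keys (hypothetical shape).
    """
    for inst in instances:
        if inst["label"] == label.upper():
            return inst["id"]
    raise KeyError(f"no prompt instance labeled {label!r}")
```

Keeping the label lookup separate from the plan mirrors the guidance to talk about variants by label while passing numeric IDs to tools.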

Workflow: Iterate Over A Dataset With Evaluators And Experiments

Use this workflow when the user wants evidence that a prompt is improving across a dataset, or when they are comparing prompt variants using evaluator results.

1. Confirm the dataset represents the task the prompt is meant to solve, including the important input fields, expected outputs, and failure modes.
2. Make sure the starting prompt is well formed before running it: it should define the task, relevant variables, output format, and any constraints needed for consistent evaluation.
3. Run the playground over the dataset. Each prompt instance run over a dataset is captured as an experiment, with outputs and evaluator annotations available for review.
4. Review the experiment outputs and annotations to find recurring failure patterns. Use bash with phoenix-gql to inspect dataset-backed experiment results when needed; read_playground_output only reads manual playground runs. Separate model randomness from prompt issues when possible.
5. Use or add evaluators when they make issue detection more systematic, especially for failures that are hard to spot by manual review alone.
6. Form a specific hypothesis for improving the prompt, then use edit_prompt_instance or clone_prompt_instance to create the next candidate.
7. Rerun the playground and compare experiments. Look for evaluator improvements, fewer repeated failure modes, and acceptable tradeoffs in output quality.
8. Save a prompt snapshot only after the evidence shows an improvement or the user explicitly accepts the tradeoff.
9. Continue the hypothesis, edit, run, compare loop until the dataset-backed results satisfy the user's goal.
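
The compare-then-save decision in the steps above can be sketched as follows. This is an illustrative sketch, not Phoenix's own logic: the annotation shape (`{"name", "score"}` dicts), the helper names, and the rule that no evaluator may regress are all assumptions made for the example. In practice the annotations would come from the experiment results you inspect.

```python
from statistics import mean


def mean_scores(annotations: list[dict]) -> dict[str, float]:
    """Average each evaluator's score across one experiment's annotations."""
    by_name: dict[str, list[float]] = {}
    for ann in annotations:
        by_name.setdefault(ann["name"], []).append(ann["score"])
    return {name: mean(scores) for name, scores in by_name.items()}


def should_save_snapshot(
    baseline: list[dict],
    candidate: list[dict],
    user_accepts_tradeoff: bool = False,
    min_gain: float = 0.0,
) -> bool:
    """Save only if the candidate improves, or the user accepts the tradeoff.

    Assumed rule: no shared evaluator regresses and at least one
    gains more than min_gain.
    """
    base, cand = mean_scores(baseline), mean_scores(candidate)
    shared = base.keys() & cand.keys()
    no_regression = all(cand[n] >= base[n] for n in shared)
    some_gain = any(cand[n] - base[n] > min_gain for n in shared)
    improved = bool(shared) and no_regression and some_gain
    return improved or user_accepts_tradeoff
```

Averaging over all dataset examples, rather than reading a handful of outputs, is one simple way to keep single-run model randomness from being mistaken for a prompt effect; a nonzero `min_gain` makes the check stricter.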

related-skills.json

Same repository

phoenix-pxi.md

from "Arize-ai/phoenix"

Development guide for the Phoenix PXI agent. Use when modifying PXI-specific frontend or backend behavior, extending PXI tool wiring, updating PXI runtime capabilities, or changing the PXI agent request/dispatch flow. Start here for PXI-specific workflows, then read the relevant resource file for the layer you are changing.

2026-05-30 · 9.9k stars

phoenix-frontend.md

from "Arize-ai/phoenix"

Frontend development guidelines for the Phoenix AI observability platform. Use when writing, reviewing, or modifying React components, TypeScript code, styles, or UI features in the app/ directory. Triggers on any frontend task — new components, UI changes, styling, accessibility fixes, form handling, or component refactoring. Also use when the user asks about frontend conventions or component patterns for this project. For design system rules (error display, layout, dialogs, tokens), use the phoenix-design skill.

2026-05-29 · 9.9k stars

phoenix-design.md

from "Arize-ai/phoenix"

Design system conventions for the Phoenix frontend — layout, dialogs, error display, BEM CSS class naming, and CSS design tokens. Use when building UI, naming CSS classes, creating or consuming tokens, handling errors, or designing dialog interactions in app/src/.

2026-05-29 · 9.9k stars

debug-trace.md

from "Arize-ai/phoenix"

Diagnose failure modes by systematically investigating traces. Trigger when the user explicitly asks for cross-trace diagnosis: "what's going wrong?", "were there errors?", "debug this", "where is my agent struggling?". Do NOT trigger on: (1) advice questions ("what should I do?"), (2) statistical questions ("what's the average latency?"), (3) summarize requests, (4) trace filtering ("show me traces with errors"), (5) vague questions ("is there a problem?"), (6) unrelated requests.

2026-05-28 · 9.9k stars

phoenix-cli.md

from "Arize-ai/phoenix"

Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Use whenever the user is analyzing traces or spans, investigating LLM/agent failures, deciding what to do after instrumenting an app, building failure taxonomies, choosing what evals to write, or asking "what's going wrong", "what kinds of mistakes", or "where do I focus" — even without naming a technique.

2026-05-27 · 9.9k stars

phoenix-tracing.md

from "Arize-ai/phoenix"

OpenInference semantic conventions and instrumentation for Phoenix AI observability. Use when implementing LLM tracing, creating custom spans, or deploying to production.

2026-05-23 · 9.9k stars

package.json

"author": "Arize-ai"

"repository": "Arize-ai/phoenix"
