inspect
Open the web UI to visually inspect, edit, and run the benchmark pipeline. Use when the user wants a visual interface for their pipeline.
| name | inspect |
| description | Open the web UI to visually inspect, edit, and run the benchmark pipeline. Use when the user wants a visual interface for their pipeline. |
| argument-hint | [project-directory] [--port 7373] |
| disable-model-invocation | true |
| allowed-tools | Bash(agentic-usability *) Read Glob |
Launch the web-based inspector for the benchmark pipeline.
echo "Arguments: $ARGUMENTS"
--port <number>: Port for the local server (default: 7373)

The web UI serves data from the project directory:
    <project>/
      config.json                      # Pipeline configuration
      suite.json                       # Test suite (array of test cases)
      results/
        <runId>/                       # e.g. run-2026-04-25T10-30-00-000Z
          run.json                     # Run manifest (id, targets, testCount, label)
          pipeline-state.json          # Pipeline progress tracker
          report.json                  # Aggregate scorecard (if pipeline completed)
          <target>/<testId>/           # Per-test results
            generated-solution.json    # Agent's solution
            judge.json                 # Judge scores
            agent-notes.md             # Agent's working notes
            agent-output.log           # Raw output
            agent-session.jsonl        # Agent conversation log
            judge-session.jsonl        # Judge conversation log
- Subdirectories of results/ that contain a run.json are runs.
- Check pipeline-state.json to see whether a run is complete (stage: "report") or paused.

Run agentic-usability inspect -p $ARGUMENTS to start the server. It opens the browser automatically. Press Ctrl+C to stop.
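The run-discovery rules described here can be checked by hand before opening the inspector. The following is a minimal Python sketch, not part of the agentic-usability tool itself. It assumes that pipeline-state.json is a JSON object with a top-level "stage" key. The `list_runs` helper and the "unknown" status for a run without a state file are hypothetical names introduced only for illustration.

```python
import json
from pathlib import Path


def list_runs(project_dir):
    """Return (run_id, status) pairs for each run under <project>/results/."""
    results_dir = Path(project_dir) / "results"
    runs = []
    if not results_dir.is_dir():
        return runs
    for run_dir in sorted(results_dir.iterdir()):
        # A subdirectory counts as a run only if it contains run.json.
        if not run_dir.is_dir() or not (run_dir / "run.json").is_file():
            continue
        state_file = run_dir / "pipeline-state.json"
        if not state_file.is_file():
            # Hypothetical status: the source does not cover this case.
            status = "unknown"
        else:
            state = json.loads(state_file.read_text(encoding="utf-8"))
            # Assumption: "stage" is a top-level key; "report" means complete.
            status = "complete" if state.get("stage") == "report" else "paused"
        runs.append((run_dir.name, status))
    return runs
```

Calling `list_runs("path/to/project")` returns a list sorted by run directory name, which also sorts runs chronologically if run IDs follow the timestamp pattern shown in the layout.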
For the full file inventory, see pipeline-guide.md.