benchmark
// Run benchmark suites and manage policy evolution — create challengers, compare against champions, promote or rollback policies.
Examine Autodialectics run results, manifests, and stored artifacts. Use after a pipeline run completes or to review past runs.
Re-run a stored Autodialectics pipeline run, optionally with a different policy, to compare outcomes or debug regressions.
Compile and execute an Autodialectics anti-slop pipeline for a task. Covers health checks, runtime init, contract compilation, and full pipeline execution.
Use the local Autodialectics MCP server and CLI to compile tasks, execute anti-slop runs, inspect artifacts, replay runs, benchmark policies, and evolve champions in this repository.
| name | benchmark |
| description | Run benchmark suites and manage policy evolution — create challengers, compare against champions, promote or rollback policies. |
Use this skill to benchmark policies and drive champion/challenger evolution.
autodialectics-mcp must be on PATH (pip install autodialectics).
benchmark(suite_dir?, policy_id?): run the benchmark suite against a policy. Returns case-by-case results with scores and decisions.
evolve_policy(use_gepa?): analyze recent benchmark reports and create a challenger policy. Set use_gepa: false to skip the GEPA optimizer (simpler heuristic fallback).
promote_policy(policy_id): promote a challenger to champion if comparison rules allow.
rollback_policy(): revert to the previous champion if the current one regresses.
autodialectics benchmark
autodialectics evolve
autodialectics promote <policy_id>
autodialectics rollback
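The CLI commands above can also be driven from a script. The following is a minimal Python sketch, assuming only the four subcommands listed here and no additional flags. The helper names `build_command` and `run_command` are hypothetical and not part of Autodialectics.

```python
import shutil
import subprocess

# The four subcommands listed above; no extra flags are assumed.
SUBCOMMANDS = {"benchmark", "evolve", "promote", "rollback"}


def build_command(subcommand, policy_id=None):
    """Return the argv list for one of the CLI commands listed above."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["autodialectics", subcommand]
    if subcommand == "promote":
        # `promote` is the only listed command that takes an argument.
        if not policy_id:
            raise ValueError("promote requires a policy_id")
        argv.append(policy_id)
    return argv


def run_command(subcommand, policy_id=None):
    """Run the command, failing early if the CLI is not on PATH."""
    if shutil.which("autodialectics") is None:
        raise RuntimeError(
            "autodialectics not found on PATH (pip install autodialectics)"
        )
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(build_command(subcommand, policy_id), check=True)
```

Separating `build_command` from `run_command` keeps the argument handling checkable without the CLI installed.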
benchmark → evolve_policy → benchmark (with challenger) → compare → promote or rollback
If evolve_policy returns no_reports, run benchmarks first to generate data.
If the user passes a suite directory after /autodialectics:benchmark, use it as the benchmark suite path.
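The workflow and the no_reports case above can be sketched as one cycle in Python. This is an illustration under stated assumptions, not the project's implementation: `tools` is a hypothetical object exposing the four tools as methods, and the result shapes (a `"cases"` list with a `"score"` per case, a `"status"` key, a `"policy_id"` key) are guesses. The local score comparison is a simplification, since the real promote_policy applies its own comparison rules. rollback_policy is left out because it applies later, if a promoted champion regresses.

```python
from statistics import mean


def mean_score(report):
    # Assumed report shape: {"cases": [{"score": float, ...}, ...]}
    return mean(case["score"] for case in report["cases"])


def evolve_cycle(tools, suite_dir=None, use_gepa=True):
    """benchmark -> evolve_policy -> benchmark (challenger) -> compare -> promote."""
    baseline = tools.benchmark(suite_dir=suite_dir)
    evolved = tools.evolve_policy(use_gepa=use_gepa)
    if evolved.get("status") == "no_reports":
        # No benchmark data was available to evolve from.
        return "no_reports"
    challenger_id = evolved["policy_id"]
    challenger = tools.benchmark(suite_dir=suite_dir, policy_id=challenger_id)
    if mean_score(challenger) > mean_score(baseline):
        tools.promote_policy(policy_id=challenger_id)
        return "promoted"
    return "kept_champion"


class StubTools:
    """In-memory stand-in so the sketch runs without a server (hypothetical)."""

    def __init__(self, champion_score, challenger_score):
        # policy_id None stands for the current champion.
        self.scores = {None: champion_score, "challenger-1": challenger_score}
        self.calls = []

    def benchmark(self, suite_dir=None, policy_id=None):
        self.calls.append("benchmark")
        return {"cases": [{"score": self.scores[policy_id]}]}

    def evolve_policy(self, use_gepa=True):
        self.calls.append("evolve_policy")
        return {"policy_id": "challenger-1"}

    def promote_policy(self, policy_id):
        self.calls.append("promote_policy")


outcome = evolve_cycle(StubTools(champion_score=0.6, challenger_score=0.8))
print(outcome)  # promoted
```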