with one click
eval-builder
// Build or repair task evaluators and evaluator tests with deterministic-first strategy.
// Build or repair task evaluators and evaluator tests with deterministic-first strategy.
Create and execute scenario YAML files for architecture what-if predictions.
Prepare and run evaluator-gated SAS and MAS experiments with explicit batch control and mandatory elasticity calibration for scenario scaling.
Reset local experiment state to a fresh slate by clearing runs, prediction outputs, scaling snapshots, and database index files under data/.
Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows.
Generate markdown reports with observed metrics, predicted metrics, and architecture recommendations.
| name | eval-builder |
| description | Build or repair task evaluators and evaluator tests with deterministic-first strategy. |
| disable-model-invocation | true |
| allowed-tools | ["Read","Edit","Bash","Glob","Grep"] |
Use this skill when evaluator is missing, incomplete, or failing tests.
Produce:
brainqub3/tasks/<task>/evaluator.pybrainqub3/tasks/<task>/tests/test_evaluator.pytask.md output contracttask.md and instances.jsonlpytest brainqub3/tasks/<task>/tests -q