task-author
Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows.
| name | task-author |
| description | Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows. |
| disable-model-invocation | true |
| allowed-tools | ["Read","Edit","Bash","Glob","Grep"] |
Use this skill when a new task is needed or when an existing task package is incomplete.
Produce:
- `brainqub3/tasks/<task>/task.md`
- `brainqub3/tasks/<task>/instances.jsonl`
- `brainqub3/tasks/<task>/evaluator.py`
- `brainqub3/tasks/<task>/tests/test_evaluator.py`
- `fixtures/` files required for deterministic evaluation

Workflow:
1. Run `uv run brainqub3 task init <task_name>`.
2. Define the output contract in `task.md` (exact JSON keys, types, and no extra keys unless explicitly allowed).
3. Write `instances.jsonl` with stable IDs and task inputs.
4. Give the evaluator explicit error types (`invalid_json`, `not_object`, `schema_mismatch`, `answer_mismatch`, plus task-specific errors).
5. Run `uv run pytest brainqub3/tasks/<task>/tests -q` and fix until green.
6. Run `uv run brainqub3 run sas --task <task_name> --instances 1 --allow-mock`.

Fabricated instances:
- Write instances to `instances.jsonl`.
- Include `gold` answers when truth can be precomputed.
- Otherwise, put source data in `fixtures/` and derive truth in the evaluator.

User-provided data:
- Keep the data in `fixtures/` when possible.
- Document it in `task.md`.
- Produce `instances.jsonl` rows with `id` and `input`.

Checklist:
- `task.md` specifies deterministic success criteria and a strict output contract.
- The evaluator returns `EvalResult` with an actionable `error_type` and useful `details`.
- `uv run pytest brainqub3/tasks/<task>/tests -q` passes before any SAS/MAS run.

Use `eval-builder` when evaluator complexity grows or tests are brittle.
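A minimal sketch of how the error types above might map onto a deterministic evaluator. The `EvalResult` dataclass here is a hypothetical stand-in (the real Brainqub3 class may have different fields or a different constructor), and the single-key `answer` contract, the `evaluate` signature, and the example instance row are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical stand-in; the real Brainqub3 EvalResult may differ.
@dataclass
class EvalResult:
    passed: bool
    error_type: Optional[str] = None
    details: Dict[str, Any] = field(default_factory=dict)


# Assumed contract for this sketch: exactly one key, "answer", holding an int.
EXPECTED_KEYS = {"answer"}

# Assumed shape of one instances.jsonl row: "id" and "input" come from the
# skill text; nesting the precomputed truth under "gold" is a guess.
INSTANCE_LINE = '{"id": "add-001", "input": {"a": 2, "b": 3}, "gold": {"answer": 5}}'


def evaluate(raw_output: str, gold: Dict[str, Any]) -> EvalResult:
    """Check raw model output against the contract, then against gold."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return EvalResult(False, "invalid_json", {"message": str(exc)})

    if not isinstance(parsed, dict):
        return EvalResult(False, "not_object", {"got_type": type(parsed).__name__})

    keys = set(parsed)
    # type(...) is int (not isinstance) so that JSON true/false is rejected.
    if keys != EXPECTED_KEYS or type(parsed["answer"]) is not int:
        return EvalResult(
            False,
            "schema_mismatch",
            {
                "missing": sorted(EXPECTED_KEYS - keys),
                "extra": sorted(keys - EXPECTED_KEYS),
            },
        )

    if parsed["answer"] != gold["answer"]:
        return EvalResult(
            False,
            "answer_mismatch",
            {"expected": gold["answer"], "got": parsed["answer"]},
        )

    return EvalResult(True)


row = json.loads(INSTANCE_LINE)
result = evaluate('{"answer": 5}', row["gold"])
```

Each branch returns early with one error type, so a test in `tests/test_evaluator.py` can assert on `error_type` directly, one case per failure mode.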