Name: Task Author
Author: brainqub3

name: task-author
description: Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows.
disable-model-invocation: true
allowed-tools: ["Read","Edit","Bash","Glob","Grep"]

task-author

Use this skill when a new task is needed or when an existing task package is incomplete.

Goal

Produce:

- brainqub3/tasks/<task>/task.md
- brainqub3/tasks/<task>/instances.jsonl
- brainqub3/tasks/<task>/evaluator.py
- brainqub3/tasks/<task>/tests/test_evaluator.py
- Optionally, brainqub3/tasks/<task>/fixtures/ with any files required for deterministic evaluation
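When repairing an existing package, a useful first step is to list which of these files are missing. The sketch below does that with pathlib. REQUIRED_FILES mirrors the list above, and missing_task_files is a hypothetical helper for illustration, not part of the brainqub3 CLI.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helper: files every task package needs. fixtures/ is optional,
# so it is deliberately not checked here.
REQUIRED_FILES = (
    "task.md",
    "instances.jsonl",
    "evaluator.py",
    "tests/test_evaluator.py",
)


def missing_task_files(task_dir: Path) -> list[str]:
    """Return the required package files that do not exist under task_dir."""
    return [rel for rel in REQUIRED_FILES if not (task_dir / rel).is_file()]
```

For example, missing_task_files(Path("brainqub3/tasks") / task_name) returns an empty list for a complete package and names each gap otherwise.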

Workflow

- Initialize the scaffold with uv run brainqub3 task init <task_name> if the task folder does not exist.
- Define a deterministic output contract in task.md (exact JSON keys and types; no extra keys unless explicitly allowed).
- Choose a data mode: fabricated task data or user-provided data.
- Build instances.jsonl with stable IDs and task inputs.
- Implement the evaluator with an explicit failure taxonomy (invalid_json, not_object, schema_mismatch, answer_mismatch, plus task-specific errors).
- Add evaluator tests for pass, fail, and malformed output.
- Run uv run pytest brainqub3/tasks/<task>/tests -q and fix failures until the suite is green.
- Optionally, run uv run brainqub3 run sas --task <task_name> --instances 1 --allow-mock as a smoke check.
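A minimal sketch of the evaluator step, assuming a hypothetical task whose task.md contract is exactly one key, answer, holding an integer. EvalResult here is a local stand-in dataclass (passed, error_type, details); brainqub3's own EvalResult and evaluator signature may differ, so adapt the names to the real class.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvalResult:
    """Hypothetical stand-in; the real brainqub3 EvalResult may differ."""

    passed: bool
    error_type: str | None = None
    details: dict[str, Any] = field(default_factory=dict)


# Assumed contract from task.md: exactly one key, "answer", holding an int.
EXPECTED_KEYS = {"answer"}


def evaluate(raw_output: str, gold_answer: int) -> EvalResult:
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return EvalResult(False, "invalid_json", {"message": str(exc)})

    if not isinstance(parsed, dict):
        return EvalResult(False, "not_object", {"got_type": type(parsed).__name__})

    keys = set(parsed)
    if keys != EXPECTED_KEYS:
        # Report both directions so the error is actionable.
        return EvalResult(False, "schema_mismatch", {
            "missing": sorted(EXPECTED_KEYS - keys),
            "extra": sorted(keys - EXPECTED_KEYS),
        })

    answer = parsed["answer"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(answer, int) or isinstance(answer, bool):
        return EvalResult(False, "schema_mismatch", {"answer_type": type(answer).__name__})

    if answer != gold_answer:
        return EvalResult(False, "answer_mismatch", {"expected": gold_answer, "got": answer})

    return EvalResult(True)
```

The checks run from most generic to most specific, so each bad output maps to exactly one error_type, and the details dict says what to fix.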

Data Modes

1) Fabricated Data Mode

- Generate deterministic instances directly in instances.jsonl.
- Prefer explicit gold answers when the truth can be precomputed.
- If the truth is derived from local files or rules, store the deterministic inputs under fixtures/ and derive the truth in the evaluator.
- Include edge cases that break naive or hard-coded solutions.
- Keep constants and generation logic stable for reproducibility.
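The sketch below illustrates fabricated-data generation under its own assumptions: a hypothetical sum-of-numbers task, rows shaped as id, input, and gold, and a fixed seed. Hand-written edge cases come first so their IDs never shift, and a private random.Random instance keeps the output independent of any other code that touches the global RNG.

```python
from __future__ import annotations

import json
import random
from pathlib import Path

SEED = 1234  # fixed seed: rerunning the generator yields identical rows
N_RANDOM = 20


def build_instances() -> list[dict]:
    rng = random.Random(SEED)  # local RNG, unaffected by global random state
    rows = []
    # Edge cases aimed at naive or hard-coded solutions (hypothetical task).
    edge_cases = [[], [0], [-5, 5], [10**9, 10**9]]
    for i, nums in enumerate(edge_cases):
        rows.append({"id": f"edge-{i:03d}", "input": {"numbers": nums}, "gold": sum(nums)})
    for i in range(N_RANDOM):
        nums = [rng.randint(-100, 100) for _ in range(rng.randint(1, 10))]
        rows.append({"id": f"rand-{i:03d}", "input": {"numbers": nums}, "gold": sum(nums)})
    return rows


def write_jsonl(rows: list[dict], path: Path) -> None:
    # sort_keys and a fixed newline keep the file stable across reruns.
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        for row in rows:
            fh.write(json.dumps(row, sort_keys=True) + "\n")
```

As long as SEED and the generation code are unchanged, rerunning this on the same Python version regenerates the same instances.jsonl.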

2) User-Provided Data Mode

- Confirm the expected input source (files, folders, schema) before building instances.
- Snapshot the minimal required data into task-local fixtures/ when possible.
- If the data cannot be copied, document stable path assumptions and the required layout in task.md.
- Normalize user data into deterministic instances.jsonl rows, each with an id and an input.
- Add evaluator checks for missing files, missing fields, malformed records, and empty data slices.
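One way to implement these checks is a loader that raises a typed error the evaluator can turn into an EvalResult. This sketch assumes a CSV snapshot with hypothetical record_id and text columns; the error_type names (missing_file, missing_fields, malformed_record, empty_data_slice) are illustrative task-specific labels, not fixed brainqub3 names.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Hypothetical schema agreed with the user before building instances.
REQUIRED_FIELDS = ("record_id", "text")


class DataSourceError(Exception):
    """Carries an error_type and details that an evaluator can report."""

    def __init__(self, error_type: str, details: dict):
        super().__init__(f"{error_type}: {details}")
        self.error_type = error_type
        self.details = details


def load_instances(csv_path: Path) -> list[dict]:
    """Normalize a user CSV into instances.jsonl-style rows, failing loudly."""
    if not csv_path.is_file():
        raise DataSourceError("missing_file", {"path": str(csv_path)})

    rows = []
    with csv_path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_FIELDS if name not in header]
        if missing:
            raise DataSourceError("missing_fields", {"missing": missing})

        for rec in reader:
            # DictReader pads short rows with None and collects surplus
            # values under the None key; treat both, and blanks, as malformed.
            if None in rec or any(rec[name] in (None, "") for name in REQUIRED_FIELDS):
                raise DataSourceError("malformed_record", {"line": reader.line_num})
            rows.append({"id": rec["record_id"], "input": {"text": rec["text"]}})

    if not rows:
        raise DataSourceError("empty_data_slice", {"path": str(csv_path)})
    # Sort by id so row order does not depend on the order of the user's file.
    return sorted(rows, key=lambda row: row["id"])
```

Sorting by id keeps instances.jsonl stable even if the user's export order changes between snapshots.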

Quality Gates

- task.md specifies deterministic success criteria and a strict output contract.
- The evaluator returns an EvalResult with an actionable error_type and useful details.
- Tests cover at least:
  - a clear pass
  - a clear fail
  - invalid JSON
  - a schema mismatch
  - one data-source-specific failure path
- The suite passes under uv run pytest brainqub3/tasks/<task>/tests -q before any SAS/MAS run.
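A test module covering these minimum cases might look like the following. The small evaluate function at the top is a placeholder for importing the task's real evaluator (the import path and the (passed, error_type) return shape are assumptions), and missing_gold is a hypothetical task-specific error standing in for the data-source failure path.

```python
from __future__ import annotations

import json


def evaluate(raw_output: str, instance: dict) -> tuple[bool, str | None]:
    # Placeholder for the task's real evaluator (hypothetical signature);
    # returns (passed, error_type) so the tests below run standalone.
    if "gold" not in instance:
        return False, "missing_gold"  # task-specific, data-source error
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, "invalid_json"
    if not isinstance(obj, dict):
        return False, "not_object"
    if set(obj) != {"answer"}:
        return False, "schema_mismatch"
    if obj["answer"] != instance["gold"]:
        return False, "answer_mismatch"
    return True, None


INSTANCE = {"id": "edge-000", "input": {"numbers": [2, 3]}, "gold": 5}


def test_clear_pass():
    assert evaluate('{"answer": 5}', INSTANCE) == (True, None)


def test_clear_fail():
    assert evaluate('{"answer": 6}', INSTANCE) == (False, "answer_mismatch")


def test_invalid_json():
    assert evaluate('{"answer": 5', INSTANCE) == (False, "invalid_json")


def test_schema_mismatch():
    assert evaluate('{"answer": 5, "notes": "x"}', INSTANCE) == (False, "schema_mismatch")


def test_data_source_failure():
    broken = {"id": "edge-001", "input": {"numbers": []}}  # row lost its gold answer
    assert evaluate('{"answer": 0}', broken) == (False, "missing_gold")
```

Once the placeholder is replaced with the real import, pytest collects the same test_ functions when run from the task's tests/ folder.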

Coordination

- Use eval-builder when evaluator complexity grows or tests become brittle.
- Report the changed files and the validation commands that were run.
