| name | prepare-submission-workflow |
| description | Prepare an evaluation for PR submission as an entry to the register. Use when user asks to prepare an eval for submission or finalize a PR. Trigger when the user asks you to run the "Prepare Evaluation For Submission" workflow. |
Prepare Eval For Submission
Since May 2026, new evaluations are submitted as entries to the register — the evaluation code lives in your own upstream repository, and you add a pointer to it here. Code is no longer added directly to src/inspect_evals/. If the user appears to be submitting evaluation code into the repo, direct them to register/README.md for the full process.
Workflow Steps
To prepare an evaluation for submission as a pull request:
1. Verify upstream repo requirements
The upstream repo must:
- Have a
pyproject.toml with a [project] table so it can be installed via uv sync
- Declare
inspect_ai as a dependency
- Define each task with the
@task decorator from inspect_ai
Ask the user whether their upstream repo meets these requirements. Offer to check for them — if they provide the GitHub repository URL, fetch the repo's pyproject.toml and task files (e.g. via WebFetch on the raw GitHub URLs) to verify the requirements are met. If any requirement is not met, tell the user what needs to be fixed upstream before they can register.
2. Gather information and create register/<eval_name>/eval.yaml
Skip this step if register/<eval_name>/eval.yaml already exists.
Use register/example_eval.yaml as the template — it documents every field. Don't ask the user field-by-field; instead, derive what you can from the upstream repo first, then ask one batched question for what's missing.
Hints on what to derive from the upstream repo (don't ask):
source.repository_url — from step 1.
source.repository_commit — fetch the latest commit SHA on the default branch (must be a 40-char SHA, not a tag or branch).
tasks[].name and tasks[].task_path — locate every @task-decorated function in the repo and record the function name and file path.
title — from the upstream README heading or pyproject.toml [project].name.
description — draft from the upstream README; keep to one short paragraph since the generated README links back upstream.
source.maintainers — defaults to the repo owner; only override if the repo is org-owned and the real maintainers are individuals.
tags — propose based on the eval's domain (e.g. Coding, games, tools). The upstream repo name is added automatically, so don't include it.
Use register/example_eval.yaml to determine what additional questions are needed.
Show the user the drafted YAML for confirmation before writing the file. Do not set id — it is auto-injected from the directory name.
3. Run validation
make check
This validates the eval.yaml and auto-generates a README.md next to it. The README is fully generated from eval.yaml — do not edit it by hand.
Because the generated page defers to upstream for details, make sure the upstream repo's README covers the dataset, scorer, task parameters, and how the eval was validated.
4. Create a changelog fragment
uv run scriv create
5. Open a PR
Use the PR template. The reviewer will ping anyone listed under source.maintainers for acknowledgement before merging.