create-adapter
Scaffold a new Harbor benchmark adapter by running `harbor adapter init` and then guide implementation using the Adapters Agent Guide as the authoritative spec.
| name | create-adapter |
| description | Scaffold a new Harbor benchmark adapter by running `harbor adapter init` and then guide implementation using the Adapters Agent Guide as the authoritative spec. |
Bootstrap a new benchmark adapter in the Harbor repository. This skill scaffolds the adapter directory with `harbor adapter init` and then defers to the adapter tutorial for every implementation decision.
The adapter tutorial is the authoritative specification for this skill. Read it in full before taking any action beyond scaffolding:
docs/content/docs/datasets/adapters.mdx
That path is relative to the Harbor repo root (a skill prerequisite — see below). The tutorial contains:
- `task.toml`, `parity_experiment.json`, `adapter_metadata.json`, and `dataset.toml`.

Do not substitute prior knowledge for the contents of that file. Treat it as the contract.
- Harbor CLI on `PATH` (`harbor --version` succeeds).
- An up-to-date Harbor checkout: run `git fetch origin && git status` and pull `main` if behind. Stale checkouts miss recent adapter and agent fixes and are a common source of spurious parity failures later on.

Before any filesystem or CLI action, read `docs/content/docs/datasets/adapters.mdx` in full. Pay particular attention to:
- `name` field and `<org>/<task>` format requirements.

Collect the following before scaffolding. If the user has not provided an item, ask before proceeding; these inputs shape the scaffold and the tutorial steps that follow.
| Field | Why it matters |
|---|---|
| Adapter name | Lowercase, hyphen-separated. Must match the benchmark's common identifier (e.g., swe-bench, aider-polyglot). Becomes the directory under adapters/ and, after underscore conversion, the Python package name. |
| Human-readable name | Passed via --name; appears in the generated README. |
| Upstream repo URL | Needed for tutorial Step 1 (benchmark analysis) and for the original_parity_repo field later. |
| Oracle solutions available? | If the benchmark ships reference solutions, use them. If not, oracle solutions must be LLM-generated (tutorial Step 3 → "Benchmarks without oracle solutions"). |
| Agent scenario | Which of Step 4's scenarios applies: (1) existing compatible agent, (2) fork + add LLM agent, or (3) custom agent. This shapes the parity plan. |
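
The naming rule in the first row can be sketched in Python. This is illustrative only: the regular expression and the helper names `is_valid_adapter_name` and `to_package_name` are assumptions made for this example, not the validation that `harbor adapter init` actually performs.

```python
import re

# Assumed pattern: lowercase alphanumeric words joined by single hyphens.
# The real CLI may accept or reject other forms.
_ADAPTER_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_adapter_name(name: str) -> bool:
    """Return True if name is lowercase and hyphen-separated (assumed rule)."""
    return _ADAPTER_NAME.fullmatch(name) is not None


def to_package_name(adapter_name: str) -> str:
    """Convert the adapter directory name into a Python package name."""
    return adapter_name.replace("-", "_")
```

For instance, `to_package_name("aider-polyglot")` gives `aider_polyglot`, which corresponds to the `src/<adapter_name>/` directory in the scaffold.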
Per tutorial Step 2, work on a dedicated branch:
git checkout -b <adapter-name>-adapter
Prefer the non-interactive form when both names are known:
harbor adapter init <adapter-name> --name "<Human-Readable Name>"
Otherwise run `harbor adapter init` interactively and let the CLI prompt.
Expected output: a new directory at `adapters/<adapter-name>/` containing `pyproject.toml`, `README.md`, `src/<adapter_name>/`, and the template task files under `src/<adapter_name>/task-template/`. Verify the directory exists before continuing.
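
The verification step could be scripted along these lines. This is a minimal sketch, assuming the layout listed above is complete; `missing_scaffold_paths` is a hypothetical helper, not part of the Harbor CLI.

```python
from pathlib import Path


def missing_scaffold_paths(repo_root: Path, adapter_name: str) -> list[str]:
    """List expected scaffold paths that are absent (empty list means all present)."""
    # The package directory uses underscores where the adapter name uses hyphens.
    package = adapter_name.replace("-", "_")
    adapter_dir = repo_root / "adapters" / adapter_name
    expected = [
        adapter_dir,
        adapter_dir / "pyproject.toml",
        adapter_dir / "README.md",
        adapter_dir / "src" / package,
        adapter_dir / "src" / package / "task-template",
    ]
    return [str(p.relative_to(repo_root)) for p in expected if not p.exists()]
```

Calling `missing_scaffold_paths(Path.cwd(), "<adapter-name>")` from the repository root returns an empty list when every expected path is present.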
Continue from "Step 1. Understand the Original Benchmark" in the tutorial. Do not invent structure, field names, or workflow beyond what the guide specifies.
High-priority gotchas (each is documented in the tutorial, but these are the most common adapter-build failures — surface them proactively as you work through the steps):
- `task.toml` must contain a `name` field under `[task]`. `main.py` is responsible for deriving a sanitized, unique, registry-safe name for every task. Tasks without a name cannot be registered. See the tutorial's "Naming rules" table.
- […] `{dataset}-1`, `{dataset}-2`) from a reproducible sort.
- `version = "1.0"` in `task.toml` is the schema version; leave it alone. Dataset versions are publish-time tags requested in the PR description, not a field in `task.toml` or `dataset.toml`.
- `main.py` must support `--output-dir`, `--limit`, `--overwrite`, and `--task-ids`. These flags are required for reproducible runs and task-level debugging.
- `README.md` is parsed by downstream automation. Fill in every section exactly as the template defines; put extra context in the Notes section or in the `notes` fields of `parity_experiment.json` / `adapter_metadata.json`. Do not add, rename, reorder, or remove sections.

When implementation questions come up, point at an existing adapter that matches the benchmark's shape:
| Scenario | Example adapter | When to use |
|---|---|---|
| Compatible agent already exists | adapters/adebench/ | Upstream already supports Claude-Code / Codex / OpenHands / Gemini-CLI |
| Fork upstream + add LLM agent | adapters/evoeval/ | LLM-based benchmark with no Harbor-compatible agent |
| Custom agent, separate dataset | adapters/bixbench/, adapters/financeagent/ | Custom interaction semantics; financeagent also demos LLM-as-a-Judge |
| Custom agent in-place | adapters/medagentbench/ | Custom HTTPAgent, no separate dataset |
| Multi-agent workflow | adapters/cooperbench/ | Multiple agents coordinate via messaging / sidecars |
| GPU tasks | adapters/featurebench/ | Comprehensive Docker + Modal GPU example |
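
Returning to the `main.py` flag requirement among the gotchas above, a minimal argument-parsing sketch might look as follows. Only the four flag names come from this guide. Their types, defaults, help text, the comma-separated format for `--task-ids`, and the `select_tasks` helper are assumptions, so defer to the tutorial and the scaffolded `main.py` for the real contract.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate Harbor tasks (sketch).")
    # Flag names come from the guide; types and defaults are assumptions.
    parser.add_argument("--output-dir", type=Path, required=True,
                        help="Directory to write generated tasks into.")
    parser.add_argument("--limit", type=int, default=None,
                        help="Generate at most this many tasks.")
    parser.add_argument("--overwrite", action="store_true",
                        help="Replace task directories that already exist.")
    parser.add_argument("--task-ids", type=lambda s: s.split(","), default=None,
                        help="Comma-separated task IDs to generate (assumed format).")
    return parser


def select_tasks(all_ids, task_ids, limit):
    """Apply --task-ids, then --limit, over a sorted (reproducible) ID list."""
    ids = sorted(all_ids)
    if task_ids is not None:
        wanted = set(task_ids)
        ids = [i for i in ids if i in wanted]
    return ids if limit is None else ids[:limit]
```

Sorting before filtering and truncating keeps `--limit` deterministic across runs, which is the point of requiring these flags for reproducibility and task-level debugging.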
- […] `adapter.py`, `main.py`, or any task-template files. Those are the contributor's work, guided by the tutorial.

| Symptom | Likely cause | Action |
|---|---|---|
| `harbor: command not found` | Harbor CLI not installed | Install with `uv tool install harbor` or the repository's development setup. |
| `harbor adapter init` fails with a name validation error | Adapter name contains invalid characters | Re-run with a lowercase, hyphen-separated name. |
| Scaffold succeeds but `adapters/<name>/` is not present | Running from outside the repository root | `cd` to the Harbor repository root and re-run. |
| Oracle passes locally but parity later fails spuriously | Stale Harbor checkout | git fetch origin && git status; pull main if behind. |
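
Two of the symptoms in the table can be caught before scaffolding with a small preflight check. The sketch below is hypothetical: `preflight_problems` is not a Harbor command, and it assumes the repository root can be recognized by the presence of an `adapters/` directory.

```python
import shutil
from pathlib import Path


def preflight_problems(cwd: Path, cli: str = "harbor") -> list[str]:
    """Return human-readable problems found before running `harbor adapter init`."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(cli) is None:
        problems.append(f"{cli}: command not found (Harbor CLI not installed)")
    # Assumption: the repo root is recognizable by its adapters/ directory.
    if not (cwd / "adapters").is_dir():
        problems.append("adapters/ not found: run from the Harbor repository root")
    return problems
```

An empty list from `preflight_problems(Path.cwd())` means neither condition was detected. It does not check for a stale checkout, which still requires `git fetch origin && git status`.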