Scaffold a new Harbor benchmark adapter by running `harbor adapter init` and then guide implementation using the Adapters Agent Guide as the authoritative spec.

2026-07-21

create-task

Software Developer

Create a new Harbor task for evaluating agents. Use when the user wants to scaffold, build, or design a new task, benchmark problem, or eval. Guides through instruction writing, environment setup, verifier design (pytest vs Reward Kit vs custom), and solution scripting.

2026-07-21

harbor-exec

Software Developer

Use when working with Harbor's `harbor exec` CLI workflow: compiling files, directories, or globs into Harbor tasks; running map jobs; configuring artifacts and existence-only verification; using map-reduce; writing or reviewing ExecConfig YAML/JSON/TOML; or debugging command behavior, config validation, and job outputs.

2026-07-04

rewardkit

Software Developer

Write Harbor task verifiers using Reward Kit. Use when creating or editing a task's tests/ directory, adding grading criteria, setting up LLM/agent judges, or designing verifiers that produce a reward score.

2026-06-25

bundled-keep

Software Developer

Existing task skill that should remain after job-level skill injection.

2026-05-18

runtime-proof

Software Developer

Write the proof file for the Harbor runtime skill injection example.

2026-05-18

publish

Software Developer

Publish a Harbor task or dataset to the registry. Use when the user wants to upload, publish, or share tasks or datasets/benchmarks on the Harbor registry.

2026-04-25

upload-parity-experiments

Software Developer

Create or reuse Hugging Face dataset PRs for `harborframework/parity-experiments` and upload Harbor parity/oracle result folders efficiently with sparse checkout, raw git pushes, and Git LFS.

2026-04-10

Currently showing the top 8 of 9 collected skills in this repository.

#002

terminal-bench-science

3 skills · 205146 · Updated 2026-05-30

19% of this creator's skills

Skill

Occupation category

Description

Updated

convert-separate-verifier

Software Developer

Convert a Harbor benchmark task from Harbor's shared verifier mode (default) to separate verifier mode. Use when the user asks to "convert this task to separate verifier", "make the verifier run in its own container", or asks about Harbor's separate-verifier environment for a specific task.

2026-05-30

review-task

Software Quality Assurance Analyst and Tester

Review a benchmark task PR — downloads all artifacts, analyzes against the rubric, launches harbor view, and enables interactive review

2026-04-22

update-rubric

Software Quality Assurance Analyst and Tester

Propose new rubric criteria based on review findings — creates a branch and PR

2026-04-10

#003

skills

3 skills · 102 · Updated 2026-03-17

19% of this creator's skills

Skill

Occupation category

Description

Updated

harbor-adapter-creator

Software Developer

Create Harbor benchmark adapters that convert external benchmark datasets into Harbor task format. Use when porting an existing benchmark to Harbor, running parity experiments, registering a dataset to the Harbor registry, or debugging adapter validation failures. Covers: adapter class interface (generate_task, make_local_task_id), directory layout including YAML job configs, oracle verification, parity planning and experiments, dataset registration, and the full post-implementation workflow.

2026-03-17

harbor-cli

Software Developer

Harbor CLI command reference and usage patterns. Covers harbor run, harbor jobs, harbor trials, harbor datasets, harbor adapters, harbor tasks, harbor view, harbor sweeps, harbor traces, harbor cache, and harbor admin commands. Use this skill whenever running Harbor evaluations, managing datasets, viewing results, debugging tasks, exporting traces, or working with any harbor CLI command. Also use when constructing harbor command lines, looking up flag names, or troubleshooting CLI errors.

2026-03-17

harbor-task-creator

Software Developer

Create Harbor evaluation tasks from scratch. Generates task.toml configuration, instruction.md for agents, environment/Dockerfile setup, tests/test.sh verification scripts, and solution/solve.sh reference solutions. Use this skill whenever creating, scaffolding, or authoring new Harbor benchmark tasks, evaluation environments, or agent challenges. Also use when fixing broken tasks, debugging reward file issues, or structuring multi-container evaluation environments.

2026-03-17

#004

harbor-index

1 skill · 185 · Updated 2026-07-14

6.3% of this creator's skills

Skill

Occupation category

Description

Updated

harbor-index-release

Software Developer

Orchestrate a full Harbor Index release cut (build-push-pin, oracle gate, Hub publish, leaderboard cutover, image aliases, signed tag / GitHub Release) or a hotfix re-pin. Use when a maintainer asks to cut harbor-index vX.Y, ship a release, do a hotfix release, publish harbor-index, or run the release runbook. Does not cover post-release GPT evals or leaderboard submissions.

2026-07-14

Showing 4 of 4 repositories.