$pwd:

rewardkit

Name: Rewardkit
Author: harbor-framework

// Write Harbor task verifiers using Reward Kit. Use when creating or editing a task's tests/ directory, adding grading criteria, setting up LLM/agent judges, or designing verifiers that produce a reward score.

$ git log --oneline --stat

stars:2,205

forks:1,090

updated: 2026-05-30 18:33

SKILL.md

readonly

related-skills.json

Same repository

create-task.md

from "harbor-framework/harbor"

Create a new Harbor task for evaluating agents. Use when the user wants to scaffold, build, or design a new task, benchmark problem, or eval. Guides through instruction writing, environment setup, verifier design (pytest vs Reward Kit vs custom), and solution scripting.

2026-05-30 2.2k

bundled-keep.md

from "harbor-framework/harbor"

Existing task skill that should remain after job-level skill injection.

2026-05-18 2.2k

runtime-proof.md

from "harbor-framework/harbor"

Write the proof file for the Harbor runtime skill injection example.

2026-05-18 2.2k

publish.md

from "harbor-framework/harbor"

Publish a Harbor task or dataset to the registry. Use when the user wants to upload, publish, or share tasks or datasets/benchmarks on the Harbor registry.

2026-04-25 2.2k

create-adapter.md

from "harbor-framework/harbor"

Scaffold a new Harbor benchmark adapter by running `harbor adapter init` and then guide implementation using the Adapters Agent Guide as the authoritative spec.

2026-04-19 2.2k

upload-parity-experiments.md

from "harbor-framework/harbor"

Create or reuse Hugging Face dataset PRs for `harborframework/parity-experiments` and upload Harbor parity/oracle result folders efficiently with sparse checkout, raw git pushes, and Git LFS.

2026-04-10 2.2k

package.json

"author": "harbor-framework"

"repository": "harbor-framework/harbor"

$ useful --forSOC

Software Quality Assurance Analysts and Testers | Computer and Mathematical Occupations | 15-1253 | L4

[environment]
network_mode = "no-network"  # Agent env baseline: offline during agent.run()

[verifier]
environment_mode = "separate"

[verifier.environment]
network_mode = "public"  # Verifier env baseline: LLM judge API calls
docker_image = "python:3.12-slim"

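from pathlib import Path

# `criterion` and `rk` are provided by the rewardkit package; their import lines are not shown in the source.

@criterion(description="output has at least {n} lines")
def has_n_lines(workspace: Path, n: int) -> bool:
    # Pass when output.txt in the workspace has at least n lines.
    return len((workspace / "output.txt").read_text().splitlines()) >= n

# Register the same check twice with different thresholds and weights.
rk.has_n_lines(10, weight=2.0)
rk.has_n_lines(50, weight=1.0)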
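To see what the criterion above evaluates, the following sketch runs the same line-count check against a temporary workspace using only the standard library. It does not import Reward Kit, and the weighted mean at the end is an assumption made for illustration, not a statement of how Reward Kit aggregates scores; `weighted_score` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def has_n_lines(workspace: Path, n: int) -> bool:
    """Same check as the criterion above, without the decorator."""
    return len((workspace / "output.txt").read_text().splitlines()) >= n


def weighted_score(results: list[tuple[bool, float]]) -> float:
    """Assumed aggregation: weighted mean of pass/fail outcomes."""
    total_weight = sum(weight for _, weight in results)
    return sum(weight for passed, weight in results if passed) / total_weight


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    # Write a 20-line file: it meets the n=10 threshold but not n=50.
    (workspace / "output.txt").write_text(
        "\n".join(f"line {i}" for i in range(20))
    )
    results = [
        (has_n_lines(workspace, 10), 2.0),
        (has_n_lines(workspace, 50), 1.0),
    ]

score = weighted_score(results)
print(round(score, 3))
```

Under this assumed weighting, passing only the easier check yields two thirds of the total, since that check carries twice the weight of the stricter one.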

[judge]
judge = "anthropic/claude-sonnet-4-6"  # LiteLLM model string
files = ["/app/main.py"]

[[criterion]]
description = "Is the code correct?"
type = "binary"

[[criterion]]
description = "How readable is the code?"
type = "likert"
points = 5
weight = 2.0

rewardkit

Setup in a Harbor task

Programmatic criteria

Available built-ins

Custom criteria

Judge criteria (LLM or agent-as-a-judge)

Agent judges

Useful `[judge]` options

Scoring aggregation

Multi-reward tasks

Output files

Multi-step tasks

When to reach for what

Working example

name	rewardkit
description	Write Harbor task verifiers using Reward Kit. Use when creating or editing a task's tests/ directory, adding grading criteria, setting up LLM/agent judges, or designing verifiers that produce a reward score.
