create-task

Name: Create Task
Author: harbor-framework

Create a new Harbor task for evaluating agents. Use when the user wants to scaffold, build, or design a new task, benchmark problem, or eval. Guides through instruction writing, environment setup, verifier design (pytest vs Reward Kit vs custom), and solution scripting.

stars:2,205

forks:1,090

updated:May 30, 2026 at 18:33

SKILL.md

related-skills.json

same repository

rewardkit.md

from "harbor-framework/harbor"

Write Harbor task verifiers using Reward Kit. Use when creating or editing a task's tests/ directory, adding grading criteria, setting up LLM/agent judges, or designing verifiers that produce a reward score.

2026-05-30 · 2.2k

bundled-keep.md

from "harbor-framework/harbor"

Existing task skill that should remain after job-level skill injection.

2026-05-18 · 2.2k

runtime-proof.md

from "harbor-framework/harbor"

Write the proof file for the Harbor runtime skill injection example.

2026-05-18 · 2.2k

publish.md

from "harbor-framework/harbor"

Publish a Harbor task or dataset to the registry. Use when the user wants to upload, publish, or share tasks or datasets/benchmarks on the Harbor registry.

2026-04-25 · 2.2k

create-adapter.md

from "harbor-framework/harbor"

Scaffold a new Harbor benchmark adapter by running `harbor adapter init` and then guide implementation using the Adapters Agent Guide as the authoritative spec.

2026-04-19 · 2.2k

upload-parity-experiments.md

from "harbor-framework/harbor"

Create or reuse Hugging Face dataset PRs for `harborframework/parity-experiments` and upload Harbor parity/oracle result folders efficiently with sparse checkout, raw git pushes, and Git LFS.

2026-04-10 · 2.2k

package.json

"author": "harbor-framework"

"repository": "harbor-framework/harbor"

<task-name>/
├── instruction.md          # Task prompt for the agent
├── task.toml               # Config and metadata
├── environment/Dockerfile  # Container definition
├── solution/solve.sh       # Reference solution (optional)
└── tests/test.sh           # Verifier script
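The layout above can be created by hand. The following is a minimal sketch that writes the same five files with placeholder contents; it is not the Harbor CLI, and the function name `scaffold_task` and the placeholder text are hypothetical.

```python
from pathlib import Path
import tempfile

# Relative paths taken from the layout above; the contents are placeholders only.
TASK_FILES = {
    "instruction.md": "# Task prompt for the agent\n",
    "task.toml": "# Config and metadata\n",
    "environment/Dockerfile": "# Container definition\n",
    "solution/solve.sh": "#!/bin/bash\n# Reference solution (optional)\n",
    "tests/test.sh": "#!/bin/bash\n# Verifier script\n",
}


def scaffold_task(root: Path, task_name: str) -> Path:
    """Create the single-step task skeleton under root/task_name (hypothetical helper)."""
    task_dir = root / task_name
    for rel_path, content in TASK_FILES.items():
        target = task_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():  # never overwrite a file that is already there
            target.write_text(content)
    return task_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_task(Path(tmp), "hello-task")
        for path in sorted(p for p in created.rglob("*") if p.is_file()):
            print(path.relative_to(created))
```

Skipping files that already exist keeps the helper safe to re-run on a partly written task.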

#!/bin/bash
apt-get update && apt-get install -y curl
curl -LsSf https://astral.sh/uv/0.9.7/install.sh | sh
source $HOME/.local/bin/env
uvx --with pytest==8.4.1 pytest /tests/test_outputs.py
if [ $? -eq 0 ]; then
  echo 1 > /logs/verifier/reward.txt
else
  echo 0 > /logs/verifier/reward.txt
fi
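The last step of the script above maps the pytest exit code to a reward of 1 or 0. The same logic can be sketched in Python, which may be easier to unit test; `write_reward` and `run_verifier` are hypothetical names, and the sketch assumes pytest is already installed in the interpreter that runs it.

```python
import subprocess
import sys
from pathlib import Path


def write_reward(exit_code: int, reward_path: Path) -> str:
    """Write "1" for a zero exit code and "0" otherwise, as the shell script does."""
    reward = "1" if exit_code == 0 else "0"
    reward_path.parent.mkdir(parents=True, exist_ok=True)
    reward_path.write_text(reward + "\n")  # `echo` also appends a newline
    return reward


def run_verifier(test_file: str, reward_path: Path) -> str:
    """Run pytest on test_file and record the pass/fail reward."""
    result = subprocess.run([sys.executable, "-m", "pytest", test_file])
    return write_reward(result.returncode, reward_path)


if __name__ == "__main__":
    # Paths mirror the shell script above.
    print(run_verifier("/tests/test_outputs.py", Path("/logs/verifier/reward.txt")))
```

Keeping the exit-code mapping in its own function means it can be tested without launching pytest.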

[task]
name = "<org>/<task-name>"
description = "One-line description"
keywords = ["jax", "mnist", "rewardkit"]  # always populate; used for search/filtering

[metadata]
difficulty = "easy" | "medium" | "hard"
category = "programming" | "machine-learning" | "gpu" | ...
tags = ["..."]

[agent]
timeout_sec = 120.0  # How long the agent has

[verifier]
timeout_sec = 600.0  # How long tests have

[environment]
network_mode = "public"  # Baseline at env start (defaults to public)
cpus = 1                 # CPU cores
memory_mb = 2048         # RAM in MB
storage_mb = 10240       # Disk in MB

| Field | Layer | When applied |
| --- | --- | --- |
| [environment].network_mode | Baseline | Agent env start; shared verifier uses this too |
| [verifier.environment].network_mode | Baseline | Separate verifier env start |
| [agent].network_mode, [steps.agent].network_mode | Override | During matching agent.run() |
| [verifier].network_mode, [steps.verifier].network_mode | Override | During matching verify() |
| --allow-environment-host | Run-time | Merged into environment.extra_allowed_hosts → [environment] baseline |
| --allow-agent-host | Run-time | Merged into agent.extra_allowed_hosts → agent phase allowlist |
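The baseline and override layers in this table can be read as a simple lookup chain. The sketch below is one possible reading, not Harbor's implementation: it assumes a step-level value wins over a task-level value (by analogy with step-level timeouts), and the function name and the `"example-restricted"` mode are hypothetical placeholders.

```python
from typing import Optional


def effective_network_mode(config: dict, phase: str, step: Optional[dict] = None) -> str:
    """Resolve the network mode for one phase ("agent" or "verifier")."""
    if phase not in ("agent", "verifier"):
        raise ValueError(f"unknown phase: {phase!r}")

    # Baseline: [environment].network_mode, which defaults to "public".
    mode = config.get("environment", {}).get("network_mode", "public")

    # A separate verifier environment carries its own baseline;
    # a shared verifier keeps the [environment] value.
    if phase == "verifier":
        verifier_env = config.get("verifier", {}).get("environment", {})
        mode = verifier_env.get("network_mode", mode)

    # Task-level override: [agent].network_mode or [verifier].network_mode.
    mode = config.get(phase, {}).get("network_mode", mode)

    # Step-level override: [steps.agent] or [steps.verifier].
    # Assumption: this takes precedence over the task-level override.
    if step is not None:
        mode = step.get(phase, {}).get("network_mode", mode)
    return mode


if __name__ == "__main__":
    config = {
        "environment": {"network_mode": "public"},
        # Placeholder value; not necessarily a mode Harbor accepts.
        "agent": {"network_mode": "example-restricted"},
    }
    print(effective_network_mode(config, "agent"))
    print(effective_network_mode(config, "verifier"))
```

The run-time flags are left out because they extend host allowlists rather than select a mode.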

<task-name>/
├── task.toml
├── environment/Dockerfile      # Built once, shared across all steps
├── steps/
│   ├── scaffold/
│   │   ├── instruction.md      # Prompt for this step
│   │   ├── workdir/            # Uploaded to WORKDIR before the agent runs
│   │   │   └── setup.sh        # Optional pre-agent hook (reserved filename)
│   │   ├── tests/test.sh       # Per-step verifier
│   │   └── solution/solve.sh   # Per-step Oracle solution (optional)
│   ├── implement/
│   │   └── ...
│   └── document/
│       └── ...
└── tests/                      # Optional shared helpers + fallback test.sh
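Before running a multi-step task, it can help to confirm that each step directory has what the layout above shows. The sketch below assumes that every step needs an `instruction.md`, and that a step without its own `tests/test.sh` is acceptable when the top-level fallback `tests/test.sh` exists. The function name `missing_step_files` is hypothetical.

```python
from pathlib import Path
import tempfile


def missing_step_files(task_dir: Path) -> dict[str, list[str]]:
    """Report the files each step directory lacks, keyed by step name."""
    has_fallback = (task_dir / "tests" / "test.sh").is_file()
    report: dict[str, list[str]] = {}
    step_dirs = sorted(p for p in (task_dir / "steps").iterdir() if p.is_dir())
    for step_dir in step_dirs:
        missing = []
        if not (step_dir / "instruction.md").is_file():
            missing.append("instruction.md")
        # Assumption: the shared tests/test.sh covers steps without their own verifier.
        if not (step_dir / "tests" / "test.sh").is_file() and not has_fallback:
            missing.append("tests/test.sh")
        if missing:
            report[step_dir.name] = missing
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = Path(tmp) / "demo-task"
        (task / "steps" / "scaffold" / "tests").mkdir(parents=True)
        (task / "steps" / "scaffold" / "instruction.md").write_text("Prompt\n")
        (task / "steps" / "scaffold" / "tests" / "test.sh").write_text("#!/bin/bash\n")
        (task / "steps" / "implement").mkdir()
        print(missing_step_files(task))
```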

schema_version = "1.3"

[task]
name = "<org>/<task-name>"

# How per-step rewards roll up into the trial-level verifier_result.
# "mean" (default): per-key mean across steps that produced a result.
# "final": the last step's verifier_result verbatim.
multi_step_reward_strategy = "mean"

[[steps]]
name = "scaffold"   # Must match the directory under steps/
min_reward = 1.0    # Abort trial if this step's reward < 1.0

[steps.agent]
timeout_sec = 60.0  # Overrides task-level [agent].timeout_sec

[steps.verifier]
timeout_sec = 30.0

[[steps]]
name = "implement"
# Dict form gates on specific keys from a multi-dim reward:
min_reward = { correctness = 0.8, style = 0.5 }

[steps.agent]
timeout_sec = 120.0

[steps.verifier]
timeout_sec = 30.0

[[steps]]
name = "document"

[steps.agent]
timeout_sec = 60.0

[steps.verifier]
timeout_sec = 30.0
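The two roll-up strategies and the `min_reward` gate described in the comments above can be sketched as follows. This is an illustration under stated assumptions, not Harbor's code: each key is averaged over the steps that reported it, a missing key counts as 0.0 at the gate, and the scalar form of `min_reward` is compared against a key named `"reward"`. All function names are hypothetical.

```python
from typing import Optional, Union

Reward = dict[str, float]


def roll_up(step_rewards: list[Optional[Reward]], strategy: str = "mean") -> Reward:
    """Combine per-step rewards; None marks a step that produced no result."""
    if strategy == "final":
        # Last step's result verbatim (assumption: empty if it produced none).
        return dict(step_rewards[-1] or {}) if step_rewards else {}
    if strategy != "mean":
        raise ValueError(f"unknown strategy: {strategy!r}")
    values_by_key: dict[str, list[float]] = {}
    for reward in step_rewards:
        if reward is None:
            continue  # steps without a result do not count toward the mean
        for key, value in reward.items():
            values_by_key.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in values_by_key.items()}


def meets_min_reward(reward: Reward, min_reward: Union[float, Reward]) -> bool:
    """Return False when the step's reward falls below the gate."""
    if isinstance(min_reward, dict):
        # Dict form: every named key must reach its own threshold.
        return all(reward.get(key, 0.0) >= floor for key, floor in min_reward.items())
    # Scalar form: assumed to apply to a key named "reward".
    return reward.get("reward", 0.0) >= min_reward


if __name__ == "__main__":
    steps = [
        {"correctness": 1.0, "style": 0.5},
        None,  # a step whose verifier produced nothing
        {"correctness": 0.5, "style": 0.5},
    ]
    print(roll_up(steps, "mean"))
    print(roll_up(steps, "final"))
    print(meets_min_reward(steps[2], {"correctness": 0.8, "style": 0.5}))
```

With `"mean"`, an early weak step still lowers the trial score; `"final"` reports only where the last step ended up.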

name	create-task
description	Create a new Harbor task for evaluating agents. Use when the user wants to scaffold, build, or design a new task, benchmark problem, or eval. Guides through instruction writing, environment setup, verifier design (pytest vs Reward Kit vs custom), and solution scripting.
argument-hint	["org/task-name"]

create-task

Step 1: Scaffold the task

Step 2: Write instruction.md

Step 3: Build the environment

Step 4: Decide how to verify

Option A: Reward Kit (recommended for most cases)

Option B: pytest (good for deterministic unit-style checks)

Option C: Custom shell

Reward file format (all options)

Step 5: Write the solution

Step 6: Configure task.toml

Network policy

Step 7: Verify with the Oracle agent

Step 8: Test with a real agent (optional)

Step 9: Update README.md (always the final step)

Multi-step tasks

Directory layout

task.toml

Choosing a reward strategy

Artifacts

Oracle verification

Full reference + worked example

Special features (mention if relevant)

Common pitfalls
