create-agent-eval

Updated May 6, 2026 at 03:16

Create an AgentEvals-style eval suite for a named agent under .agents/<agent-name>/.evals/. Use this whenever the user asks for an eval, regression suite, benchmark, or repeatable acceptance test for a workspace agent. Prefer it even when the user asks for a smoke test or acceptance check without naming AgentEvals directly.

Source

Tyler-R-Kendrick

Tyler-R-Kendrick/agent_harness

SKILL.md

name	create-agent-eval
description	Create an AgentEvals-style eval suite for a named agent under .agents/<agent-name>/.evals/. Use this whenever the user asks for an eval, regression suite, benchmark, or repeatable acceptance test for a workspace agent. Prefer it even when the user asks for a smoke test or acceptance check without naming AgentEvals directly.
license	MIT
metadata	{"version":"1.1.0"}

Create Agent Eval

Use this skill to create a compact eval suite beside a workspace agent without improvising the YAML structure.

Steps

Capture the eval objective, user-facing success criteria, and likely failure modes.
Normalize the target agent name and eval file name to lowercase kebab-case.
Use scripts/scaffold-agent-eval.ts as the deterministic source of truth for the output path and YAML skeleton.
Fill the generated scaffold with concrete cases and assertions that are easy to verify and hard to misread.
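
The normalization and path steps above can be sketched as follows. This is a minimal illustration, not the contents of scripts/scaffold-agent-eval.ts: the function names `toKebabCase` and `evalSuitePath`, the camelCase splitting rule, and the `.yaml` extension are all assumptions made for the example.

```typescript
// Hypothetical helpers for illustration; the real scaffold script may
// normalize names differently.
function toKebabCase(input: string): string {
  return input
    .trim()
    // Split camelCase boundaries: "myAgent" -> "my-Agent".
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    .toLowerCase()
    // Collapse any run of non-alphanumeric characters into one hyphen.
    .replace(/[^a-z0-9]+/g, "-")
    // Drop leading and trailing hyphens.
    .replace(/^-+|-+$/g, "");
}

// Builds the suite path under .agents/<agent-name>/.evals/.
// The ".yaml" extension is an assumption.
function evalSuitePath(agentName: string, evalName: string): string {
  const agent = toKebabCase(agentName);
  const suite = toKebabCase(evalName);
  if (!agent || !suite) {
    throw new Error("agent name and eval name must contain a letter or digit");
  }
  return `.agents/${agent}/.evals/${suite}.yaml`;
}
```

Under these assumptions, `evalSuitePath("Code Reviewer", "Smoke Test")` yields `.agents/code-reviewer/.evals/smoke-test.yaml`. Keeping the normalization in one pure function means the same request always maps to the same file.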

Rules

Create eval suites under .agents/<agent-name>/.evals/.
Use one YAML file per eval suite.
Prefer short case descriptions, stable ids, and objective assertions.
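
One way to keep case ids stable and descriptions short is a small lint pass over the cases before saving the suite. The sketch below is an assumption-heavy illustration: the `EvalCase` shape with `id` and `description` fields, the `lintCases` name, and the 80-character limit are hypothetical and are not taken from references/eval-schema.md.

```typescript
// Hypothetical case shape for illustration only; consult
// references/eval-schema.md for the actual fields.
interface EvalCase {
  id: string;
  description: string;
}

// Returns a list of human-readable problems; an empty list means the
// cases passed these (assumed) checks.
function lintCases(cases: EvalCase[], maxDescriptionLength = 80): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const c of cases) {
    if (c.id.trim() === "") {
      problems.push("a case has an empty id");
    } else if (seen.has(c.id)) {
      // Duplicate ids make results ambiguous across runs.
      problems.push(`id "${c.id}" is duplicated`);
    }
    seen.add(c.id);
    if (c.description.length > maxDescriptionLength) {
      problems.push(
        `case "${c.id}" description exceeds ${maxDescriptionLength} characters`
      );
    }
  }
  return problems;
}
```

Running such a check before committing a suite catches renamed or duplicated ids early, which matters when results are compared between runs.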

References

Read references/eval-schema.md for the canonical YAML shape and naming conventions.
Read references/assertion-patterns.md when choosing assertion types and writing unambiguous cases.

Deterministic output

scripts/scaffold-agent-eval.ts defines the normalized output path and starter YAML. Use it whenever the request is mostly structural so your effort goes into the actual cases instead of retyping the boilerplate.
