
$pwd:

new-eval

Name: New Eval
Author: adam-s

// Scaffold a new evaluation


$ git log --oneline --stat

stars: 2

forks: 1

updated: April 3, 2026 at 16:12

SKILL.md

readonly

related-skills.json

Same repository

compare.md

from "adam-s/agent-spec"

Compare two eval runs and report what changed. Reads both runs' events, transcripts, and produced artifacts. Writes a short markdown summary classifying differences as regression, improvement, or neutral.

2026-04-06, 2 stars

iterate.md

from "adam-s/agent-spec"

Generalized recursive iteration loop. Runs parallel sub-agents against a target, scores deterministically, diagnoses instruction gaps, applies fixes, and recurses until the stop condition is met or max depth is reached.

2026-04-06, 2 stars

run-eval.md

from "adam-s/agent-spec"

Run an evaluation against an eval with a specific config

2026-04-06, 2 stars

handoff.md

from "adam-s/agent-spec"

Write a handoff document so a new chat can continue the work

2026-04-05, 2 stars

report.md

from "adam-s/agent-spec"

Show evaluation results and comparisons

2026-04-05, 2 stars

build.md

from "adam-s/agent-spec"

Test-driven development of a Hono/Bun WebSocket application. Read requirements, read tests, build server, verify, iterate until all tests pass.

2026-04-05, 2 stars

package.json

"author": "adam-s"

"repository": "adam-s/agent-spec"


$ useful --forSOC

Software Quality Assurance Analysts and Testers | Computer and Mathematical Occupations | 15-1253 | L4

name: new-eval
description: Scaffold a new evaluation
argument-hint: <name>

/new-eval — Create a new evaluation

Scaffolds an eval directory with template files. Uses the multi-challenge format by default.

See @.claude/reference/eval-definition.md for the full schema, environment setup conventions, and verify.sh contract.

Steps

Ask the user what language the eval targets (Python, TypeScript, or other) to select the right environment pattern.

Create these files in evals/$1/:

EVAL.md:

---
name: $1
description: Describe what this eval tests
model: claude-sonnet-4-6
budget: 2.00
---
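The directory layout and the EVAL.md front matter described above can be sketched as a small shell helper. This is only an illustration: the function name `scaffold_eval` and its optional root argument are assumptions, not part of the skill, and the skill itself may create these files differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of the skill): create the eval skeleton
# under <root>/evals/<name> and write the EVAL.md front matter.
scaffold_eval() {
    local name="$1" root="${2:-.}"
    local dir="$root/evals/$name"
    mkdir -p "$dir/challenges/challenge-1/seeds" "$dir/configs/baseline"
    # Unquoted heredoc delimiter so $name is expanded into the front matter.
    cat > "$dir/EVAL.md" <<EOF
---
name: $name
description: Describe what this eval tests
model: claude-sonnet-4-6
budget: 2.00
---
EOF
}

# Usage: scaffold into a throwaway directory so nothing real is touched.
demo_root="$(mktemp -d)"
scaffold_eval demo-eval "$demo_root"
cat "$demo_root/evals/demo-eval/EVAL.md"
```

Taking the root directory as a parameter keeps the sketch safe to try outside a real project.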

challenges/challenge-1/prompt.md: Describe the task. Include the environment hint for the language:

Python: "A Python virtual environment is available at .venv/. Use .venv/bin/python3 to run code."
TypeScript: "Dependencies are pre-installed in node_modules/."

challenges/challenge-1/seeds/: Place seed files here (source code, test scripts, data files, and dependency manifests like requirements.txt or package.json).

challenges/challenge-1/setup.sh — create the environment and install deps:

Python:

#!/bin/bash
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt --quiet

TypeScript:

#!/bin/bash
npm install --silent

challenges/challenge-1/verify.sh — run tests and print RESULT. Only verify.sh prints RESULT:, not the test script.

Python:

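#!/bin/bash
# Recreate the environment if setup.sh has not run yet.
[ -d .venv ] || python3 -m venv .venv
.venv/bin/pip install -r requirements.txt --quiet 2>/dev/null

# Run the tests, merging stderr into stdout, and capture the exit code.
.venv/bin/python3 test.py 2>&1
EXIT=$?

# Only this script prints the RESULT: line.
if [ "$EXIT" -eq 0 ]; then
    echo "RESULT: PASS"
else
    echo "RESULT: FAIL"
fi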

TypeScript:

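#!/bin/bash
# Install dependencies if setup.sh has not run yet.
[ -d node_modules ] || npm install --silent 2>/dev/null

# Run the tests, merging stderr into stdout, and capture the exit code.
node test.js 2>&1
EXIT=$?

# Only this script prints the RESULT: line.
if [ "$EXIT" -eq 0 ]; then
    echo "RESULT: PASS"
else
    echo "RESULT: FAIL"
fi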
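To illustrate why only verify.sh should print the RESULT: line, here is a rough sketch of how a caller could read the verdict from captured output. The function `parse_result` and the UNKNOWN fallback are assumptions made for illustration; the actual contract is defined in the eval-definition reference, not here.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not the real harness): extract the
# verdict from captured verify.sh output. Uses the last line that starts
# with "RESULT: " and prints PASS, FAIL, or UNKNOWN.
parse_result() {
    local verdict
    verdict="$(printf '%s\n' "$1" | grep '^RESULT: ' | tail -n 1)"
    case "$verdict" in
        "RESULT: PASS") echo PASS ;;
        "RESULT: FAIL") echo FAIL ;;
        *) echo UNKNOWN ;;
    esac
}

# Example: test output followed by the single verdict line from verify.sh.
sample_output="$(printf 'test_add ok\ntest_sub ok\nRESULT: PASS')"
parse_result "$sample_output"
```

If a test script also printed its own RESULT: line, a reader like this would have to guess which verdict is authoritative, which is one plausible reason for the single-source rule.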

configs/baseline/CLAUDE.md:

A coding project.

configs/baseline/settings.json:

{
  "permissions": {
    "deny": []
  }
}
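The two baseline config files above are small enough to write directly from a script. The sketch below assumes a hypothetical `write_baseline_config` function that takes the eval directory as its argument; the file contents are copied from the templates shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the skill): write the baseline config
# files into <eval-dir>/configs/baseline.
write_baseline_config() {
    local dir="$1/configs/baseline"
    mkdir -p "$dir"
    printf 'A coding project.\n' > "$dir/CLAUDE.md"
    # Quoted heredoc delimiter: the JSON is written verbatim, no expansion.
    cat > "$dir/settings.json" <<'EOF'
{
  "permissions": {
    "deny": []
  }
}
EOF
}

# Usage: write into a throwaway directory and show the result.
cfg_root="$(mktemp -d)"
write_baseline_config "$cfg_root"
cat "$cfg_root/configs/baseline/settings.json"
```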

Make scripts executable: chmod +x evals/$1/challenges/challenge-1/verify.sh evals/$1/challenges/challenge-1/setup.sh

Print: "Eval '$1' created. Edit the challenge prompt and seeds, then run /run-eval $1"
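Before printing the success message, it may be worth confirming that every step above produced what it should. The following is a minimal sketch under that assumption; `check_eval` is a hypothetical name, and the file list simply mirrors the steps in this document.

```shell
#!/bin/bash
# Hypothetical checker (not part of the skill): report anything the
# scaffold steps should have produced but did not. Returns 0 when the
# layout is complete and 1 otherwise.
check_eval() {
    local dir="$1" missing=0 f
    for f in EVAL.md challenges/challenge-1/prompt.md \
             configs/baseline/CLAUDE.md configs/baseline/settings.json; do
        [ -f "$dir/$f" ] || { echo "missing: $f"; missing=1; }
    done
    [ -d "$dir/challenges/challenge-1/seeds" ] \
        || { echo "missing: challenges/challenge-1/seeds/"; missing=1; }
    for f in setup.sh verify.sh; do
        [ -x "$dir/challenges/challenge-1/$f" ] \
            || { echo "not executable: $f"; missing=1; }
    done
    return "$missing"
}

# Usage: an empty directory fails every check.
empty_dir="$(mktemp -d)"
check_eval "$empty_dir" || echo "scaffold incomplete"
```

In a real project the call would look like `check_eval "evals/$1"`, run after the chmod step.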
