$pwd:

eval-skills

Name: Eval Skills
Author: JetBrains

// Run structured evaluations on skills to measure quality and track improvements.

$ git log --oneline --stat

stars:30

forks:7

updated: 2026-03-31 17:42

SKILL.md

name	eval-skills
description	Run structured evaluations on skills to measure quality and track improvements.
argument-hint	[skill-name ...] (e.g. local-code-review review-architecture)

Eval Skills

Steps

1. Determine skills to evaluate

If skill names are passed via $ARGUMENTS, evaluate those. Otherwise, list the skills that have an evals/evals.json file and ask the user to pick (accept "all").
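
The selection rule above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `skills_root` layout (`<skills_root>/<skill-name>/evals/evals.json`) and both function names are assumptions, since the source only says that each skill has an evals/evals.json file.

```python
from pathlib import Path


def find_evaluable_skills(skills_root: Path) -> list[str]:
    """Return the sorted names of skills that ship an evals/evals.json file.

    Assumes (hypothetically) one directory per skill under skills_root.
    """
    return sorted(
        path.parent.parent.name
        for path in skills_root.glob("*/evals/evals.json")
        if path.is_file()
    )


def select_skills(arguments: list[str], skills_root: Path) -> list[str]:
    """Use explicitly named skills if given; otherwise return the candidates to offer."""
    if arguments:
        return arguments
    return find_evaluable_skills(skills_root)
```

When `arguments` is empty, the returned list is what would be shown to the user, who may then pick a subset or answer "all".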

2. Create iteration directory

mkdir -p .claude/evals-workspace/iteration-<N>

Use the next sequential number for <N>.
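
One way to pick the next sequential number is to scan the workspace for existing iteration-<N> directories and add one to the highest. The sketch below assumes numbering starts at 1 when no iteration exists yet; the source does not state the starting value, and `next_iteration_dir` is a hypothetical helper name.

```python
import re
from pathlib import Path

ITERATION_DIR = re.compile(r"^iteration-(\d+)$")


def next_iteration_dir(workspace: Path) -> Path:
    """Create and return iteration-<N>, with N one above the highest existing number."""
    workspace.mkdir(parents=True, exist_ok=True)
    existing = [
        int(match.group(1))
        for child in workspace.iterdir()
        if child.is_dir() and (match := ITERATION_DIR.match(child.name))
    ]
    # Assumption: the first iteration is numbered 1.
    target = workspace / f"iteration-{max(existing, default=0) + 1}"
    target.mkdir()
    return target
```

Calling `next_iteration_dir(Path(".claude/evals-workspace"))` has the same effect as the `mkdir -p` command above with <N> filled in.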

3. Run eval cases

For each test case in evals.json, run twice:

With skill: subagent with skill loaded, save to iteration-<N>/<skill>-<id>/with_skill/outputs/
Without skill: subagent without skill, save to iteration-<N>/<skill>-<id>/without_skill/outputs/

Each run starts with a clean context.
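
The output layout described above can be expressed as a small path builder. This is only a sketch of the directory naming; `run_output_dirs` is a hypothetical helper, and launching the subagents themselves is out of scope here.

```python
from pathlib import Path


def run_output_dirs(iteration_dir: Path, skill: str, case_id: str) -> dict[str, Path]:
    """Map each run variant to iteration-<N>/<skill>-<id>/<variant>/outputs."""
    case_dir = iteration_dir / f"{skill}-{case_id}"
    return {
        variant: case_dir / variant / "outputs"
        for variant in ("with_skill", "without_skill")
    }
```

Keeping both variants under the same `<skill>-<id>` directory makes the later with/without comparison a matter of reading two sibling folders.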

4. Grade

Evaluate each assertion against the run's output. Save the results to grading.json:

{
  "assertion_results": [{"text": "...", "passed": true, "evidence": "..."}],
  "summary": {"passed": 3, "failed": 1, "total": 4, "pass_rate": 0.75}
}

Require concrete evidence for every PASS.
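
The summary block of grading.json can be derived from the assertion results, and the evidence rule can be enforced at the same time. The sketch below uses the field names from the JSON above; the function names and the choice to raise `ValueError` on a PASS without evidence are assumptions.

```python
def summarize(assertion_results: list[dict]) -> dict:
    """Compute the summary block, rejecting any PASS that lacks evidence."""
    for result in assertion_results:
        evidence = (result.get("evidence") or "").strip()
        if result["passed"] and not evidence:
            raise ValueError(f"PASS without evidence: {result['text']!r}")
    total = len(assertion_results)
    passed = sum(1 for result in assertion_results if result["passed"])
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        # Assumption: an empty assertion list is reported as a 0.0 pass rate.
        "pass_rate": passed / total if total else 0.0,
    }


def build_grading(assertion_results: list[dict]) -> dict:
    """Assemble the full grading.json payload."""
    return {
        "assertion_results": assertion_results,
        "summary": summarize(assertion_results),
    }
```

With three passing assertions and one failing, `summarize` reproduces the summary shown above, including a `pass_rate` of 0.75.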

5. Aggregate

Save iteration-<N>/benchmark.json with mean pass rates (with/without skill) and delta.
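
The aggregation step can be sketched as below. The source does not give a schema for benchmark.json, so the key names (`mean_pass_rate_with_skill`, `mean_pass_rate_without_skill`, `delta`) and the input shape are assumptions; the delta is taken as with-skill minus without-skill.

```python
from statistics import mean


def aggregate(pass_rates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average per-eval pass rates for both variants and report their difference.

    pass_rates maps an eval name to {"with_skill": rate, "without_skill": rate}.
    """
    with_mean = mean(rates["with_skill"] for rates in pass_rates.values())
    without_mean = mean(rates["without_skill"] for rates in pass_rates.values())
    return {
        "mean_pass_rate_with_skill": with_mean,
        "mean_pass_rate_without_skill": without_mean,
        # Positive delta means the skill improved the pass rate.
        "delta": with_mean - without_mean,
    }
```

For example, evals with rates (0.75, 0.25) and (1.0, 0.5) give means of 0.875 and 0.375 and a delta of 0.5.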

6. Present results

Show per-eval pass rates, the overall delta, always-pass candidates (remove?), and always-fail candidates (revise?). Save feedback to feedback.json.
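
Always-pass and always-fail candidates could be found with a check like the one below. The source does not define these terms precisely, so this sketch assumes an item (an eval case or an assertion) counts as always-pass when it passed in every recorded run, with and without the skill, and as always-fail when it passed in none. The function name and input shape are hypothetical.

```python
def classify_candidates(outcomes: dict[str, list[bool]]) -> dict[str, list[str]]:
    """Split items into always-pass and always-fail lists.

    outcomes maps an item label to its pass/fail results across all recorded runs.
    Items with no recorded runs are ignored.
    """
    always_pass = sorted(
        label for label, runs in outcomes.items() if runs and all(runs)
    )
    always_fail = sorted(
        label for label, runs in outcomes.items() if runs and not any(runs)
    )
    return {"always_pass": always_pass, "always_fail": always_fail}
```

An item that passes even without the skill does not discriminate between the two variants, which is why it is flagged for possible removal; one that never passes may be badly worded or unreachable, hence the suggestion to revise it.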

Iteration loop

Update SKILL.md based on the findings, run a new iteration, compare benchmarks, and stop when pass rates plateau.
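
The stopping condition is left informal in the text. One possible reading, sketched below, treats the pass rate as having plateaued when none of the last few iterations beats the earlier best by a meaningful margin. The `min_gain` and `patience` parameters and their defaults are assumptions, not values from the source.

```python
def has_plateaued(pass_rates: list[float], min_gain: float = 0.01, patience: int = 2) -> bool:
    """Return True when the last `patience` iterations all gained less than min_gain.

    pass_rates holds the with-skill mean pass rate of each iteration, oldest first.
    Gains are measured against the best rate seen before the last `patience` entries.
    """
    if len(pass_rates) <= patience:
        return False  # Not enough history to judge.
    best_before = max(pass_rates[:-patience])
    return all(rate - best_before < min_gain for rate in pass_rates[-patience:])
```

A larger `patience` avoids stopping on a single noisy iteration, at the cost of running more iterations before the loop ends.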

related-skills.json

Same repository

autosteer.md

from "JetBrains/databao-cli"

Run the full development pipeline autonomously without pausing between phases. Stops only on quality-gate failures.

2026-03-31 · stars: 30

check-coverage.md

from "JetBrains/databao-cli"

Run test coverage measurement, analyze results, and fix gaps when coverage falls below the 80% threshold.

2026-03-31 · stars: 30

check-pr-comments.md

from "JetBrains/databao-cli"

Fetch unresolved PR review threads, triage them, implement fixes, validate, reply in-thread, and resolve.

2026-03-31 · stars: 30

create-pr.md

from "JetBrains/databao-cli"

Stage, commit, push, and open a GitHub PR following project conventions. Use when code is ready to ship.

2026-03-31 · stars: 30

local-code-review.md

from "JetBrains/databao-cli"

Review local code changes for correctness, regressions, missing tests, and Databao-specific risks.

2026-03-31 · stars: 30

make-yt-issue.md

from "JetBrains/databao-cli"

Ensure a YouTrack issue exists before starting work. Validates existing tickets or creates new ones.

2026-03-31 · stars: 30

package.json

"author": "JetBrains"

"repository": "JetBrains/databao-cli"

$ useful --forSOC

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4
