تشغيل أي مهارة في Manus بنقرة واحدة

ابدأ الآن

team-shinchan-eval

النجوم٨

التفرعات٢

آخر تحديث٤ يونيو ٢٠٢٦ في ٠٧:٤٣

Use when you need to view agent evaluation history or detect performance regressions.

التثبيت

التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.

تشغيل في Manus

المصدر

seokan-jeong

seokan-jeong/team-shinchan

فتح مستودع GitHub عرض مستودعات المنشئ

تنزيل

تشغيل في Manus

المهن ذات الصلةSOC

استنادا إلى تصنيف SOC المهني

محللو ضمان جودة البرمجيات والمختبرونمهن الحاسوب والرياضيات·SOC 15-1253

SKILL.md

readonly

name	team-shinchan:eval
description	Use when you need to view agent evaluation history or detect performance regressions.
user-invocable	false

Eval Skill

View agent evaluations, detect regressions, and compare performance.

Usage

/team-shinchan:eval                     # All agents summary
/team-shinchan:eval --agent bo          # Single agent detail
/team-shinchan:eval --regression        # Regression report only
/team-shinchan:eval --compare           # Side-by-side comparison

Arguments

Arg	Default	Description
`--agent {name}`	(all)	Show evaluation for a specific agent
`--regression`	false	Show only agents with detected regressions
`--compare`	false	Side-by-side comparison of all agents

Process

Step 1: Run Regression Detection

Execute node src/regression-detect.js .shinchan-docs/eval-history.jsonl --format table

If --agent is provided, add --agent {name}.

If file does not exist or is empty:

No evaluation history found.
Evaluations are recorded automatically during auto-retrospective.

Step 2: Display Results

Default (all agents):

Evaluation Summary
  Agent       | Evals | Correctness | Efficiency | Compliance | Quality
  bo          |    12 |     4.2     |    4.5     |    4.0     |   4.3
  aichan      |     8 |     4.0     |    3.8     |    4.2     |   4.1
  ...

--agent (single): Show full history with trend arrows and latest notes.

--regression: Filter to only agents where has_regression is true. Show dimension, latest score, moving average, and delta.

--compare:

Agent Comparison (last 5 evaluations)
  Dimension    | bo   | aichan | buriburi | masao
  correctness  | 4.2  | 4.0    | 3.8   | 4.5
  efficiency   | 4.5  | 3.8    | 4.2   | 4.0
  compliance   | 4.0  | 4.2    | 4.0   | 3.9
  quality      | 4.3  | 4.1    | 4.3   | 4.2

Step 3: Warnings

If any regressions detected, display:

!! Regression detected for {agent} in {dimension}
   Latest: {score} | Avg: {avg} | Delta: {delta}
   Action: Review recent {agent} outputs and adjust prompts.

Important

Eval history: .shinchan-docs/eval-history.jsonl
Detection script: src/regression-detect.js
Dimensions: correctness, efficiency, compliance, quality (1-5 scale)
Moving average window: last 5 evaluations per agent

المزيد من هذا المستودع

نفس المستودع

team-shinchan-bigproject

seokan-jeong/team-shinchan

Use when you have a large-scale, multi-phase project requiring orchestrated execution.

2026-06-238

team-shinchan-ralph

seokan-jeong/team-shinchan

Use when you need persistent looping until a task is fully complete.

2026-06-238

team-shinchan-fierce-review

seokan-jeong/team-shinchan

Deterministic adversarial code review for high-stakes scope — independent per-dimension review, a non-skippable per-finding refutation, completeness + interaction critics, and a deterministic 3-lens rubric judge panel. Opt-in main-loop Workflow tier.

2026-06-238

team-shinchan-skill-feedback

seokan-jeong/team-shinchan

Use when the user wants to review accumulated skill feedback, verdict trends, or improvement candidates collected during Stage 4 retrospectives. Trigger on "show skill feedback", "스킬 피드백 보여줘", or finding which skills need /writing-skills work.

2026-06-198

team-shinchan-fierce-compete

seokan-jeong/team-shinchan

Deterministic competitive code tournament — N builders independently solve one task and return patches, an Action-Kamen judge scores them head-to-head, the winner is picked by score and applied. Opt-in main-loop Workflow tier.

2026-06-198

team-shinchan-fierce-debate

seokan-jeong/team-shinchan

Deterministic adversarial debate for high-stakes or irreversible decisions — mandatory refutation plus a scored judge panel. Opt-in main-loop Workflow tier.

2026-06-198

name	team-shinchan:eval
description	Use when you need to view agent evaluation history or detect performance regressions.
user-invocable	false

Eval Skill

View agent evaluations, detect regressions, and compare performance.

Usage

/team-shinchan:eval                     # All agents summary
/team-shinchan:eval --agent bo          # Single agent detail
/team-shinchan:eval --regression        # Regression report only
/team-shinchan:eval --compare           # Side-by-side comparison

Arguments

Arg	Default	Description
`--agent {name}`	(all)	Show evaluation for a specific agent
`--regression`	false	Show only agents with detected regressions
`--compare`	false	Side-by-side comparison of all agents

Process

Step 1: Run Regression Detection

Execute node src/regression-detect.js .shinchan-docs/eval-history.jsonl --format table

If --agent is provided, add --agent {name}.

If file does not exist or is empty:

No evaluation history found.
Evaluations are recorded automatically during auto-retrospective.

Step 2: Display Results

Default (all agents):

Evaluation Summary
  Agent       | Evals | Correctness | Efficiency | Compliance | Quality
  bo          |    12 |     4.2     |    4.5     |    4.0     |   4.3
  aichan      |     8 |     4.0     |    3.8     |    4.2     |   4.1
  ...

--agent (single): Show full history with trend arrows and latest notes.

--regression: Filter to only agents where has_regression is true. Show dimension, latest score, moving average, and delta.

--compare:

Agent Comparison (last 5 evaluations)
  Dimension    | bo   | aichan | buriburi | masao
  correctness  | 4.2  | 4.0    | 3.8   | 4.5
  efficiency   | 4.5  | 3.8    | 4.2   | 4.0
  compliance   | 4.0  | 4.2    | 4.0   | 3.9
  quality      | 4.3  | 4.1    | 4.3   | 4.2

Step 3: Warnings

If any regressions detected, display:

!! Regression detected for {agent} in {dimension}
   Latest: {score} | Avg: {avg} | Delta: {delta}
   Action: Review recent {agent} outputs and adjust prompts.

Important

Eval history: .shinchan-docs/eval-history.jsonl
Detection script: src/regression-detect.js
Dimensions: correctness, efficiency, compliance, quality (1-5 scale)
Moving average window: last 5 evaluations per agent