
bkit-evals

Stars: 562

Forks: 148

Updated: April 28, 2026, 14:23

Run skill evals via evals/runner.js — wrapper validates skill names, captures stdout/stderr, persists JSON results. Triggers: bkit evals, evals run, skill quality, eval runner, 스킬 평가, 評価実行, 评估运行, evaluación, évaluation.

Installation

Install with Codex or Claude: copy this prompt, paste it into Codex, Claude, or another assistant, and let it check the skill page and install the skill for you.


Source

popup-studio-ai

popup-studio-ai/bkit-claude-code



Related occupations (SOC)

Based on the Standard Occupational Classification (SOC)

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · SOC 15-1253

SKILL.md


name	bkit-evals
classification	capability
classification-reason	Eval runner is a development-time quality tool, not a workflow phase
deprecation-risk	none
effort	low
description	Run skill evals via evals/runner.js — wrapper validates skill names, captures stdout/stderr, persists JSON results. Triggers: bkit evals, evals run, skill quality, eval runner, 스킬 평가, 評価実行, 评估运行, evaluación, évaluation.
argument-hint	run <skill> | list
user-invocable	true
allowed-tools	["Bash","Read","Glob","Grep"]
imports	[]
next-skill	null
pdca-phase	null
task-template	[Evals] {action}

bkit Evals — Skill Quality Evaluation Runner

v2.1.11, Sprint β (FR-β2). Wraps `evals/runner.js` with input validation, result persistence, and structured reporting. It replaces the bare `node evals/runner.js <skill>` invocation, which required users to remember the argv structure and ignored timeout and sandbox concerns.

Arguments

Argument	Description	Example
`run <skill>`	Execute the eval suite for one skill	`/bkit-evals run gap-detector`
`list`	List all skills that have an `eval.yaml` definition	`/bkit-evals list`

If no argument is provided, the skill renders the same output as `list`.
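The argument handling above can be sketched as a small dispatcher. This is a minimal illustration, not the skill's actual code: `parseEvalsArgs` and the `'unrecognized_arguments'` reason are hypothetical names, and splitting the argument string on whitespace is an assumption.

```javascript
// Hypothetical dispatcher for the /bkit-evals argument string.
// Recognizes `run <skill>` and `list`; an empty argument falls back to `list`.
function parseEvalsArgs(raw) {
  // Split on runs of whitespace and drop empty tokens.
  const tokens = String(raw ?? '').trim().split(/\s+/).filter(Boolean);

  if (tokens.length === 0) {
    return { action: 'list' }; // no argument: same output as `list`
  }

  const [action, ...rest] = tokens;
  if (action === 'list' && rest.length === 0) {
    return { action: 'list' };
  }
  if (action === 'run' && rest.length === 1) {
    // The skill name is validated later, before anything is spawned.
    return { action: 'run', skill: rest[0] };
  }
  return { action: 'error', reason: 'unrecognized_arguments' }; // hypothetical reason
}
```

Validation of the skill name is deliberately left to the `run` path, so the dispatcher only decides which action was requested.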

Behavior

`run <skill>`

1. Validate `skill` against `/^[a-z][a-z0-9-]{0,63}$/`. Reject anything else (no shell metacharacters, no slashes, no spaces); see Security below.
2. Spawn `node evals/runner.js --skill <skill>` via `child_process.spawnSync` in argv form (no shell). The default timeout is 30 s and the maximum is 120 s. The `--skill` flag form is mandated by the runner CLI and locked by the L3 contract test.
3. Capture stdout and stderr. Parse the trailing JSON block in stdout using a string-aware balanced-brace fallback.
4. Apply a fail-closed check: if `parsed === null` and stdout includes `Usage:`, return `reason: 'argv_format_mismatch'`; if `parsed === null` otherwise, return `reason: 'parsed_null'`. Exit code 0 alone NEVER implies success; the parsed JSON must be present.
5. Persist the structured result to `.bkit/runtime/evals-{skill}-{ISO timestamp}.json`, including the stdout/stderr tails (last 2000 characters each), the parsed payload, and the `reason` field.
6. Render a one-line summary in the chat containing:
   - the exit code
   - the parsed pass/fail counts (if available)
   - the path of the persisted result file
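The parsing and fail-closed steps above can be sketched as follows. This is an illustration under stated assumptions, not the wrapper's real code: `matchBrace`, `extractTrailingJson`, and `classifyRun` are hypothetical names, the trailing block is assumed to be a JSON object, and treating a non-zero exit code as failure even when JSON is present is an assumption the page does not state.

```javascript
// Find the index of the `}` that closes the `{` at `start`, ignoring braces
// inside JSON string literals (escaped quotes included). Returns -1 if unbalanced.
function matchBrace(text, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return i;
  }
  return -1;
}

// Return the last top-level JSON object in `stdout` that parses, or null.
function extractTrailingJson(stdout) {
  const text = String(stdout ?? '');
  let last = null;
  let i = 0;
  while (i < text.length) {
    if (text[i] !== '{') { i++; continue; }
    const end = matchBrace(text, i);
    if (end !== -1) {
      try {
        last = JSON.parse(text.slice(i, end + 1));
        i = end + 1; // skip the interior so nested objects are never picked up
        continue;
      } catch {
        // Balanced but not valid JSON (e.g. a brace in a log line): move on.
      }
    }
    i++;
  }
  return last;
}

// Fail-closed classification: exit code 0 alone is never success.
function classifyRun({ exitCode, stdout, parsed }) {
  if (parsed === null) {
    const usage = String(stdout ?? '').includes('Usage:');
    return { ok: false, reason: usage ? 'argv_format_mismatch' : 'parsed_null' };
  }
  // Assumption: a non-zero exit code still counts as failure even with JSON present.
  return { ok: exitCode === 0, reason: null };
}
```

Skipping past a successfully parsed object is what keeps an inner `{...}` from being returned instead of its enclosing block, and a stray `{` in log output simply fails to parse and is passed over.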

`list`

1. Read `evals/config.json` to enumerate skill classifications.
2. For each classification (`workflow`, `capability`, `hybrid`), list the skills that have an `evals/{classification}/{skill}/eval.yaml` file.
3. Render a table grouped by classification, showing each skill name and a one-line note taken from the eval YAML's `description` field, if present.
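A minimal sketch of the `list` enumeration, assuming the `evals/{classification}/{skill}/eval.yaml` layout described above. Reading `evals/config.json` is skipped because its schema is not documented on this page, so the three classifications are hardcoded. The `description` lookup is a deliberately simplified single-line match, not a YAML parser. `listEvalSkills` and `readDescription` are hypothetical names; skill directories are also filtered through the same name regex, since the Module Dependencies table notes that `isValidSkillName` is shared with `list`.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumption: classifications hardcoded from this page instead of read from evals/config.json.
const CLASSIFICATIONS = ['workflow', 'capability', 'hybrid'];
const SKILL_NAME_RE = /^[a-z][a-z0-9-]{0,63}$/;

// Simplified lookup of a single-line top-level `description:` value.
// Not a YAML parser: block scalars (`>` or `|`) and nested keys are not handled.
function readDescription(yamlText) {
  const m = /^description:[ \t]*(.*)$/m.exec(yamlText);
  if (!m) return '';
  return m[1].trim().replace(/^(['"])(.*)\1$/, '$2'); // strip matching outer quotes
}

// Group skills by classification; only directories holding an eval.yaml are listed.
function listEvalSkills(evalsDir) {
  const groups = {};
  for (const cls of CLASSIFICATIONS) {
    const clsDir = path.join(evalsDir, cls);
    if (!fs.existsSync(clsDir)) {
      groups[cls] = [];
      continue;
    }
    groups[cls] = fs
      .readdirSync(clsDir, { withFileTypes: true })
      .filter((entry) => entry.isDirectory() && SKILL_NAME_RE.test(entry.name))
      .map((entry) => ({ skill: entry.name, yamlPath: path.join(clsDir, entry.name, 'eval.yaml') }))
      .filter(({ yamlPath }) => fs.existsSync(yamlPath))
      .map(({ skill, yamlPath }) => ({ skill, note: readDescription(fs.readFileSync(yamlPath, 'utf8')) }))
      .sort((a, b) => a.skill.localeCompare(b.skill));
  }
  return groups;
}
```

The returned object (classification to `{ skill, note }` rows) is what a renderer would turn into the category-grouped table.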

Security

- The skill-name regex prevents argument injection. Any name that does not match `^[a-z][a-z0-9-]{0,63}$` is rejected with `reason: 'invalid_skill_name'`.
- The subprocess is spawned with an argv array (no shell); no template strings are concatenated into command lines.
- The result file path is composed of a hardcoded base, the validated skill name, and a timestamp, so path traversal is not possible.
- A subprocess timeout is enforced (default 30 s, hard cap 120 s) so a buggy eval cannot block the session indefinitely.
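The security measures above can be illustrated with the following sketch. It is not the contents of `lib/evals/runner-wrapper.js`: only `isValidSkillName` is a name taken from this page, while `clampTimeout`, `resultPath`, and `spawnRunner` are hypothetical. Two details are assumptions: the runner is launched with `process.execPath` (the current Node binary) rather than a `node` looked up on `PATH`, and colons in the ISO timestamp are replaced with `-` so the file name is also valid on Windows.

```javascript
const { spawnSync } = require('node:child_process');
const path = require('node:path');

const SKILL_NAME_RE = /^[a-z][a-z0-9-]{0,63}$/;
const DEFAULT_TIMEOUT_MS = 30000; // 30 s default
const MAX_TIMEOUT_MS = 120000; // 120 s hard cap
const RESULT_BASE = path.join('.bkit', 'runtime'); // hardcoded base directory

function isValidSkillName(name) {
  return typeof name === 'string' && SKILL_NAME_RE.test(name);
}

// Use the default for missing or non-positive values; never exceed the cap.
function clampTimeout(ms) {
  if (!Number.isFinite(ms) || ms <= 0) return DEFAULT_TIMEOUT_MS;
  return Math.min(ms, MAX_TIMEOUT_MS);
}

// Base + validated name + timestamp: no user-controlled path segments.
function resultPath(skill, date = new Date()) {
  if (!isValidSkillName(skill)) throw new Error('invalid_skill_name');
  const stamp = date.toISOString().replace(/:/g, '-'); // assumption: ':' is not file-name safe everywhere
  return path.join(RESULT_BASE, `evals-${skill}-${stamp}.json`);
}

// Spawn the runner with an argv array and no shell.
function spawnRunner(skill, { timeoutMs, cwd } = {}) {
  if (!isValidSkillName(skill)) return { rejected: true, reason: 'invalid_skill_name' };
  const res = spawnSync(process.execPath, ['evals/runner.js', '--skill', skill], {
    cwd,
    shell: false,
    encoding: 'utf8',
    timeout: clampTimeout(timeoutMs),
  });
  return {
    rejected: false,
    exitCode: res.status,
    timedOut: res.error?.code === 'ETIMEDOUT',
    stdout: res.stdout ?? '',
    stderr: res.stderr ?? '',
  };
}
```

Because `shell: false` hands each array element to the child as its own argv entry, even a name that slipped past the regex would reach the runner as one literal argument rather than being interpreted by a shell.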

Module Dependencies

Module	Function	Usage
`lib/evals/runner-wrapper.js`	`invokeEvals(skill, opts)`	Validate + spawn + persist
`lib/evals/runner-wrapper.js`	`isValidSkillName(name)`	Regex pre-check shared with `list`
`evals/runner.js`	(subprocess)	Existing eval execution engine

Result Schema

`.bkit/runtime/evals-{skill}-{timestamp}.json`:

```jsonc
{
  "skill": "gap-detector",
  "invokedAt": "<ISO 8601>",
  "exitCode": 0,
  "timedOut": false,
  "stdoutTail": "...",
  "stderrTail": "...",
  "parsed": { /* whatever runner.js prints as JSON, or null */ }
}
```
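Assembling and writing the persisted record can be sketched as below, assuming the field names from the result schema. The Behavior section states that a `reason` field is persisted as well, so the sketch includes it; defaulting it to `null` is an assumption. `tail`, `buildResultRecord`, and `persistResult` are hypothetical names.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const TAIL_CHARS = 2000; // stdout/stderr tails keep the last 2000 characters

function tail(text, n = TAIL_CHARS) {
  const s = String(text ?? '');
  return s.length > n ? s.slice(-n) : s;
}

// Field names follow the result schema; `reason` is added per the Behavior section.
function buildResultRecord({ skill, invokedAt, exitCode, timedOut, stdout, stderr, parsed, reason }) {
  return {
    skill,
    invokedAt: invokedAt.toISOString(), // ISO 8601
    exitCode,
    timedOut: Boolean(timedOut),
    stdoutTail: tail(stdout),
    stderrTail: tail(stderr),
    parsed: parsed ?? null,
    reason: reason ?? null, // assumption: null when there is no failure reason
  };
}

// Write the record as pretty-printed JSON, creating the parent directory if needed.
function persistResult(filePath, record) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(record, null, 2) + '\n', 'utf8');
  return filePath;
}
```

Keeping only the tails bounds the size of each result file regardless of how verbose an eval is.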

Examples

```
# Single eval
/bkit-evals run gap-detector

# Discovery
/bkit-evals list
```

Related

- `/control trust`: eval results contribute to the trust score
- `/code-review`: uses eval data when assessing skills
- `/bkit explore` (FR-β1): explores evals as a category


More from this repository

sprint

popup-studio-ai/bkit-claude-code

Sprint Management — generic sprint capability for ANY bkit user. 16 sub-actions: init, start, status, watch, phase, iterate, qa, report, archive, list, feature, pause, resume, fork, help, master-plan. Triggers: sprint, sprint start, sprint init, sprint status, sprint list, 스프린트, 스프린트 시작, 스프린트 상태, スプリント, スプリント開始, スプリント状態, 冲刺, 冲刺开始, 冲刺状态, sprint, iniciar sprint, estado sprint, sprint, demarrer sprint, statut sprint, Sprint, Sprint starten, Sprint Status, sprint, avviare sprint, stato sprint, master plan, multi-sprint plan, sprint master plan, 마스터 플랜, 멀티 스프린트 계획, 스프린트 마스터 플랜, マスタープラン, マルチスプリント計画, スプリントマスタープラン, 主计划, 多冲刺计划, 冲刺主计划, plan maestro, plan multi-sprint, plan maestro sprint, plan maître, plan multi-sprint, plan maître sprint, Masterplan, Multi-Sprint-Plan, Sprint-Masterplan, piano principale, piano multi-sprint, piano principale sprint.

2026-05-21 · 562 stars

cc-version-analysis

popup-studio-ai/bkit-claude-code

CC CLI version upgrade impact analysis — research changes, analyze bkit impact, generate report. Triggers: cc-version-analysis, CC upgrade, version analysis, CC 버전 분석, 버전 영향.

2026-05-20 · 562 stars

audit

popup-studio-ai/bkit-claude-code

View audit logs, decision traces, and session history for AI transparency. ACTION_TYPES (19 entries) include PDCA events (phase_transition, gate_passed/failed, agent_spawned/completed/failed, rollback_executed, destructive_blocked) and Sprint events (sprint_paused, sprint_resumed, master_plan_created — v2.1.13). Triggers: audit, log, decision trace, history, 감사 로그, 결정 추적.

2026-05-12 · 562 stars

bkit-rules

popup-studio-ai/bkit-claude-code

Core rules for bkit — PDCA methodology, level detection, agent triggering, quality standards, Sprint management (8-phase container with 4 auto-pause triggers, v2.1.13), and Trust Level scope (L0-L4 gates PDCA + Sprint auto-run). Triggers: bkit rules, core rules, methodology, 핵심 규칙, PDCA 규칙.

2026-05-12 · 562 stars

bkit-templates

popup-studio-ai/bkit-claude-code

PDCA + Sprint document templates — Plan, Design, Analysis, Report for individual features plus templates/sprint/{master-plan, prd, plan, design, iterate, qa, report}.template.md for sprint-level documents (v2.1.13). Triggers: template, plan document, design template, 템플릿, 문서 양식.

2026-05-12 · 562 stars

development-pipeline

popup-studio-ai/bkit-claude-code

Complete 9-phase development pipeline guide — from schema to deployment. Pipeline phases (1-schema → 9-deployment) are orthogonal to PDCA's 9-phase per-feature cycle and Sprint's 8-phase container; each pipeline phase may host PDCA cycles for individual features, and multi-feature pipeline initiatives can be wrapped in /sprint (v2.1.13). Triggers: development pipeline, where to start, phase, 개발 파이프라인, 순서, 시작.

2026-05-12 · 562 stars


