Installer avec Codex ou Claude Copiez ce prompt, collez-le dans Codex, Claude ou un autre assistant, puis laissez-le vérifier la page du skill et l'installer pour vous.
v2.1.11 Sprint β FR-β2. Wraps evals/runner.js with input validation,
result persistence, and structured reporting. Replaces the bare node evals/runner.js <skill> invocation that previously required users to
remember argv structure and ignored timeout / sandbox concerns.
Arguments
Argument
Description
Example
run <skill>
Execute the eval suite for one skill
/bkit-evals run gap-detector
list
List all skills that have an eval.yaml definition
/bkit-evals list
If no argument is provided, render the same output as list.
Behavior
run <skill>
Validate skill against /^[a-z][a-z0-9-]{0,63}$/. Reject anything else
(no shell metacharacters, no slashes, no spaces) — see Security below.
Spawn node evals/runner.js --skill <skill> via child_process.spawnSync
(argv form, no shell). Default timeout 30 s, max 120 s. The --skill flag
form is mandated by the runner CLI and locked by L3 contract test.
Capture stdout / stderr. Parse the trailing JSON block via
balanced-brace fallback (string-aware).
Apply fail-closed defense: if parsed === null and stdout includes
Usage:, return reason: 'argv_format_mismatch'; if parsed === null
otherwise, return reason: 'parsed_null'. Exit code 0 alone NEVER
implies success — the parsed JSON must be present.
Persist the structured result to
.bkit/runtime/evals-{skill}-{ISO timestamp}.json with stdout/stderr
tails (2000 chars each), parsed payload, and reason field.
Render a one-line summary in the chat:
exit code
parsed pass/fail counts (if available)
path of the persisted result file
list
Read evals/config.json to enumerate skill classifications.
For each classification (workflow, capability, hybrid),
list skills that have evals/{classification}/{skill}/eval.yaml.
Render a category-grouped table with skill name + a one-line note from
the eval YAML (description field if present).
Security
Skill name regex prevents argument injection. Anything outside
[a-z][a-z0-9-]{0,63} is rejected with reason: invalid_skill_name.
argv-array spawn (no shell). No template-string concatenation into
command lines.
Result file path is composed from a hardcoded base + sanitized skill
name + timestamp; no traversal possible.
Subprocess timeout enforced (default 30 s, hard cap 120 s) so a buggy
eval cannot block the session indefinitely.
Module Dependencies
Module
Function
Usage
lib/evals/runner-wrapper.js
invokeEvals(skill, opts)
Validate + spawn + persist
lib/evals/runner-wrapper.js
isValidSkillName(name)
Regex pre-check shared with list
evals/runner.js
(subprocess)
Existing eval execution engine
Result Schema
.bkit/runtime/evals-{skill}-{timestamp}.json:
{"skill":"gap-detector","invokedAt":"<ISO 8601>","exitCode":0,"timedOut":false,"stdoutTail":"...","stderrTail":"...","parsed":{/* whatever runner.js prints as JSON, or null */}}
Examples
# Single eval
/bkit-evals run gap-detector
# Discovery
/bkit-evals list
Related
/control trust — eval results contribute to trust score
/code-review — uses eval data when assessing skills
/bkit explore (FR-β1) — explore evals as a category