cli-review-runner

Updated May 27, 2026 at 12:19

Black-box CLI grading harness — runs a test suite against a target CLI and reports per-rule pass/fail from the cli-for-agents 45-rule catalog. Use when reviewing, auditing, or grading a command-line tool for agent-friendliness. Trigger even if the user doesn't explicitly say "agent-friendly" — apply whenever they ask "is mycli good for agents?", "review this CLI", "grade my cli against the rules", "check if this tool is safe to automate", or "audit command-line design". Companion to the cli-for-agents distillation skill.

Source

pproenca

pproenca/dot-skills

cli-review-runner

Automates the 10-item agent-friendliness audit from cli-for-agents. Runs black-box probes against a target CLI and emits a structured report mapping each finding to a rule ID (e.g., help-examples-in-help, err-non-zero-exit-codes, safe-dry-run-flag). Default mode is read-only - probes never run destructive verbs with real arguments.

When to Apply

User asks to review or audit a CLI for agent-friendliness, automation readiness, or CI use
User has just finished building a CLI and wants a pre-ship sanity check
User is grading their own or a third-party CLI against the cli-for-agents catalog
User is asking why a CLI is hanging an agent, blowing up context, or failing to compose in a pipeline
PR review for a CLI change - quickly regress-test the --help, errors, and dry-run flags

How to Use

The skill is orchestrated by scripts/review.sh. Point it at the target CLI (absolute path or PATH-resolvable name) and pick an output format.

# Default: text table on stdout, exit 0 if all passed, 1 if any failed
bash scripts/review.sh --target /usr/local/bin/mycli

# Machine-readable output
bash scripts/review.sh --target gh --format json
bash scripts/review.sh --target kubectl --format ndjson

# Supply subcommand list when auto-discovery misses them
bash scripts/review.sh --target gh --subcommands pr,issue,repo

# Preview what would run without touching the target CLI
bash scripts/review.sh --target mycli --dry-run

# Include risky probes on destructive verbs (off by default)
bash scripts/review.sh --target mycli --include-destructive

See bash scripts/review.sh --help for the full flag list.
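
Because the default run exits 0 when every rule passes and 1 when any rule fails, a CI step can gate on the exit status alone. The following is a minimal sketch under that assumption. `review_gate` is a hypothetical wrapper name, not part of the skill; it takes the full review command as its arguments, forwards the report to stderr, and prints a one-line verdict on stdout.

```shell
# Hypothetical helper (not shipped with the skill): run a review command and
# summarise the result using only the documented exit status (0 = all passed).
review_gate() {
  # The report itself is forwarded to stderr so stdout carries just the verdict.
  if "$@" >&2; then
    echo "review passed"
  else
    # Inside the else branch, $? still holds the status of the review command.
    echo "review failed (exit $?)"
  fi
}

# Example usage, assuming the skill's scripts/ directory is in the working tree:
#   review_gate bash scripts/review.sh --target mycli
```

Keeping the verdict on stdout and the report on stderr means the wrapper's own output stays easy to capture in a pipeline.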

Workflow Overview

--target <cli>
   │
   ▼
[1] Validate target       fail fast if path missing or not executable
   │
   ▼
[2] Load rule catalog     references/rule-catalog.tsv (45 rules)
   │
   ▼
[3] Discover subcommands  parse top-level --help (gh/kubectl/commander shapes)
   │
   ▼
[4] Run probes P1..P10    each probe emits NDJSON findings to a temp file
   │
   ▼
[5] Render report         scripts/render.sh  ->  text | json | ndjson

Read references/workflow.md when you need the full probe-by-probe breakdown, failure modes, and how to extend the catalog.
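
Step [1] of the workflow can be pictured roughly as follows. This is an illustrative sketch, not the code in scripts/review.sh: the function name `validate_target`, the messages, and the return code 2 are all assumptions made for the example.

```shell
# Hypothetical sketch of "fail fast if path missing or not executable".
validate_target() {
  target=$1
  case $target in
    */*)
      # Explicit path: must be a regular file with the execute bit set.
      [ -f "$target" ] && [ -x "$target" ] && return 0
      echo "not an executable file: $target" >&2
      ;;
    *)
      # Bare name: resolve it through PATH. Note that command -v also
      # matches shell builtins and functions, which a real check may exclude.
      command -v "$target" >/dev/null 2>&1 && return 0
      echo "not found on PATH: $target" >&2
      ;;
  esac
  return 2
}

# Example usage:
#   validate_target gh || exit 2
```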

Probe Coverage

Probe	Rules tested	Coverage
P1 Non-interactive	`interact-no-hang-on-stdin`, `interact-no-input-flag`, `interact-flags-first`, `interact-detect-tty`, `interact-no-timed-prompts`, `interact-no-arrow-menus`, `input-no-prompt-fallback`	Run under `</dev/null` with timeout; inspect help for interactive language
P2 Layered help	`help-per-subcommand`, `help-no-flag-required`, `help-layered-discovery`	Top-level line count; per-subcommand `--help`; zero-arg invocation
P3 Help examples	`help-examples-in-help`, `help-flag-summary`, `help-suggest-next-steps`	Grep each subcommand help for `Examples:`, short+long flag pairs, "See also"
P4 Actionable errors	`err-actionable-fix`, `err-include-example-invocation`, `err-exit-fast-on-missing-required`, `err-no-stack-traces-by-default`	Invoke with bogus flag; grep stderr for fix + example; check for raw stack traces
P5 stderr channeling	`err-stderr-not-stdout`	Error text must land on fd 2
P6 Exit codes	`err-non-zero-exit-codes`	Usage error and runtime error must produce distinct non-zero codes
P7 stdin composition	`input-accept-stdin-dash`, `input-flags-over-positional`	Grep help for `-` stdin convention; count positionals vs flags
P8 Structured output	`output-json-flag`, `output-respect-no-color`	`--json` produces JSON; `NO_COLOR=1` suppresses ANSI
P9 Destructive safety	`safe-dry-run-flag`, `safe-force-bypass-flag`, `safe-no-prompts-with-no-input`	Inspect destructive verbs' `--help` for `--dry-run` / `--yes` / `--force` / `--no-input`
P10 Command structure	`struct-resource-verb`, `struct-standard-flag-names`, `struct-no-hidden-subcommand-catchall`, `struct-flag-order-independent`	Uniform shape, `--help`/`--version` present, unknown subcommand errors, flag position independence

Coverage: 30 of the 45 rules in cli-for-agents are black-box testable (the sum of the rules listed per probe above). The remaining 15 cover areas such as idempotency, state reconciliation, NDJSON streaming, bounded output, crash-only recovery, env-var fallback, secret-stdin, and confirm-by-typing-name. They require either real invocation or source-code inspection, so the report lists them as "manual review required".
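
To make the P5 and P6 rows concrete, here is a rough sketch of a black-box check that runs a command expected to fail and inspects where the error text went and what status came back. `check_error_channel` is a hypothetical name, and this is not the implementation in scripts/lib/probes.sh. It only checks for a non-zero status, not that usage and runtime errors use distinct codes.

```shell
# Hypothetical check: the failing command must write to stderr (fd 2)
# and must exit with a non-zero status.
check_error_channel() {
  out_file=$(mktemp)
  err_file=$(mktemp)
  status=0
  # Capture stdout and stderr separately; stdin is /dev/null so nothing can prompt.
  "$@" >"$out_file" 2>"$err_file" </dev/null || status=$?
  verdict=pass
  [ "$status" -ne 0 ] || verdict="fail: exit code was 0"
  [ -s "$err_file" ] || verdict="fail: nothing written to stderr"
  rm -f "$out_file" "$err_file"
  echo "$verdict"
}

# Example usage with a made-up flag:
#   check_error_channel mycli list --no-such-flag
```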

Configuration

config.json stores the verb classifier lists and default timeout. The skill works without any setup - defaults are reasonable. Override per-invocation via flags or edit the file for project-wide changes.

{
  "timeout_seconds": 5,
  "safe_verbs": ["list", "get", "show", "status", "describe", "help", "version", "config", "ls", "inspect"],
  "destructive_verbs": ["delete", "drop", "destroy", "remove", "reset", "purge", "rm", "del"]
}
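
The two verb lists drive a classification step. The sketch below hard-codes the default lists from the config above purely for illustration; how the skill actually loads config.json, and how it treats verbs on neither list, is not shown here. The `unknown` label and the name `classify_verb` are assumptions.

```shell
# Hypothetical classifier mirroring the default safe_verbs / destructive_verbs lists.
classify_verb() {
  case $1 in
    list|get|show|status|describe|help|version|config|ls|inspect)
      echo safe ;;
    delete|drop|destroy|remove|reset|purge|rm|del)
      echo destructive ;;
    *)
      # Verbs on neither list; the label is an assumption for this sketch.
      echo unknown ;;
  esac
}

# Example usage:
#   classify_verb delete   # prints "destructive"
```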

Safety Model

All probes are read-only by default:

Safe verbs (list, get, show, ...) may be invoked with bogus flags to test error handling
Destructive verbs (delete, drop, ...) are ONLY inspected via --help - never executed with arguments
Every probe runs with a 5-second wall-clock timeout under </dev/null
The target CLI runs in its own child process, and probe arguments never contain shell metacharacters

When --include-destructive is passed, probes may invoke destructive verbs with bogus flags too. This exposes the rare case where a CLI does something destructive before validating flags. Only enable this against CLIs you trust, or in a disposable test environment.
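
The combination of a wall-clock timeout and `</dev/null` is what lets a probe tell a hanging CLI apart from one that returns. Here is a minimal sketch, assuming GNU coreutils `timeout` is installed (it exits with 124 when it had to stop the command). `probe_no_hang` is a hypothetical name; the skill's own timeout helper lives in scripts/lib/common.sh and may differ.

```shell
# Hypothetical sketch: report whether a command returns on its own when it
# gets no input, within a time limit given in seconds.
probe_no_hang() {
  limit=$1
  shift
  status=0
  # stdin is /dev/null, so a CLI waiting for interactive input sees EOF at once;
  # one that still blocks gets stopped when the limit expires.
  timeout "$limit" "$@" </dev/null >/dev/null 2>&1 || status=$?
  # GNU timeout uses exit status 124 to signal that the limit was hit.
  if [ "$status" -eq 124 ]; then
    echo "hang"
  else
    echo "returned"
  fi
}

# Example usage with the default 5-second limit:
#   probe_no_hang 5 mycli list
```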

Self-test

Before shipping changes to probes, run the self-test - it generates a mock CLI that deliberately violates specific rules and asserts the probes detect them:

bash scripts/selftest.sh

Expected output: Results: 8 passed, 0 failed. Any failure points at a regression in scripts/lib/probes.sh.
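
The idea behind the self-test can be shown in miniature: plant a known violation in a mock CLI, then assert that a detector flags it. The sketch below is not taken from scripts/selftest.sh. The mock, the `has_examples` detector, and the messages are hypothetical, and only the help-examples-in-help rule is exercised.

```shell
# Write a throwaway mock CLI whose help output has no "Examples:" section.
mock=$(mktemp)
cat >"$mock" <<'EOF'
echo "Usage: mockcli [--verbose] <file>"
EOF

# Hypothetical detector for help-examples-in-help: runs the given command
# with --help appended and passes only if the output mentions "Examples:".
has_examples() {
  if "$@" --help 2>&1 | grep -q 'Examples:'; then
    echo pass
  else
    echo fail
  fi
}

# Invoke the mock through sh so the sketch does not depend on the execute bit.
verdict=$(has_examples sh "$mock")
rm -f "$mock"

# The self-test assertion: the detector must flag the planted violation.
if [ "$verdict" = fail ]; then
  echo "selftest ok: violation detected"
else
  echo "selftest broken: violation missed"
fi
```

A detector that reports `pass` on this mock would itself be the regression, which is the situation the self-test is meant to catch.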

Files

File	Purpose
scripts/review.sh	Main entry point - orchestrates probes, renders the report
scripts/render.sh	Output formatter: text / json / ndjson
scripts/selftest.sh	Sanity check against a deliberately-buggy mock CLI
scripts/lib/common.sh	Shared helpers: timeout, verb classifier, JSON escape, catalog loader
scripts/lib/probes.sh	Probe functions `probe_p1..probe_p10`
references/rule-catalog.tsv	45 rules from cli-for-agents, mapped to probes
references/workflow.md	Detailed probe-by-probe methodology, failure modes, extension guide
gotchas.md	Known edge cases discovered during use
config.json	Verb classifier lists and default timeout

Related Skills

cli-for-agents - the 45-rule design catalog this skill audits against. Read rule files there when the report flags an issue and you need the full explanation.