blackbox-test

Updated: March 10, 2026, 08:58

Run a blackbox test of the project's CLI or executable. Use this skill whenever the user wants to "blackbox test", "test the CLI blindly", "test as an outsider", "test without reading source", "QA test the tool", "smoke test the CLI", "exploratory testing", or says anything like "pretend you don't know the code and test it". Also triggers when the user mentions "blackbox", "black box", "blind test", or "external test" in the context of testing their project. This skill works with any tech stack — Python, Node, Rust, Go, or anything else that produces a CLI.

Source

ThomasRohde

ThomasRohde/marketplace

Blackbox Test

Launch an isolated subagent that tests the project's CLI with zero knowledge of the source code. The agent discovers capabilities solely through the CLI itself, tries to break it, and produces a structured feedback report.

Why this matters

Reading source code biases testing — you test what you know it does rather than what a real user would encounter. A blackbox test catches discoverability problems, misleading help text, confusing error messages, silent failures, and edge cases that unit tests miss because they reflect the author's assumptions.

How it works

1. You create a temporary `./blackbox/` directory as an isolation boundary.
2. You spawn a subagent inside it with strict instructions: no source code access, only CLI interaction.
3. The subagent explores, documents, and stress-tests the CLI.
4. It writes `FEEDBACK.md` inside the blackbox directory.
5. You copy the feedback to the repo root under a timestamped slug name.
6. You delete the blackbox directory.

Step 1: Determine the command

Figure out how to invoke the project's CLI. Check these sources in order:

1. If the user specified a command, use it
2. Look at `pyproject.toml` for `[project.scripts]` entries
3. Look at `package.json` for `bin` or `scripts.start`
4. Look at `Cargo.toml` for `[[bin]]`
5. Look for a `Makefile` with a `run` target
6. Look for an obvious entry point (`main.go`, `cli.py`, `index.js`)
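The file-based part of this lookup order can be scripted. The following is a minimal POSIX `sh` sketch, not part of the skill itself: `detect_cli_hint` is a hypothetical helper that prints the first matching file name in the order listed, or returns 1 when nothing matches (fall back to asking the user). It only checks that the files exist, plus a `run:` target in the Makefile; reading the actual `[project.scripts]`, `bin`, or `[[bin]]` entries is still up to you.

```shell
# Hypothetical helper: print the first CLI hint file found in a project
# directory, following the lookup order above. Returns 1 if none is found.
detect_cli_hint() {
  _dir=${1:-.}
  for _f in pyproject.toml package.json Cargo.toml Makefile main.go cli.py index.js; do
    [ -f "$_dir/$_f" ] || continue
    # A Makefile only counts if it defines a `run` target.
    if [ "$_f" = Makefile ] && ! grep -q '^run:' "$_dir/$_f"; then
      continue
    fi
    printf '%s\n' "$_f"
    return 0
  done
  return 1
}
```

For example, `detect_cli_hint .` in a Node project prints `package.json`, telling you to inspect its `bin` field next.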

Determine the full invocation command, including any virtual environment activation or path prefix it needs. For example:

- Python: `/path/to/.venv/Scripts/python -m mypackage` (Windows) or `.venv/bin/python -m mypackage` (macOS/Linux)
- Node: `npx mypackage` or `node dist/cli.js`
- Rust: `cargo run --` (arguments after `--` go to the program, not to Cargo) or `./target/release/mybin`
- Go: `go run .` or `./mybin`
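Because the Python interpreter path differs between the Windows and macOS/Linux virtual environment layouts, a small helper can try both. This is a sketch that assumes the environment lives in `.venv` at the project root; `venv_python` is a hypothetical name.

```shell
# Hypothetical helper: print the interpreter inside <root>/.venv, trying the
# Windows layout first and then the macOS/Linux layout. Returns 1 if none exists.
venv_python() {
  _root=${1:-.}
  for _py in "$_root/.venv/Scripts/python.exe" "$_root/.venv/Scripts/python" "$_root/.venv/bin/python"; do
    if [ -f "$_py" ]; then
      printf '%s\n' "$_py"
      return 0
    fi
  done
  return 1
}
```

The full command would then be composed as `"$(venv_python)" -m mypackage`, where `mypackage` is the placeholder from the examples above.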

Step 2: Create the isolation directory

```
mkdir -p ./blackbox
```
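Since `mkdir -p` silently reuses an existing directory, a leftover `FEEDBACK.md` from an earlier run could later be archived as if it were new. A stricter setup step might look like this sketch, where `prepare_blackbox` is a hypothetical helper that fails if the directory is not empty.

```shell
# Hypothetical helper: create the isolation directory, refusing to reuse a
# non-empty one so stale results from a previous run cannot leak in.
prepare_blackbox() {
  _bb=${1:-./blackbox}
  mkdir -p "$_bb" || return 1
  if [ -n "$(ls -A "$_bb")" ]; then
    printf '%s is not empty; clean it up before starting a new run\n' "$_bb" >&2
    return 1
  fi
}
```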

Step 3: Spawn the subagent

Use the Agent tool with `mode: "bypassPermissions"` so the tester can run commands without stopping for permission prompts. The prompt below is a template: fill in `{{COMMAND}}` and `{{WORKING_DIR}}` before sending it.

```
You are a QA tester performing a blackbox test of a CLI tool. You have ZERO prior
knowledge about this tool. You cannot read any source code.

The command to run the tool is: {{COMMAND}}
Your working directory is: {{WORKING_DIR}}/blackbox

## Testing protocol

### Phase 1: Discovery
- Run the tool with no arguments
- Run with --help / -h
- Run with --version / -v
- Explore every subcommand's --help
- Note the tool's apparent purpose, all commands, all flags

### Phase 2: Happy path
- For each subcommand, construct a valid invocation using the help text
- Create any needed input files (markdown, JSON, YAML, CSV, etc.) in your
  working directory
- Verify outputs match what the help text promises

### Phase 3: Error handling
- Missing required arguments
- Invalid flag values and types
- Non-existent file paths
- Permission-denied scenarios (read-only files)
- Empty files, binary files, files with only whitespace
- Extremely long inputs (generate a large file, 1000+ entries)
- Special characters in file names and content (spaces, unicode, quotes)
- CRLF vs LF line endings
- Deeply nested or recursive structures

### Phase 4: Edge cases
- Flags in unusual positions (before/after subcommands)
- Duplicate flags
- Mutually exclusive options used together
- Pipe input (echo "..." | {{COMMAND}})
- Output to stdout vs file vs non-existent directory
- Concurrent invocations if applicable
- Ctrl+C interruption behavior (if testable)

### Phase 5: Consistency
- Are exit codes consistent and meaningful?
- Are error messages helpful and actionable?
- Is output format consistent across commands?
- Does --help match actual behavior?

## Rules
- Do NOT read source code files (.py, .js, .ts, .rs, .go, .java, etc.)
- Do NOT read build configs (pyproject.toml, package.json, Cargo.toml, etc.)
- Do NOT look at test files
- Only interact via the CLI
- Create all test fixtures inside your working directory

## Deliverable
Create {{WORKING_DIR}}/blackbox/FEEDBACK.md with these sections:

# Blackbox Test Report

## Tool Summary
What does this tool appear to do? Based solely on CLI exploration.

## Bugs Found
Numbered list. Each bug has:
- **Title**: one-line summary
- **Severity**: critical / high / medium / low
- **Reproduction**: exact commands to reproduce
- **Expected**: what should happen
- **Actual**: what actually happens

## UX Issues
Numbered list of confusing, inconsistent, or poorly documented behaviors.

## What Worked Well
Things that impressed you or worked correctly under stress.

## Recommendations
Prioritized suggestions for improvement.

## Test Log
Summary table of every command tried and its outcome category
(pass / fail / unexpected / crash).
```
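If the prompt template is kept in a file, the two placeholders can be filled mechanically. This sketch assumes a hypothetical `fill_template` helper that reads the template on stdin. It uses `awk`'s `index()` for literal matching, so characters such as `&`, `/`, or `\` in the command or path are not treated as regex or `sed` syntax, and it passes the values through `ENVIRON` because `awk -v` would interpret backslash escapes (which would mangle Windows paths).

```shell
# Hypothetical helper: replace {{COMMAND}} and {{WORKING_DIR}} literally.
# Usage: fill_template "<command>" "<working dir>" < template > prompt
fill_template() {
  BB_COMMAND=$1 BB_WORKING_DIR=$2 awk '
    # Replace every literal occurrence of pat in s with rep (no regex semantics).
    function repl(s, pat, rep,   out, i) {
      out = ""
      while ((i = index(s, pat)) > 0) {
        out = out substr(s, 1, i - 1) rep
        s = substr(s, i + length(pat))
      }
      return out s
    }
    {
      line = repl($0, "{{COMMAND}}", ENVIRON["BB_COMMAND"])
      print repl(line, "{{WORKING_DIR}}", ENVIRON["BB_WORKING_DIR"])
    }'
}
```

For example, `fill_template "node dist/cli.js" "$PWD" < prompt-template.md > prompt.md`, where `prompt-template.md` is an assumed file holding the template text.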

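Phase 3 asks the tester to build awkward inputs. A subagent might generate them along the lines of this illustrative POSIX `sh` sketch; `make_fixtures` and the file names are hypothetical, not a fixed fixture set. A read-only file does not trigger a permission error when the tests run as root, and `chmod` may have no effect on some Windows file systems.

```shell
# Hypothetical fixture generator for Phase 3. Run it only inside the
# blackbox working directory.
make_fixtures() {
  _d=${1:-.}
  mkdir -p "$_d" || return 1
  : > "$_d/empty.txt"                                  # empty file
  printf '   \n\t\n  \n' > "$_d/whitespace.txt"        # whitespace only
  printf 'line one\nline two\n' > "$_d/lf.txt"         # LF line endings
  printf 'line one\r\nline two\r\n' > "$_d/crlf.txt"   # CRLF line endings
  printf '\001\002\377\376' > "$_d/binary.bin"         # non-text bytes
  # Quotes in the content, spaces and Unicode in the file names.
  printf 'She said "hi"\n' > "$_d/name with spaces.txt"
  printf 'caf\303\251\n' > "$_d/ünïcødé.txt"
  # Large input: 1500 entries, one per line.
  awk 'BEGIN { for (i = 1; i <= 1500; i++) printf "entry %d\n", i }' > "$_d/large.txt"
  # Deeply nested structure: 200 levels of nested JSON arrays.
  awk 'BEGIN { for (i = 0; i < 200; i++) s = s "["; for (i = 0; i < 200; i++) s = s "]"; print s }' > "$_d/nested.json"
  # Read-only file for permission-denied checks; recreated so reruns work.
  rm -f "$_d/readonly.txt"
  printf 'locked\n' > "$_d/readonly.txt"
  chmod 444 "$_d/readonly.txt"
}
```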
Step 4: Collect and archive results

After the subagent finishes:

1. Copy the feedback file to the repo root under a timestamped slug name:
   ```
   cp ./blackbox/FEEDBACK.md ./FEEDBACK-blackbox-$(date +%Y-%m-%d).md
   ```
   If a file with that name already exists, append a counter instead, for example `FEEDBACK-blackbox-2025-01-15-2.md`.
2. Once the copy has succeeded, remove the blackbox directory:
   ```
   rm -rf ./blackbox
   ```
3. Summarize the results for the user: total bugs found by severity, the top UX issues, and anything that worked well. Keep it concise.
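The copy, the counter rule, and the cleanup can be combined into one guarded step. In this sketch (`archive_feedback` is a hypothetical name), the directory is deleted only after `FEEDBACK.md` has been copied successfully, so a subagent run that produced no report leaves its working files in place for inspection.

```shell
# Hypothetical helper: archive <src>/FEEDBACK.md as
# FEEDBACK-blackbox-YYYY-MM-DD[-N].md in <dest>, then delete <src>.
# Prints the archived file name.
archive_feedback() {
  _src=${1:-./blackbox}
  _dest=${2:-.}
  [ -f "$_src/FEEDBACK.md" ] || { echo "no FEEDBACK.md in $_src" >&2; return 1; }
  _base="FEEDBACK-blackbox-$(date +%Y-%m-%d)"
  _name="$_base.md"
  _n=2
  # Find the first free name: base.md, then base-2.md, base-3.md, ...
  while [ -e "$_dest/$_name" ]; do
    _name="$_base-$_n.md"
    _n=$((_n + 1))
  done
  cp "$_src/FEEDBACK.md" "$_dest/$_name" || return 1
  rm -rf "$_src"
  printf '%s\n' "$_name"
}
```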

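For the summary, the per-severity totals can be tallied from the report, provided the subagent followed the `- **Severity**: <level>` line format from the template. A sketch with a hypothetical `severity_counts` helper; matching is case-insensitive on the level, but entries written in a different format (for example without the bold markers) are not counted.

```shell
# Hypothetical helper: count "**Severity**: <level>" lines in a report and
# print one "level: count" line per template severity, highest first.
severity_counts() {
  awk '
    /\*\*Severity\*\*:/ {
      level = tolower($0)
      sub(/.*\*\*severity\*\*:[ \t]*/, "", level)   # keep only the level text
      sub(/[ \t].*$/, "", level)                     # drop any trailing notes
      count[level]++
    }
    END {
      n = split("critical high medium low", order, " ")
      for (i = 1; i <= n; i++) printf "%s: %d\n", order[i], count[order[i]] + 0
    }' "$1"
}
```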
Customization

The user can adjust the test with requests like:

- "Focus on the deploy subcommand": narrow Phases 2-4 to that command
- "Test with JSON output": add `--format json` (or the tool's equivalent) to every command
- "Include performance testing": add timing measurements to the protocol
- "Skip edge cases": drop Phase 4

Adapt the subagent prompt accordingly.
