| name | test-docs |
| description | Test ToolHive documentation by executing the steps in tutorials and guides against a real environment, verifying that examples are correct and ToolHive has not regressed. Use when the user asks to test, validate, or verify documentation such as: "test the vault integration tutorial", "verify the K8s quickstart", "check the CLI install docs", "test toolhive vault integration in kubernetes", or any request to run through a doc's instructions to confirm they work. Supports both Kubernetes-based docs (tutorials, K8s guides) and CLI-based docs.
|
Test Docs
Test ToolHive documentation by running each step against a live environment, reporting pass/fail with evidence, and recommending fixes for failures.
Workflow
- Find the documentation - launch an Explore agent to locate the relevant doc file
- Read and parse - extract testable steps (bash code blocks, YAML manifests, expected outputs)
- Check prerequisites - run
scripts/check-prereqs.sh; confirm the thv version with the user
- Prepare the environment - create a dedicated test namespace unless the doc specifies otherwise
- Execute steps - run each step sequentially, capture output
- Report results - pass/fail per step with evidence
- Clean up - delete all resources created during testing
- Recommend fixes - classify failures as doc issues or ToolHive bugs
Step 1: Find the documentation
If the user provides a file path to the doc, use it directly. Assume you are running in the repository that contains the file.
Otherwise, use the Task tool with subagent_type=Explore to search the docs directory for the page matching the user's request. Documentation lives under:
docs/toolhive/tutorials/ - step-by-step tutorials
docs/toolhive/guides-cli/ - CLI how-to guides
docs/toolhive/guides-k8s/ - Kubernetes how-to guides
docs/toolhive/guides-mcp/ - MCP server guides
docs/toolhive/guides-vmcp/ - Virtual MCP Server guides
docs/toolhive/guides-registry/ - registry guides
All doc files use the .mdx extension. Search by keywords from the user's request. If multiple docs match, ask the user which one to test.
Step 2: Read and parse the doc
Read the full doc file. Extract an ordered list of testable steps:
- Bash code blocks (
\``bash`) - commands to execute
- YAML code blocks with
title="*.yaml" - manifests to apply with kubectl apply
- Expected outputs - text code blocks (
\``text`) that follow a command, or prose describing expected behavior ("You should see...")
Build a test plan: a numbered list of steps, each with:
- The command(s) to run
- What constitutes a pass (expected output, exit code 0, resource created)
- Any dependencies on previous steps (variables, resources)
Present the test plan to the user for approval before executing.
Handling placeholders
Docs often contain placeholder values like ghp_your_github_token_here or your-org. Before executing, scan for obvious placeholders (ALL_CAPS patterns, "your-*", "example-*", "replace-*") and ask the user for real values. If a step requires a secret that the user cannot provide, mark it as skipped with a note explaining why.
Step 3: Check prerequisites
Run the prerequisites checker:
bash <skill-path>/scripts/check-prereqs.sh
Where <skill-path> is the absolute path to this skill's directory.
If the script reports errors, inform the user and stop. For K8s docs, the script checks for a kind cluster named "toolhive". For CLI docs, confirm with the user that the installed thv version matches what the doc expects.
Step 4: Prepare the environment
For Kubernetes docs, ask the user about existing infrastructure before deploying anything:
- Operator CRDs and operator: if the doc includes steps to install CRDs or the ToolHive operator, ask the user whether these are already installed. If so, skip those installation steps and proceed to the doc-specific content. Use
AskUserQuestion with options like "Already installed", "Install fresh", "Reinstall (upgrade)".
- Namespaces: create a dedicated namespace
test-docs-<timestamp> (e.g., test-docs-1706886400) unless the doc explicitly uses a specific namespace (like toolhive-system). If the doc requires toolhive-system, use it but track all resources created for cleanup.
For CLI docs:
- Use a temporary directory for any files created
- Note the state of running MCP servers before testing (
thv list)
- Use unique server names: when running
thv run <server>, always use --name <server>-test to avoid conflicts with existing servers. The default name is derived from the registry entry, which may already be running. Example: thv run --name osv-test osv
- Verify registry configuration: if a registry server isn't found, confirm with the user that they're using the default registry (not a custom registry configuration)
Step 5: Execute steps
Run each step sequentially. For each step:
- Print the step number and a short description
- Run the command(s)
- Capture stdout, stderr, and exit code
- Compare against expected output or success criteria
- Record pass/fail with evidence
Execution rules
- Wait for readiness: when a step deploys resources, wait for them (e.g.,
kubectl wait --for=condition=ready) before proceeding. If the doc includes a wait command, use it. Otherwise add a reasonable wait (up to 120s for pods).
- Variable propagation: some steps set shell variables used by later steps (e.g.,
VAULT_POD=$(kubectl get pods ...)). Maintain these across steps by running in the same shell session.
- Skip non-testable steps: informational code blocks (showing file contents, expected output) are not commands to run. Use them as verification targets instead.
- Timeout: if any command hangs for more than 5 minutes, kill it and mark the step as failed with a timeout note.
Known pitfalls
- Server name conflicts:
thv run <server> uses the registry entry name by default, which may already exist. Always use thv run --name <server>-test <server> to create a uniquely-named instance. This avoids "workload with name already exists" errors.
- Port-forward needs time: when testing via
kubectl port-forward, wait at least 5 seconds before issuing curl. Start the port-forward in the background, sleep 5, then curl. Kill the port-forward PID after the check.
- MCPServer status field: the MCPServer CRD uses
.status.phase (not .status.state) for the running state. When polling for readiness, use: kubectl get mcpservers <name> -n <ns> -o jsonpath='{.status.phase}' or simply poll kubectl get mcpservers and check the STATUS column.
- MCPServer readiness polling: after
kubectl apply for an MCPServer, it may take 30-60 seconds to reach Running. Poll with 5-second intervals up to 120 seconds before declaring a timeout.
Step 6: Report results
After all steps complete, produce a summary table:
## Test Results: <doc title>
| Step | Description | Result | Notes |
|------|----------------------|--------|--------------------------|
| 1 | Install Vault | PASS | |
| 2 | Configure auth | PASS | |
| 3 | Store secrets | PASS | |
| 4 | Deploy MCPServer | FAIL | Pod CrashLoopBackOff |
| 5 | Verify integration | SKIP | Depends on step 4 |
For each FAIL, include:
- The command that failed
- Actual output vs expected output
- A classification:
- Doc issue: the command or example in the doc is wrong or outdated (e.g., wrong image tag, deprecated flag, missing step)
- ToolHive bug: the doc appears correct but ToolHive behaves unexpectedly (e.g., crash, wrong output from a correct command)
- Environment issue: missing prerequisite, network problem, or transient error
- A specific recommendation (e.g., "Update image tag from
v0.7.0 to v0.8.2" or "File a bug: MCPServer pods fail to start when Vault annotations are present")
Step 7: Clean up
Always clean up after testing:
- Delete test namespaces:
kubectl delete namespace <test-namespace>
- Remove Helm releases installed during testing
- Delete any temporary files or directories
- If resources were created in
toolhive-system, delete them individually by name
If cleanup fails, report what could not be cleaned up so the user can handle it manually.
Example session
User: "test the vault integration tutorial"
- Explore agent finds
docs/toolhive/tutorials/vault-integration.mdx
- Parse the doc: 5 major steps with 12 bash commands and 1 YAML manifest
- Run
check-prereqs.sh - kind cluster "toolhive" exists, thv v0.8.2
- Ask user: "The doc uses a placeholder GitHub token
ghp_your_github_token_here. Provide a real token or skip step 2?"
- Present test plan, user approves
- Execute steps 1-5, report results
- Clean up: delete vault namespace, remove MCPServer resource
- Summary: 4/5 PASS, 1 FAIL with recommendation