| name | validate-toolkits |
| description | Validate toolkit components and project docs — check external doc URLs, cross-references between skills/commands/rules, and verify README.md and CLAUDE.md are in sync with actual toolkit state. Use when the user asks to validate, review, or check toolkit quality. |
| argument-hint | [toolkit-path] |
Validate toolkit components
Deep-validate a toolkit's markdown files: external references, cross-references, and component accuracy.
Parse $ARGUMENTS:
toolkit-path (optional): path to toolkit directory (e.g. workbench/rest-api-pipeline). If omitted, validate all toolkits under workbench/.
1. Build component map
Run the extraction tool to get the component map and file inventory:
uv run python tools/extract_refs.py [toolkit-path]
This outputs JSON with:
- components: lists of skills, commands, rules in the toolkit (for cross-reference resolution)
- files: each .md file with its URLs and surrounding context lines
Save the components map. You will process files one at a time in the next steps.
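A minimal sketch of splitting that output into the two parts used later. The exact JSON schema of tools/extract_refs.py is not shown here, so the key names inside each file entry (path, urls, url, context) and the sample payload are assumptions for illustration only.

```python
import json


def load_extraction(raw: str):
    """Split the extraction JSON into (components, files).

    Assumes top-level "components" and "files" keys, as described above.
    """
    data = json.loads(raw)
    components = data.get("components", {})
    files = data.get("files", [])
    return components, files


# Hypothetical sample output; the real schema may differ.
sample = """
{
  "components": {
    "skills": ["create-pipeline", "debug-pipeline"],
    "commands": [],
    "rules": ["workflow"]
  },
  "files": [
    {
      "path": "skills/create-pipeline/SKILL.md",
      "urls": [
        {"url": "https://example.com/docs",
         "context": "Essential Reading on resource config"}
      ]
    }
  ]
}
"""

components, files = load_extraction(sample)

# Flatten all component names into one set for cross-reference lookups.
known_names = {name for names in components.values() for name in names}
```

Keeping known_names as a flat set makes the later "does this reference exist?" check a single membership test.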
2. Process each file
For each file in the extraction output, do the following before moving to the next file:
a. Validate URLs
For each URL in the file (use the context lines from the extraction):
- Fetch it using WebFetch. Check if the page loads.
- Validate relevance using the context: does the page content match what the surrounding text claims? For example, if context says "Essential Reading on resource config" and the URL points to a page about pipelines — that's a mismatch.
- If a URL is dead (404) or redirects to unrelated content, mark as ERROR.
- If a URL loads but content doesn't match the context, mark as WARNING.
- Use web search to find the correct URL if a reference is broken but the intent is clear from context.
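The severity rules above can be sketched as a small decision function. The fetch itself is left to WebFetch; this only classifies the outcome. The function name and its three inputs are hypothetical, and treating every 4xx/5xx status as dead (rather than only 404) is an assumption.

```python
from enum import Enum


class Severity(Enum):
    OK = "OK"
    WARNING = "WARNING"
    ERROR = "ERROR"


def classify_url(status: int,
                 redirected_to_unrelated: bool,
                 matches_context: bool) -> Severity:
    """Map the result of a URL check onto OK / WARNING / ERROR."""
    # Assumption: any 4xx/5xx response counts as a dead link, not just 404.
    if status >= 400 or redirected_to_unrelated:
        return Severity.ERROR
    # The page loads, but its content does not match the surrounding text.
    if not matches_context:
        return Severity.WARNING
    return Severity.OK
```

Whether the page "matches the context" is a judgment made while reading the fetched content; the function only records that judgment consistently.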
b. Validate cross-references
Read the file and look for references to other components in the same toolkit. References are NOT limited to backtick formatting — look for natural language patterns like:
- "use the debug-pipeline skill"
- "see validate-data"
- "continue with create-pipeline"
- "check rules in workflow.md"
- "step 6b in create-pipeline"
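As a rough first pass, mentions of known components can be located mechanically before reading the file for less obvious references. This is only a sketch: the helper name is hypothetical, and it finds only names that already exist in the component map, so stale or misspelled references still have to be spotted by reading the text.

```python
import re


def find_component_mentions(text: str, known_names: set) -> dict:
    """Return {component name: [1-based line numbers]} for each mention."""
    mentions = {}
    for name in sorted(known_names):
        # Reject matches embedded in a longer hyphenated or alphanumeric
        # token, so "pipeline" does not match inside "debug-pipeline".
        pattern = re.compile(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])")
        hits = [
            lineno
            for lineno, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)
        ]
        if hits:
            mentions[name] = hits
    return mentions
```

The pattern matches with or without backticks, and it still matches a rule name followed by an extension such as workflow.md.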
For each cross-reference found:
- Resolve against the component map. Does the referenced skill/command/rule exist?
- If it resolves, check context: does the reference make sense? (e.g., "use debug-pipeline after a failed run" — does debug-pipeline actually handle post-failure inspection?)
- If it doesn't resolve, try fuzzy matching:
  - Partial name match (e.g., "debug" could mean "debug-pipeline")
  - Similar names (e.g., "explore-data" might be the old name for "view-data")
- If a match is found, fix the reference in the source file.
- If no match can be found, mark as ERROR.
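The resolution order above (exact match, then partial match, then similar name) might look like the following sketch. It uses difflib from the standard library; the function name and the 0.8 similarity cutoff are assumptions, not values taken from the toolkit.

```python
from difflib import get_close_matches
from typing import List, Optional


def resolve_reference(ref: str, known_names: List[str]) -> Optional[str]:
    """Resolve a reference to a component name, or return None."""
    # 1. Exact match against the component map.
    if ref in known_names:
        return ref
    # 2. Partial match, accepted only when it is unambiguous.
    partial = [name for name in known_names if ref in name]
    if len(partial) == 1:
        return partial[0]
    # 3. Similar spelling; the cutoff of 0.8 is an assumed threshold.
    close = get_close_matches(ref, known_names, n=1, cutoff=0.8)
    return close[0] if close else None
```

A result of None corresponds to the ERROR case. String similarity mostly catches typos; a genuine rename such as "explore-data" to "view-data" may fall below the cutoff and still needs a judgment call based on context.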
c. Fix the file
Apply any fixes (broken URLs, renamed references) immediately before moving to the next file.
3. Validate project docs against current state
Check README.md, CLAUDE.md, and EVALS.md in the repo root for content that references toolkit state. Compare against the actual component map from step 1.
What to check
- Toolkit listings: are all toolkits mentioned? Are descriptions accurate? Are any listed toolkits missing or renamed?
- Skill/command tables: do skill names, step numbers, and descriptions match the actual SKILL.md frontmatter and workflow?
- CLI examples: do dlthub ai commands shown in the docs match the current CLI interface? Run dlthub ai --help and dlthub ai toolkit --help to verify.
- Architecture diagrams: do mermaid diagrams reflect the current toolkit set and their relationships?
- Marketplace references: does the marketplace.json content (names, descriptions, tags) match what README/CLAUDE.md say?
- EVALS.md: do the documented tools, directory structure, config format, and CLI examples match the actual scripts in tools/ and skill files in .claude/skills/? Check that create_eval_workspace.py, run_trigger_eval.py, list_skill_descriptions.py usage examples are accurate.
How to fix
- If a toolkit was added, renamed, or removed: update the relevant tables/lists in README.md and CLAUDE.md.
- If skills within a toolkit changed (added, removed, renamed): update any skill tables or workflow descriptions.
- If CLI interface changed: update command examples.
- Mark changes that can't be auto-resolved as WARNINGS (e.g., prose descriptions that may need human judgment).
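For the listing checks, the comparison against the component map reduces to two set differences. A minimal sketch, assuming the documented names have already been pulled out of a README or CLAUDE.md table by hand or by some other means; the function and key names are hypothetical.

```python
def diff_listing(documented: set, actual: set) -> dict:
    """Compare names listed in a doc against names that actually exist."""
    return {
        # Present in the toolkit but never mentioned in the doc.
        "missing_from_docs": sorted(actual - documented),
        # Mentioned in the doc but no longer present (removed or renamed).
        "stale_in_docs": sorted(documented - actual),
    }


drift = diff_listing(
    documented={"create-pipeline", "explore-data"},
    actual={"create-pipeline", "view-data"},
)
```

One entry on each side, as in this example, often indicates a rename rather than an unrelated add and remove, which is worth confirming before editing the table.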
4. Report
After all files are processed, output a summary:
Validated: <toolkit-name>
Files scanned: N
URLs checked: N (M broken)
Cross-references checked: N (M broken)
Project docs checked: README.md, CLAUDE.md, EVALS.md
FIXED:
<file>: <old-ref> → <new-ref>
ERRORS:
<file>: <description of unresolvable reference>
WARNINGS:
<file>: <description of questionable content>
PROJECT DOCS:
<file>: <what was updated or what is out of sync>
Errors must be resolved by the user.
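One way to keep the summary in a consistent shape is to collect findings in a small structure while processing and render it at the end. This is an illustrative sketch only; the Report class and its field names are hypothetical, and the layout simply mirrors the template above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    toolkit: str
    files_scanned: int = 0
    urls_checked: int = 0
    urls_broken: int = 0
    xrefs_checked: int = 0
    xrefs_broken: int = 0
    fixed: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    project_docs: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the summary in the order used by the template."""
        lines = [
            f"Validated: {self.toolkit}",
            f"Files scanned: {self.files_scanned}",
            f"URLs checked: {self.urls_checked} ({self.urls_broken} broken)",
            f"Cross-references checked: {self.xrefs_checked} "
            f"({self.xrefs_broken} broken)",
            "Project docs checked: README.md, CLAUDE.md, EVALS.md",
        ]
        sections = [
            ("FIXED", self.fixed),
            ("ERRORS", self.errors),
            ("WARNINGS", self.warnings),
            ("PROJECT DOCS", self.project_docs),
        ]
        for title, entries in sections:
            lines.append(f"{title}:")
            lines.extend(entries)
        return "\n".join(lines)
```

Appending to fixed, errors, and warnings as each file is processed keeps the final report in step with the per-file fixes from step 2.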