| name | skills-eval |
| description | Assess and improve skill quality. Use when auditing skills, generating improvement suggestions, checking standards compliance, or analyzing token usage. |
| version | 2.0.0 |
| category | skill-management |
| tags | ["evaluation","improvement","skills","optimization","quality-assurance","tool-use","performance-metrics"] |
| dependencies | ["modular-skills","performance-optimization"] |
| tools | ["skills-auditor","improvement-suggester","compliance-checker","tool-performance-analyzer","token-usage-tracker"] |
| provides | {"infrastructure":["evaluation-framework","quality-assurance","improvement-planning"],"patterns":["skill-analysis","token-optimization","modular-design"],"sdk_features":["agent-sdk-compatibility","advanced-metrics","dynamic-discovery"]} |
| estimated_tokens | 1800 |
| usage_patterns | ["skill-audit","quality-assessment","improvement-planning","skills-inventory","tool-performance-evaluation","dynamic-discovery-optimization","advanced-tool-use-analysis","programmatic-calling-efficiency","context-preservation-quality","token-efficiency-optimization","modular-architecture-validation","integration-testing","compliance-reporting","performance-benchmarking"] |
| complexity | advanced |
| evaluation_criteria | {"structure_compliance":25,"metadata_quality":20,"token_efficiency":25,"tool_integration":20,"claude_sdk_compliance":10} |
Analyze and improve Claude skills across ~/.claude/ locations. The tools audit skills against quality standards, measure token usage, and generate improvement recommendations.
A meta-skill for evaluating and improving existing skills. It runs quality assessments and performance analyses, and generates improvement plans.
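As a rough illustration of what scanning the ~/.claude/ locations could involve, the sketch below collects every `SKILL.md` under a root directory. The function name `find_skill_files` is hypothetical, and the assumption that each skill lives in a file named `SKILL.md` is taken from the example paths used in this document; the actual auditor may discover skills differently.

```python
from pathlib import Path


def find_skill_files(root: Path) -> list[Path]:
    """Return every SKILL.md found under root, sorted for stable output.

    Hypothetical helper: assumes one SKILL.md per skill directory.
    """
    if not root.is_dir():
        # A missing root simply means there is nothing to audit.
        return []
    return sorted(root.rglob("SKILL.md"))


if __name__ == "__main__":
    # Assumed default location, based on the ~/.claude/ path mentioned above.
    for skill_file in find_skill_files(Path.home() / ".claude"):
        print(skill_file)
```

Each path returned this way could then be passed to the audit commands as the `--skill-path` argument.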
Use this skill when you're evaluating or improving existing skills.
✅ Perfect for:
❌ Don't use when:
Key differentiator: This skill focuses on evaluation and improvement, while modular-skills focuses on design patterns and architecture.
# Run comprehensive audit of all skills
python scripts/skills_eval/skills_auditor.py --scan-all --format markdown
# Audit specific skill
python scripts/skills_eval/skills_auditor.py --skill-path path/to/skill/SKILL.md
# Or use Makefile:
make audit-skill PATH=path/to/skill/SKILL.md
make audit-all
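When several specific skills need auditing in one pass, the single-skill command above can be assembled programmatically. This is a minimal sketch: the script path and the `--skill-path` flag are copied from the commands above, while the helper names `build_audit_command` and `build_audit_commands` are hypothetical.

```python
import sys
from pathlib import Path

# Script location copied from the commands above; adjust if your checkout differs.
AUDITOR = Path("scripts/skills_eval/skills_auditor.py")


def build_audit_command(skill_file: Path) -> list[str]:
    """Return the argv that audits one SKILL.md with the auditor script."""
    return [sys.executable, str(AUDITOR), "--skill-path", str(skill_file)]


def build_audit_commands(skill_files: list[Path]) -> list[list[str]]:
    """Return one audit argv per skill file, in the order given."""
    return [build_audit_command(path) for path in skill_files]


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in build_audit_commands([Path("path/to/skill/SKILL.md")]):
    print(" ".join(cmd))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.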
# Deep analysis of single skill
python scripts/skill_analyzer.py --path path/to/skill/SKILL.md --verbose
# Check token usage
python scripts/token_estimator.py --file path/to/skill/SKILL.md
# Or use Makefile:
make analyze-skill PATH=path/to/skill/SKILL.md
make estimate-tokens PATH=path/to/skill/SKILL.md
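For a quick sanity check before running the estimator, token counts can be approximated from character counts. The sketch below assumes roughly 4 characters per token, a coarse rule of thumb for English text; it is not how `token_estimator.py` works, and the names `estimate_tokens` and `within_budget` are hypothetical.

```python
import math

# Assumption: about 4 characters per token. Real tokenizers vary by content.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate, rounded up to a whole token."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def within_budget(text: str, budget: int) -> bool:
    """Report whether the rough estimate fits a token budget."""
    return estimate_tokens(text) <= budget


# A 7,200-character file lands exactly on an 1,800-token budget.
print(estimate_tokens("x" * 7200))      # 1800
print(within_budget("x" * 7200, 1800))  # True
```

The budget of 1,800 mirrors the `estimated_tokens` value in this skill's metadata; treat the estimate as a first pass and rely on the estimator script for actual figures.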
# Get prioritized improvement suggestions
python scripts/skills_eval/improvement_suggester.py --skill-path path/to/skill/SKILL.md --priority high
# Check standards compliance
python scripts/skills_eval/compliance_checker.py --skill-path path/to/skill/SKILL.md --standard all
# Or use Makefile:
make improve-skill PATH=path/to/skill/SKILL.md
make check-compliance PATH=path/to/skill/SKILL.md
- `make audit-all` to find and audit all skills
- `make audit-skill PATH=...` for specific skills
- `make analyze-skill PATH=...` for complexity analysis
- `make improve-skill PATH=...`
- `make check-compliance PATH=...`
- `make estimate-tokens PATH=...`

We use this framework in a few common situations:

skills-eval to identify optimization opportunities.

# Comprehensive evaluation with scoring
./scripts/skills-auditor --scan-all --format table --priority high
# Detailed analysis of specific skill
./scripts/improvement-suggester --skill-path path/to/skill/SKILL.md --priority all --format markdown
# Token usage and efficiency
./scripts/token-usage-tracker --skill-path path/to/skill/SKILL.md --context-analysis
# Advanced tool performance metrics
./scripts/tool-performance-analyzer --skill-path path/to/skill/SKILL.md --metrics all
# Validate against Claude Skills standards
./scripts/compliance-checker --skill-path path/to/skill/SKILL.md --standard all --format summary
# Auto-fix common issues
./scripts/compliance-checker --skill-path path/to/skill/SKILL.md --auto-fix --severity high
# Generate prioritized improvement plan
./scripts/improvement-suggester --skill-path path/to/skill/SKILL.md --priority critical,high
# Benchmark performance
./scripts/token-usage-tracker --skill-path path/to/skill/SKILL.md --benchmark optimization-targets
The framework evaluates skills across multiple dimensions with weighted scoring:
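Primary Categories (100 points total), as weighted in the `evaluation_criteria` metadata:
- Structure compliance: 25 points
- Metadata quality: 20 points
- Token efficiency: 25 points
- Tool integration: 20 points
- Claude SDK compliance: 10 points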
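The weighted scoring can be sketched as a simple weighted sum. The weights below are taken from the `evaluation_criteria` entry in this skill's metadata; the function name `overall_score` and the convention of rating each category as a fraction between 0 and 1 are assumptions made for illustration, not the framework's confirmed implementation.

```python
# Weights copied from the evaluation_criteria metadata; they sum to 100.
WEIGHTS = {
    "structure_compliance": 25,
    "metadata_quality": 20,
    "token_efficiency": 25,
    "tool_integration": 20,
    "claude_sdk_compliance": 10,
}


def overall_score(ratings: dict[str, float]) -> float:
    """Combine per-category ratings in [0, 1] into a score out of 100.

    Hypothetical helper: every category must be rated.
    """
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0 and 1")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


example = {
    "structure_compliance": 1.0,   # 25.0 points
    "metadata_quality": 0.5,       # 10.0 points
    "token_efficiency": 0.5,       # 12.5 points
    "tool_integration": 0.75,      # 15.0 points
    "claude_sdk_compliance": 1.0,  # 10.0 points
}
print(overall_score(example))  # 72.5
```

Because the weights sum to 100, a skill rated 1.0 in every category scores exactly 100, and a low rating in a 25-point category costs more than the same rating in the 10-point one.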
For comprehensive implementation details and advanced techniques:
- `modules/skill-authoring-best-practices.md` for official Claude guidance on writing effective Skills (core principles, progressive disclosure, workflows, anti-patterns)
- `modules/authoring-checklist.md` for quick-reference validation checklist
- `modules/evaluation-workflows.md` for detailed workflows
- `modules/quality-metrics.md` for scoring criteria and evaluation levels
- `modules/advanced-tool-use-analysis.md` for specialized evaluation techniques
- `modules/evaluation-framework.md` for detailed scoring and quality gates
- `modules/integration.md` for workflow integration with other skills
- `modules/troubleshooting.md` for common issues and solutions
- `modules/pressure-testing.md` for adversarial validation methodology
- `scripts/` directory
- `scripts/automation/`