---
name: increment-quality-judge
description: Optional AI-powered quality assessment for specifications, plans, and tests. Goes beyond rule-based validation to evaluate clarity, testability, edge cases, and architecture soundness. Activates for validate quality, quality check, assess spec, evaluate increment, spec review, quality score.
allowed-tools: Read, Grep, Glob
---
AI-powered quality assessment that evaluates specifications, plans, and tests beyond rule-based validation, using the LLM-as-Judge pattern.

Purpose: provide nuanced quality assessment of SpecWeave artifacts (spec.md, plan.md, tests.md) to catch subtle issues that rule-based validation might miss.
Activates when the user mentions: "validate quality", "quality check", "assess spec", "evaluate increment", "spec review", or "quality score".

Triggered by: `/specweave:validate 0001 --quality` (supports the `--export`, `--fix`, and `--always` flags).

Evaluates specifications across 6 dimensions:
```yaml
dimensions:
  clarity:
    weight: 0.20
    criteria:
      - "Is the problem statement clear?"
      - "Are objectives well-defined?"
      - "Is terminology consistent?"
  testability:
    weight: 0.25
    criteria:
      - "Are acceptance criteria testable?"
      - "Can success be measured objectively?"
      - "Are edge cases identifiable?"
  completeness:
    weight: 0.20
    criteria:
      - "Are all requirements addressed?"
      - "Is error handling specified?"
      - "Are non-functional requirements included?"
  feasibility:
    weight: 0.15
    criteria:
      - "Is the architecture scalable?"
      - "Are technical constraints realistic?"
      - "Is timeline achievable?"
  maintainability:
    weight: 0.10
    criteria:
      - "Is design modular?"
      - "Are extension points identified?"
      - "Is technical debt addressed?"
  edge_cases:
    weight: 0.10
    criteria:
      - "Are failure scenarios covered?"
      - "Are performance limits specified?"
      - "Are security considerations included?"
```
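The six dimension weights are intended to sum to exactly 1.00 so the overall score stays on the 0-1 scale. A quick sanity check can catch a misconfigured weight early (a sketch; the `weights` object simply mirrors the YAML config above):

```typescript
// Mirror of the dimension weights from the YAML config above.
const weights: Record<string, number> = {
  clarity: 0.20,
  testability: 0.25,
  completeness: 0.20,
  feasibility: 0.15,
  maintainability: 0.10,
  edge_cases: 0.10,
};

// Sum the weights; the tolerance guards against floating-point drift.
const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
if (Math.abs(total - 1.0) > 1e-9) {
  throw new Error(`Dimension weights must sum to 1.00, got ${total}`);
}
console.log(total.toFixed(2)); // "1.00"
```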
```typescript
interface QualityScore {
  overall: number; // 0.00-1.00 (displayed as 0-100)
  dimensions: {
    clarity: number;
    testability: number;
    completeness: number;
    feasibility: number;
    maintainability: number;
    edge_cases: number;
  };
  issues: Issue[];
  suggestions: Suggestion[];
  confidence: number; // 0.00-1.00 (how confident the assessment is)
}

interface Issue {
  dimension: string;
  severity: 'critical' | 'major' | 'minor';
  description: string;
  location: string; // Section reference
  impact: string; // What could go wrong
}

interface Suggestion {
  dimension: string;
  priority: 'high' | 'medium' | 'low';
  description: string;
  example: string; // How to improve
}
```
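For illustration, a hypothetical result matching the `QualityScore` shape might look like this (the values are invented for the example; dimension scores are stored on the 0.00-1.00 scale and converted to 0-100 for display):

```typescript
// Hypothetical assessment result; all values are illustrative only.
const example = {
  overall: 0.87,
  dimensions: {
    clarity: 0.92,
    testability: 0.78,
    completeness: 0.90,
    feasibility: 0.88,
    maintainability: 0.85,
    edge_cases: 0.72,
  },
  issues: [
    {
      dimension: "testability",
      severity: "major" as const,
      description: '"User can log in successfully" is vague',
      location: 'spec.md, section "Success Criteria"',
      impact: "QA won't know when feature is complete",
    },
  ],
  suggestions: [],
  confidence: 0.92,
};

// Display conversion: 0.00-1.00 → 0-100.
const displayScore = Math.round(example.overall * 100);
console.log(`Overall: ${displayScore}/100`); // "Overall: 87/100"
```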
```
/specweave:validate 001

─────────────────────────────────────────────────────────
VALIDATION RESULTS: Increment 0001-authentication
─────────────────────────────────────────────────────────

✅ Rule-Based Validation: PASSED (120/120 checks)
   ✓ Consistency (47/47)
   ✓ Completeness (23/23)
   ✓ Quality (31/31)
   ✓ Traceability (19/19)

─────────────────────────────────────────────────────────
🤖 Run AI Quality Assessment? (Optional)

This will:
  • Evaluate spec clarity, testability, edge cases
  • Provide detailed improvement suggestions
  • Use ~2,000 tokens (1-2 minutes)

Your choice:
  [Y] Yes, assess quality
  [N] No, skip (default)
  [A] Always run (save to config)

Choice: _
```
If the user selects Yes:
```
🔍 Analyzing increment quality...

─────────────────────────────────────────────────────────
AI QUALITY ASSESSMENT
─────────────────────────────────────────────────────────

Overall Score: 87/100 (GOOD) ✅

Dimension Scores:
  Clarity:          92/100 ✅
  Testability:      78/100 ⚠️  (Needs improvement)
  Completeness:     90/100 ✅
  Feasibility:      88/100 ✅
  Maintainability:  85/100 ✅
  Edge Cases:       72/100 ⚠️  (Action needed)

Confidence: 92%

─────────────────────────────────────────────────────────
ISSUES FOUND (2)
─────────────────────────────────────────────────────────

⚠️ MAJOR: Acceptance criteria not fully testable
   Dimension: Testability
   Location: spec.md, section "Success Criteria"
   Issue: "User can log in successfully" is vague
   Impact: QA won't know when feature is complete

🔸 MINOR: Rate limiting edge case not addressed
   Dimension: Edge Cases
   Location: spec.md, section "Security"
   Issue: No mention of brute-force protection
   Impact: Security vulnerability risk

─────────────────────────────────────────────────────────
SUGGESTIONS (3)
─────────────────────────────────────────────────────────

🎯 HIGH PRIORITY: Make acceptance criteria measurable
   Current:
     "User can log in successfully"
   Improved:
     "User can log in with valid credentials within 2 seconds,
      receiving a JWT token with 24h expiry"

🎯 HIGH PRIORITY: Specify edge case handling
   Add section: "Error Scenarios"
   - Rate limiting: 5 failed attempts → 15 min lockout
   - Invalid token: Return 401 with error code
   - Expired session: Redirect to login with message

🔹 MEDIUM PRIORITY: Add performance requirements
   Suggested addition:
   - Login latency: p95 < 500ms
   - Concurrent logins: Support 100/sec
   - Token validation: < 10ms

─────────────────────────────────────────────────────────
SUMMARY
─────────────────────────────────────────────────────────

✅ This specification is GOOD (87/100)

Strengths:
  • Clear problem statement and objectives
  • Architecture is sound and scalable
  • Good coverage of functional requirements

Areas for improvement:
  • Make acceptance criteria more testable (2 items)
  • Address edge cases (rate limiting, errors)
  • Add performance requirements

Recommendation: Address HIGH priority suggestions before
implementation. MEDIUM priority can be added incrementally.

Would you like to:
  [F] Auto-fix HIGH priority issues (experimental)
  [E] Export suggestions to tasks.md
  [C] Continue without changes

Choice: _
```
User: "Validate increment 001"

increment-quality-judge:
```
✅ Rule-based: 120/120 passed
🤖 Run AI quality check? [Y/n]: Y
🔍 Analyzing...
Overall: 87/100 (GOOD)
Issues: 1 major, 1 minor
Suggestions: 2 high priority, 1 medium
[Full report shown above]
```
User: "Validate increment 001 --quality"

increment-quality-judge:
```
✅ Rule-based: 120/120 passed
🔍 AI Quality: 87/100 (GOOD)
[Full report shown automatically]
```
User: "Validate increment 001"
[After quality assessment shows suggestions]
User: E (Export to tasks.md)

increment-quality-judge:
```
✅ Exported 3 suggestions to .specweave/increments/0001-auth/tasks.md

Added tasks:
- [ ] Make acceptance criteria measurable (HIGH)
- [ ] Specify edge case handling (HIGH)
- [ ] Add performance requirements (MEDIUM)
```
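The export step amounts to mapping suggestions onto markdown checklist lines. It can be sketched like this (`toTaskLines` and the trimmed-down `ExportableSuggestion` shape are assumptions for illustration, not the actual SpecWeave implementation):

```typescript
// Minimal slice of the Suggestion interface needed for export.
type ExportableSuggestion = {
  priority: "high" | "medium" | "low";
  description: string;
};

// Convert judge suggestions into tasks.md checklist lines.
function toTaskLines(suggestions: ExportableSuggestion[]): string[] {
  return suggestions.map(
    (s) => `- [ ] ${s.description} (${s.priority.toUpperCase()})`
  );
}

const lines = toTaskLines([
  { priority: "high", description: "Make acceptance criteria measurable" },
  { priority: "high", description: "Specify edge case handling" },
  { priority: "medium", description: "Add performance requirements" },
]);
console.log(lines.join("\n"));
// - [ ] Make acceptance criteria measurable (HIGH)
// - [ ] Specify edge case handling (HIGH)
// - [ ] Add performance requirements (MEDIUM)
```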
For each dimension, evaluate using this prompt:

```
You are evaluating a software specification for {dimension}.

Specification:
{spec_content}

Criteria:
{dimension_criteria}

Score 0.00-1.00 based on:
- 1.00: Exceptional, industry best practice
- 0.80-0.99: Good, minor improvements possible
- 0.60-0.79: Acceptable, notable gaps
- 0.00-0.59: Needs work, significant issues

Provide:
1. Score (0.00-1.00)
2. Issues found (if any)
3. Suggestions for improvement
4. Confidence (0.00-1.00)

Format:
{
  "score": 0.85,
  "issues": [...],
  "suggestions": [...],
  "confidence": 0.90
}
```
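Because the judge's reply is free-form LLM output, it is worth validating the JSON before trusting it. A minimal guard might look like this (a sketch; `parseJudgeResponse` is an assumed helper name, not part of SpecWeave):

```typescript
interface JudgeResponse {
  score: number;
  issues: unknown[];
  suggestions: unknown[];
  confidence: number;
}

// Parse and validate one dimension's judge reply; throws on malformed output.
function parseJudgeResponse(raw: string): JudgeResponse {
  const data = JSON.parse(raw);
  const inRange = (n: unknown): n is number =>
    typeof n === "number" && n >= 0 && n <= 1;
  if (!inRange(data.score) || !inRange(data.confidence)) {
    throw new Error("score and confidence must be numbers in [0, 1]");
  }
  return {
    score: data.score,
    issues: Array.isArray(data.issues) ? data.issues : [],
    suggestions: Array.isArray(data.suggestions) ? data.suggestions : [],
    confidence: data.confidence,
  };
}

const reply = parseJudgeResponse(
  '{"score": 0.85, "issues": [], "suggestions": [], "confidence": 0.90}'
);
console.log(reply.score, reply.confidence); // 0.85 0.9
```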
```
overall_score =
  (clarity * 0.20) +
  (testability * 0.25) +
  (completeness * 0.20) +
  (feasibility * 0.15) +
  (maintainability * 0.10) +
  (edge_cases * 0.10)

confidence = average(dimension_confidences)

if (confidence < 0.80) {
  show_warning = "⚠️ Low confidence. Results may be less accurate."
}
```
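The weighted sum and the low-confidence warning translate directly into code. The following sketch uses illustrative dimension and confidence values (not taken from any real assessment):

```typescript
interface DimensionScores {
  clarity: number;
  testability: number;
  completeness: number;
  feasibility: number;
  maintainability: number;
  edge_cases: number;
}

// Weighted overall score, mirroring the formula above.
function overallScore(d: DimensionScores): number {
  return (
    d.clarity * 0.20 +
    d.testability * 0.25 +
    d.completeness * 0.20 +
    d.feasibility * 0.15 +
    d.maintainability * 0.10 +
    d.edge_cases * 0.10
  );
}

// Overall confidence is the mean of the per-dimension confidences.
function overallConfidence(confidences: number[]): number {
  return confidences.reduce((a, b) => a + b, 0) / confidences.length;
}

const score = overallScore({
  clarity: 0.92,
  testability: 0.78,
  completeness: 0.90,
  feasibility: 0.88,
  maintainability: 0.85,
  edge_cases: 0.72,
});
const confidence = overallConfidence([0.95, 0.90, 0.92, 0.88, 0.94, 0.93]);

if (confidence < 0.80) {
  console.warn("⚠️ Low confidence. Results may be less accurate.");
}
console.log(Math.round(score * 100)); // 85
```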
```
User: /specweave:validate-increment 001
  ↓
Run 120 rule-based checks
  ↓
Show results: ✅ PASSED or ❌ FAILED
```

```
User: /specweave:validate-increment 001
  ↓
Run 120 rule-based checks
  ↓
Show results: ✅ PASSED
  ↓
Prompt: "Run AI quality check? [Y/n]"
  ↓ (if Yes)
Run quality assessment (~2k tokens)
  ↓
Show detailed quality report with scores
  ↓
Prompt: "Export suggestions to tasks? [Y/n]"
```

```
User: /specweave:validate-increment 001
  ↓
Run in parallel:
  • Rule-based checks (120 rules)
  • AI quality assessment
  ↓
Show combined results:
  ✅ Rule-based: 120/120
  🔍 Quality: 87/100
  ↓
Show detailed report
```
Estimated per increment: ~2,000 tokens (1-2 minutes). Cost (with Claude Sonnet 4.5) scales with spec length. To keep token use down, run the quality judge only after rule-based validation passes.
- Given: Valid spec.md with minor issues. When: quality-judge runs. Then: Returns score 70-85 with specific suggestions.
- Given: High-quality spec with clear criteria. When: quality-judge runs. Then: Returns score 90+ with minimal suggestions.
- Given: Vague spec with unclear requirements. When: quality-judge runs. Then: Returns score below 60 with critical issues.
- Given: Quality assessment with 3 suggestions. When: User selects "Export to tasks". Then: 3 tasks added to tasks.md with priority labels.
- Given: User provides the --quality flag ("validate increment 001 --quality"). When: User requests validation. Then: Quality check runs automatically without a prompt.
Always run rule-based checks first; then optionally run the quality judge for nuanced feedback.

The quality judge provides suggestions, but the final decision rests with the user. Acceptable thresholds depend on context: healthcare/finance projects may need 90+ scores, while internal tools may accept 70+.

If confidence is below 80%, treat the results with caution and review the spec manually.
What quality-judge CAN'T do:
What quality-judge CAN do:
increment-quality-judge extends SpecWeave's validation with AI-powered quality assessment:
- ✅ Multi-dimensional scoring (6 dimensions, weighted)
- ✅ Interactive workflow (prompt user, optional)
- ✅ Actionable suggestions (with examples)
- ✅ Export to tasks (convert suggestions to work items)
- ✅ Configurable (thresholds, dimensions, auto-run)
- ✅ Token-efficient (~2k tokens per assessment)
- ✅ Complements rule-based validation (catches different issues)
Use it when: You want high-quality specs that go beyond structural correctness to ensure clarity, testability, and completeness.
Skip it when: Quick iteration, tight token budget, or simple features where rule-based validation is sufficient.