web-severity-scoring
Compute web accessibility scores (0-100, A-F grades) with severity scoring, confidence levels, and remediation tracking across audits.
| name | web-severity-scoring |
| description | Compute web accessibility scores (0-100, A-F grades) with severity scoring, confidence levels, and remediation tracking across audits. |
Page Score = 100 - (sum of weighted findings)
Weights:
Critical (confirmed, all three sources): -18 points
Critical (high confidence, both sources): -15 points
Critical (high confidence, single source): -10 points
Critical (medium confidence): -7 points
Critical (low confidence): -3 points
Serious (high confidence): -7 points
Serious (medium confidence): -5 points
Serious (low confidence): -2 points
Moderate (high confidence): -3 points
Moderate (medium confidence): -2 points
Moderate (low confidence): -1 point
Minor: -1 point
Floor: 0 (minimum score)
Use a profile to tune strictness by context while keeping comparable grade bands:
| Profile | Intended Use | Multiplier |
|---|---|---|
| balanced (default) | Standard product delivery | 1.0 |
| strict | Regulated/public-sector releases | 1.15 |
| advisory | Early design and prototyping | 0.8 |
Apply the profile multiplier to each final deduction after confidence handling.
page_score = 100
for each finding:
base = lookup(severity, confidence_level, source_count) // from table above
multiplier = 1.2 if confidence_level == "confirmed" else 1.0
deduction = base × multiplier
page_score = max(0, page_score - deduction)
The values in the lookup table above are base deductions (pre-multiplier). "Confirmed" findings (validated by all three sources: axe-core + agent review + Playwright) apply an additional 1.2× multiplier.
Example: One Critical finding at confirmed confidence under the balanced profile = 18 (base) × 1.2 = 21.6 points deducted → page score 78.4.
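The lookup, the confirmed multiplier, and the profile multiplier described above can be combined in a few lines. The following is a minimal Python sketch, not a reference implementation: the function names, the tuple layout of a finding, and the fallback for confirmed Serious/Moderate findings (which the weights list does not cover) are assumptions made for illustration.

```python
# Base deductions from the weights list (pre-multiplier values).
CRITICAL = {"confirmed": 18.0, "high_multi": 15.0, "high_single": 10.0,
            "medium": 7.0, "low": 3.0}
OTHER = {
    "serious": {"high": 7.0, "medium": 5.0, "low": 2.0},
    "moderate": {"high": 3.0, "medium": 2.0, "low": 1.0},
}
PROFILES = {"balanced": 1.0, "strict": 1.15, "advisory": 0.8}


def lookup(severity, confidence, source_count=1):
    """Return the base deduction for one finding."""
    if severity == "minor":
        return 1.0
    if severity == "critical":
        if confidence == "high":
            key = "high_multi" if source_count >= 2 else "high_single"
            return CRITICAL[key]
        return CRITICAL[confidence]
    # Assumption: the weights list has no "confirmed" row for serious or
    # moderate findings, so this sketch reuses their "high" base value.
    level = "high" if confidence == "confirmed" else confidence
    return OTHER[severity][level]


def page_score(findings, profile="balanced"):
    """findings: iterable of (severity, confidence, source_count) tuples."""
    score = 100.0
    for severity, confidence, source_count in findings:
        base = lookup(severity, confidence, source_count)
        confirmed = 1.2 if confidence == "confirmed" else 1.0
        deduction = base * confirmed * PROFILES[profile]
        score = max(0.0, score - deduction)  # floor at 0
    return score
```

With these assumptions, `page_score([("critical", "confirmed", 3)])` reproduces the worked example (100 - 21.6), and passing `profile="strict"` scales the same deduction by 1.15.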
To reduce false-positive inflation and stabilize trends, apply a calibration coefficient by rule family:
calibrated_deduction = deduction × calibration_coefficient(rule_family)
Recommended initial coefficients:
| Rule Family | Coefficient | Rationale |
|---|---|---|
| Keyboard/focus | 1.1 | High functional impact at runtime |
| Forms/labels/errors | 1.05 | High completion risk for core tasks |
| Semantics/structure | 1.0 | Baseline scoring |
| Link text/context | 0.9 | Higher context variance |
| Content quality (alt/link clarity) | 0.85 | Needs human review more often |
Update coefficients quarterly from confirmed outcomes. Avoid changing coefficients more than +/-0.1 per cycle.
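As an illustration of the calibration step and the quarterly update limit, here is a small Python sketch. The rule-family keys, the default of 1.0 for unlisted families, and the helper names are hypothetical; only the coefficient values and the +/-0.1 limit come from the table and guidance above.

```python
# Initial coefficients from the table above; the keys are illustrative names.
CALIBRATION = {
    "keyboard-focus": 1.1,
    "forms-labels-errors": 1.05,
    "semantics-structure": 1.0,
    "link-text-context": 0.9,
    "content-quality": 0.85,
}


def calibrated_deduction(deduction, rule_family):
    # Assumption: families not listed in the table fall back to 1.0.
    return deduction * CALIBRATION.get(rule_family, 1.0)


def next_coefficient(current, proposed, max_step=0.1):
    """Move toward the proposed value, but by at most max_step per cycle."""
    step = max(-max_step, min(max_step, proposed - current))
    return round(current + step, 4)
```

For example, a proposed jump from 0.9 to 1.2 would be limited to 1.0 in the first cycle, with the remainder deferred to later cycles.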
| Score | Grade | Meaning |
|---|---|---|
| 90-100 | A | Excellent - minor or no issues, meets WCAG AA |
| 75-89 | B | Good - some issues, mostly meets WCAG AA |
| 50-74 | C | Needs Work - multiple issues, partial WCAG AA compliance |
| 25-49 | D | Poor - significant accessibility barriers |
| 0-24 | F | Failing - critical barriers, likely unusable with AT |
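The grade bands translate directly into a threshold lookup. This Python sketch assumes that fractional scores (for example 89.5) belong to the band whose lower bound they reach, which the table does not state explicitly; the function name is illustrative.

```python
def grade(score):
    """Map a 0-100 score to a letter grade using the bands above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Assumption: each band starts at its lower bound, so 89.5 is a B.
    for lower_bound, letter in ((90, "A"), (75, "B"), (50, "C"), (25, "D")):
        if score >= lower_bound:
            return letter
    return "F"
```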
| Level | Weight | When to Use |
|---|---|---|
| Confirmed | 120% | Validated by all three sources: axe-core + agent review + Playwright behavioral testing |
| High | 100% | Found by both axe-core and agent review, or definitively structural (missing alt, no labels, no lang) |
| Medium | 70% | Found by one source, likely issue (heading edge cases, questionable ARIA, possible keyboard traps) |
| Low | 30% | Possible issue, needs human review (alt text quality, reading order, context-dependent link text) |
Issues found by both axe-core AND agent review are automatically upgraded to high confidence regardless of individual confidence ratings.
Issues found by all three sources (axe-core + agent review + Playwright behavioral testing) are upgraded to confirmed confidence with a 1.2x weight multiplier. This applies when:
When Playwright is not available, the maximum achievable confidence remains High (100%). The confirmed tier is additive — it never downgrades findings.
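The upgrade rules can be sketched as a function of the sources that reported a finding. This is a minimal Python illustration; the source identifiers match the `confidenceSources` names used in the score metadata, while the function name and signature are assumptions.

```python
ALL_THREE = {"axe-core", "agent-review", "playwright"}
BOTH_STATIC = {"axe-core", "agent-review"}


def effective_confidence(rated, sources):
    """Upgrade a finding's confidence based on which sources reported it.

    rated: the confidence assigned before cross-source checks.
    sources: iterable of source names that reported the finding.
    """
    found = set(sources)
    if ALL_THREE <= found:
        return "confirmed"
    if BOTH_STATIC <= found:
        return "high"
    # Upgrades are additive: with fewer sources the rating is left as-is.
    return rated
```

Without Playwright in `sources`, the function can return at most `"high"` from an upgrade, which matches the stated ceiling.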
Track predicted confidence versus post-triage outcome and compute drift:
drift = abs(predicted_confidence_score - observed_confirmation_rate)
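A possible way to compute drift for one confidence level is sketched below in Python. It assumes the predicted confidence score is expressed as a fraction (for example 0.7 for Medium) and that triage outcomes are recorded as booleans; both are illustrative choices, not part of the definition above.

```python
def confidence_drift(predicted_confidence_score, triage_outcomes):
    """Absolute gap between predicted confidence and the observed rate.

    triage_outcomes: list of booleans, True when triage confirmed the finding.
    """
    if not triage_outcomes:
        raise ValueError("at least one triaged finding is required")
    confirmed = sum(1 for outcome in triage_outcomes if outcome)
    observed_confirmation_rate = confirmed / len(triage_outcomes)
    return abs(predicted_confidence_score - observed_confirmation_rate)
```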
Operational guideline:
## Accessibility Score
| Metric | Value |
|--------|-------|
| Page | [URL] |
| Score | [0-100] |
| Grade | [A-F] |
| Critical | [count] |
| Serious | [count] |
| Moderate | [count] |
| Minor | [count] |
## Accessibility Scorecard
| Page | Score | Grade | Critical | Serious | Moderate | Minor |
|------|-------|-------|----------|---------|----------|-------|
| / | 82 | B | 0 | 2 | 3 | 1 |
| /login | 91 | A | 0 | 0 | 2 | 1 |
| /dashboard | 45 | D | 2 | 4 | 3 | 2 |
| **Average score / total issues** | **72.7** | **C** | **2** | **6** | **8** | **4** |
| Pattern Type | Definition | Remediation ROI |
|---|---|---|
| Systemic | Same issue on every audited page | Highest - usually layout/nav, fix once |
| Template | Same issue on pages sharing a component | High - fix the shared component |
| Page-specific | Unique to one page | Normal - fix individually |
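The pattern types can be approximated from page coverage alone. The Python sketch below is a heuristic under a stated assumption: an issue seen on several but not all audited pages is treated as a template pattern, even though a real audit would need to confirm that those pages share a component. The function name is hypothetical.

```python
def classify_pattern(pages_with_issue, audited_pages):
    """Classify one issue as systemic, template, or page-specific."""
    audited = set(audited_pages)
    affected = set(pages_with_issue) & audited
    if not affected:
        raise ValueError("issue does not appear on any audited page")
    if len(audited) > 1 and affected == audited:
        return "systemic"
    # Assumption: multi-page issues that are not site-wide come from a
    # shared component; verify against the actual component tree.
    if len(affected) > 1:
        return "template"
    return "page-specific"
```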
| Status | Definition |
|---|---|
| Fixed | Issue was in previous report but no longer present |
| New | Issue not in previous report, appears now |
| Persistent | Issue remains from previous report |
| Regressed | Issue was previously fixed but has returned |
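The four statuses follow from set comparisons between reports. In this Python sketch, issues are represented by hypothetical string identifiers (for example a rule id plus a selector), and `previously_fixed` is an assumed record of issues marked Fixed in earlier audits.

```python
def classify_statuses(previous, current, previously_fixed=frozenset()):
    """Return a dict mapping each issue id to its remediation status.

    previous / current: sets of issue ids from the last and the new report.
    previously_fixed: ids that an earlier audit had marked as Fixed.
    """
    statuses = {}
    for issue in previous | current:
        if issue in previous and issue in current:
            statuses[issue] = "persistent"
        elif issue in previous:
            statuses[issue] = "fixed"
        elif issue in previously_fixed:
            statuses[issue] = "regressed"
        else:
            statuses[issue] = "new"
    return statuses
```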
Remediation progress (%): (fixed / previous_total) * 100
Score change: current_score - previous_score
When audit scope changes between runs, use normalized change:
normalized_score = raw_score - (scope_variance_penalty)
scope_variance_penalty = min(10, abs(previous_pages - current_pages) * 0.8)
Use normalized score for trend charts and use raw score for release gates.
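The progress and normalization formulas can be written out as follows. This is a short Python sketch with illustrative function names; returning 0.0 when the previous report had no issues is an assumption, since the formulas do not define that case.

```python
def remediation_progress(fixed, previous_total):
    """Percentage of previously reported issues that are now fixed."""
    if previous_total == 0:
        return 0.0  # assumption: nothing to remediate counts as 0% progress
    return (fixed / previous_total) * 100


def scope_variance_penalty(previous_pages, current_pages):
    """0.8 points per page of scope difference, capped at 10."""
    return min(10.0, abs(previous_pages - current_pages) * 0.8)


def normalized_score(raw_score, previous_pages, current_pages):
    """Score for trend charts; release gates keep using raw_score."""
    return raw_score - scope_variance_penalty(previous_pages, current_pages)
```

For instance, growing an audit from 10 to 15 pages gives a penalty of 4.0, so a raw score of 82 is charted as 78.0.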
Include these fields in generated score artifacts for reproducibility:
scoring:
model: web-severity-scoring-v2
profile: balanced
calibrationVersion: 2026-q2
confidenceSources:
- axe-core
- agent-review
- playwright
failThresholds:
critical: 1
score: 75
This metadata allows deterministic re-runs and audit-to-audit comparisons.
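One way the `failThresholds` block might drive a release gate is sketched below in Python. The interpretation is an assumption: a run fails when the Critical count reaches `critical` or when the raw score falls below `score`. The dictionary mirrors the YAML fields, and the function name is hypothetical.

```python
# Mirrors the failThresholds fields from the metadata above.
FAIL_THRESHOLDS = {"critical": 1, "score": 75}


def gate_failures(raw_score, critical_count, thresholds=FAIL_THRESHOLDS):
    """Return a list of reasons the gate fails; empty means it passes."""
    failures = []
    # Assumption: 'critical' is the count at which the gate starts failing.
    if critical_count >= thresholds["critical"]:
        failures.append(f"{critical_count} critical finding(s)")
    # Assumption: 'score' is the minimum acceptable raw (not normalized) score.
    if raw_score < thresholds["score"]:
        failures.append(f"score {raw_score} is below {thresholds['score']}")
    return failures
```

Applied to the scorecard example, `/login` (91, no Criticals) would pass, while `/dashboard` (45, two Criticals) would fail on both checks.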