web-severity-scoring
Compute web accessibility scores (0-100, A-F grades) with severity scoring, confidence levels, and remediation tracking across audits.
| name | web-severity-scoring |
| description | Compute web accessibility scores (0-100, A-F grades) with severity scoring, confidence levels, and remediation tracking across audits. |
Page Score = 100 - (sum of weighted findings)
Weights:
Critical (confirmed, all three sources): -18 points
Critical (high confidence, both sources): -15 points
Critical (high confidence, single source): -10 points
Critical (medium confidence): -7 points
Critical (low confidence): -3 points
Serious (high confidence): -7 points
Serious (medium confidence): -5 points
Serious (low confidence): -2 points
Moderate (high confidence): -3 points
Moderate (medium confidence): -2 points
Moderate (low confidence): -1 point
Minor: -1 point
Floor: 0 (minimum score)
Use a profile to tune strictness by context while keeping comparable grade bands:
| Profile | Intended Use | Multiplier |
|---|---|---|
| balanced (default) | Standard product delivery | 1.0 |
| strict | Regulated/public-sector releases | 1.15 |
| advisory | Early design and prototyping | 0.8 |
Apply the profile multiplier to each final deduction after confidence handling.
page_score = 100
for each finding:
    base = lookup(severity, confidence_level, source_count)  // from table above
    multiplier = 1.2 if confidence_level == "confirmed" else 1.0
    deduction = base × multiplier × profile_multiplier  // profile_multiplier from the profile table (1.0 for balanced)
    page_score = max(0, page_score - deduction)
The values in the lookup table above are base deductions (pre-multiplier). "Confirmed" findings (validated by all three sources: axe-core + agent review + Playwright) apply an additional 1.2× multiplier.
Example: One Critical finding at confirmed confidence under the balanced profile = 18 (base) × 1.2 = 21.6 points deducted → page score 100 - 21.6 = 78.4.
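The scoring loop above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the finding dictionary shape (`severity`, `confidence`, `source_count`) are hypothetical, and the weights list has no "confirmed" row for Serious or Moderate findings, so the sketch assumes they use their "high" row before the 1.2× multiplier.

```python
# Profile multipliers from the profile table.
PROFILE_MULTIPLIER = {"balanced": 1.0, "strict": 1.15, "advisory": 0.8}


def base_deduction(severity, confidence, source_count=1):
    """Look up the pre-multiplier deduction from the weights list."""
    if severity == "minor":
        return 1.0
    if severity == "critical":
        if confidence == "confirmed":
            return 18.0
        if confidence == "high":
            # "both sources" vs. "single source"
            return 15.0 if source_count >= 2 else 10.0
        return {"medium": 7.0, "low": 3.0}[confidence]
    table = {
        "serious": {"high": 7.0, "medium": 5.0, "low": 2.0},
        "moderate": {"high": 3.0, "medium": 2.0, "low": 1.0},
    }
    # Assumption: confirmed Serious/Moderate findings start from the "high" row.
    tier = "high" if confidence == "confirmed" else confidence
    return table[severity][tier]


def page_score(findings, profile="balanced"):
    """Subtract each weighted deduction from 100, with a floor of 0."""
    score = 100.0
    for finding in findings:
        base = base_deduction(
            finding["severity"],
            finding["confidence"],
            finding.get("source_count", 1),
        )
        confirmed = 1.2 if finding["confidence"] == "confirmed" else 1.0
        deduction = base * confirmed * PROFILE_MULTIPLIER[profile]
        score = max(0.0, score - deduction)
    return round(score, 1)


example = [{"severity": "critical", "confidence": "confirmed", "source_count": 3}]
print(page_score(example))  # 78.4
```

Rounding to one decimal place is a presentation choice made for this sketch; the source does not specify a rounding rule.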
To reduce false-positive inflation and stabilize trends, apply a calibration coefficient by rule family:
calibrated_deduction = deduction × calibration_coefficient(rule_family)
Recommended initial coefficients:
| Rule Family | Coefficient | Rationale |
|---|---|---|
| Keyboard/focus | 1.1 | High functional impact at runtime |
| Forms/labels/errors | 1.05 | High completion risk for core tasks |
| Semantics/structure | 1.0 | Baseline scoring |
| Link text/context | 0.9 | Higher context variance |
| Content quality (alt/link clarity) | 0.85 | Needs human review more often |
Update coefficients quarterly from confirmed outcomes. Avoid changing coefficients more than +/-0.1 per cycle.
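A possible way to apply the coefficients and enforce the per-cycle limit is sketched below. The rule-family keys, the function names, and the fallback of 1.0 for unlisted families are assumptions made for illustration.

```python
# Initial coefficients from the table above; the dictionary keys are hypothetical identifiers.
CALIBRATION = {
    "keyboard_focus": 1.1,
    "forms_labels_errors": 1.05,
    "semantics_structure": 1.0,
    "link_text_context": 0.9,
    "content_quality": 0.85,
}
MAX_STEP_PER_CYCLE = 0.1


def calibrated_deduction(deduction, rule_family):
    """Scale a deduction by its rule-family coefficient."""
    # Assumption: families missing from the table fall back to the 1.0 baseline.
    return deduction * CALIBRATION.get(rule_family, 1.0)


def next_coefficient(current, proposed):
    """Move toward the proposed coefficient by at most 0.1 in one cycle."""
    step = max(-MAX_STEP_PER_CYCLE, min(MAX_STEP_PER_CYCLE, proposed - current))
    return round(current + step, 2)


print(next_coefficient(1.0, 1.3))  # 1.1
```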
| Score | Grade | Meaning |
|---|---|---|
| 90-100 | A | Excellent - minor or no issues, meets WCAG AA |
| 75-89 | B | Good - some issues, mostly meets WCAG AA |
| 50-74 | C | Needs Work - multiple issues, partial WCAG AA compliance |
| 25-49 | D | Poor - significant accessibility barriers |
| 0-24 | F | Failing - critical barriers, likely unusable with AT |
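The grade bands translate directly into a threshold lookup. The sketch below assumes fractional scores stay in the lower band until they reach the next threshold (for example, 89.9 is a B); the source table only lists integer ranges.

```python
def grade(score):
    """Map a 0-100 score to a letter grade using the bands above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 50:
        return "C"
    if score >= 25:
        return "D"
    return "F"


print(grade(78.4))  # B
```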
| Level | Weight | When to Use |
|---|---|---|
| Confirmed | 120% | Validated by all three sources: axe-core + agent review + Playwright behavioral testing |
| High | 100% | Confirmed by axe-core + agent, or definitively structural (missing alt, no labels, no lang) |
| Medium | 70% | Found by one source, likely issue (heading edge cases, questionable ARIA, possible keyboard traps) |
| Low | 30% | Possible issue, needs human review (alt text quality, reading order, context-dependent link text) |
Issues found by both axe-core AND agent review are automatically upgraded to high confidence regardless of individual confidence ratings.
Issues found by all three sources (axe-core + agent review + Playwright behavioral testing) are upgraded to confirmed confidence with a 1.2x weight multiplier. This applies when:
When Playwright is not available, the maximum achievable confidence remains High (100%). The confirmed tier is additive — it never downgrades findings.
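The upgrade rules can be expressed as set checks over the sources that reported a finding. In this sketch the source identifiers are borrowed from the metadata example (`axe-core`, `agent-review`, `playwright`), and `resolve_confidence` and its `own_rating` parameter are hypothetical names.

```python
# Weights from the confidence table, as fractions.
CONFIDENCE_WEIGHT = {"confirmed": 1.2, "high": 1.0, "medium": 0.7, "low": 0.3}

ALL_THREE = {"axe-core", "agent-review", "playwright"}
BOTH = {"axe-core", "agent-review"}


def resolve_confidence(sources, own_rating="medium"):
    """Upgrade confidence based on which sources reported the finding."""
    found = set(sources)
    if ALL_THREE <= found:
        return "confirmed"
    if BOTH <= found:
        return "high"
    # Otherwise keep the finding's own rating; upgrades never downgrade.
    return own_rating


print(resolve_confidence(["axe-core", "agent-review"], own_rating="low"))  # high
```

Without Playwright in the source list, the result never exceeds `high`, which matches the stated ceiling.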
Track predicted confidence versus post-triage outcome and compute drift:
drift = abs(predicted_confidence_score - observed_confirmation_rate)
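As an illustration of the drift formula, the sketch below assumes both values are on a 0 to 1 scale: the predicted score is the confidence weight (for example, 0.7 for Medium) and the observed rate is the fraction of findings at that level that triage confirmed. The source does not define the scale, so treat this as one possible reading.

```python
def confidence_drift(predicted_confidence_score, triage_outcomes):
    """Absolute gap between predicted confidence and the observed confirmation rate.

    triage_outcomes is a list of booleans, True when triage confirmed the finding.
    """
    if not triage_outcomes:
        raise ValueError("at least one triage outcome is required")
    observed_confirmation_rate = sum(triage_outcomes) / len(triage_outcomes)
    return abs(predicted_confidence_score - observed_confirmation_rate)


# Medium-confidence findings (predicted 0.7), half of which were confirmed.
drift = confidence_drift(0.7, [True, True, False, False])
print(round(drift, 2))  # 0.2
```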
Operational guideline:
## Accessibility Score
| Metric | Value |
|--------|-------|
| Page | [URL] |
| Score | [0-100] |
| Grade | [A-F] |
| Critical | [count] |
| Serious | [count] |
| Moderate | [count] |
| Minor | [count] |
## Accessibility Scorecard
| Page | Score | Grade | Critical | Serious | Moderate | Minor |
|------|-------|-------|----------|---------|----------|-------|
| / | 82 | B | 0 | 2 | 3 | 1 |
| /login | 91 | A | 0 | 0 | 2 | 1 |
| /dashboard | 45 | D | 2 | 4 | 3 | 2 |
| **Average** | **72.7** | **C** | **2** | **6** | **8** | **4** |
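The summary row combines an average score with summed finding counts. A short sketch that reproduces the sample row, using a hypothetical list-of-dicts shape for the page results:

```python
SEVERITIES = ("critical", "serious", "moderate", "minor")

# Sample rows from the scorecard above.
pages = [
    {"page": "/", "score": 82, "critical": 0, "serious": 2, "moderate": 3, "minor": 1},
    {"page": "/login", "score": 91, "critical": 0, "serious": 0, "moderate": 2, "minor": 1},
    {"page": "/dashboard", "score": 45, "critical": 2, "serious": 4, "moderate": 3, "minor": 2},
]


def summarize(page_results):
    """Return the mean score (one decimal) and the total count per severity."""
    average = round(sum(p["score"] for p in page_results) / len(page_results), 1)
    totals = {s: sum(p[s] for p in page_results) for s in SEVERITIES}
    return average, totals


average, totals = summarize(pages)
print(average, totals)
# 72.7 {'critical': 2, 'serious': 6, 'moderate': 8, 'minor': 4}
```

Note that the severity columns in the summary row are totals, not averages.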
| Pattern Type | Definition | Remediation ROI |
|---|---|---|
| Systemic | Same issue on every audited page | Highest - usually layout/nav, fix once |
| Template | Same issue on pages sharing a component | High - fix the shared component |
| Page-specific | Unique to one page | Normal - fix individually |
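One way to assign these pattern types is to compare the pages where an issue appears against the audited pages and a component-to-pages map. The function name, the inputs, and the decision to treat multi-page issues with no shared component as page-specific are assumptions of this sketch.

```python
def classify_pattern(issue_pages, audited_pages, component_pages):
    """Classify an issue as systemic, template, or page-specific.

    issue_pages: pages where the issue was found.
    audited_pages: every page in the audit.
    component_pages: hypothetical map of component name -> pages using it.
    """
    found = set(issue_pages)
    if found >= set(audited_pages):
        return "systemic"
    shares_component = any(found <= set(p) for p in component_pages.values())
    if len(found) > 1 and shares_component:
        return "template"
    # Assumption: anything else is handled page by page.
    return "page-specific"


audited = ["/", "/login", "/signup", "/dashboard"]
components = {"auth-form": {"/login", "/signup"}}
print(classify_pattern({"/login", "/signup"}, audited, components))  # template
```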
| Status | Definition |
|---|---|
| Fixed | Issue was in previous report but no longer present |
| New | Issue not in previous report, appears now |
| Persistent | Issue remains from previous report |
| Regressed | Issue was previously fixed but has returned |
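The four statuses follow from set comparisons between reports. The sketch below assumes each issue has a stable identifier across audits and that a set of issues marked fixed in earlier reports is available to detect regressions; both are assumptions, as is the function name.

```python
def remediation_status(previous, current, previously_fixed=frozenset()):
    """Map each issue id to Fixed, New, Persistent, or Regressed."""
    previous, current = set(previous), set(current)
    statuses = {}
    for issue in sorted(previous | current):
        if issue in previous and issue in current:
            statuses[issue] = "Persistent"
        elif issue in previous:
            statuses[issue] = "Fixed"
        elif issue in previously_fixed:
            # Absent from the previous report because it had been fixed earlier.
            statuses[issue] = "Regressed"
        else:
            statuses[issue] = "New"
    return statuses


result = remediation_status(
    previous={"img-alt", "label", "contrast"},
    current={"label", "focus-trap", "skip-link"},
    previously_fixed={"skip-link"},
)
print(result)
```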
(fixed / previous_total) * 100
current_score - previous_score
When audit scope changes between runs, use normalized change:
normalized_score = raw_score - (scope_variance_penalty)
scope_variance_penalty = min(10, abs(previous_pages - current_pages) * 0.8)
Use normalized score for trend charts and use raw score for release gates.
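The normalization formulas above can be worked through in a few lines. The function names are illustrative only.

```python
def scope_variance_penalty(previous_pages, current_pages):
    """0.8 points per page of scope difference, capped at 10."""
    return min(10, abs(previous_pages - current_pages) * 0.8)


def normalized_score(raw_score, previous_pages, current_pages):
    """Raw score minus the scope variance penalty, for trend charts."""
    return raw_score - scope_variance_penalty(previous_pages, current_pages)


# Scope grew from 10 to 15 pages: penalty = min(10, 5 * 0.8) = 4.0
print(normalized_score(82, 10, 15))  # 78.0
```

With a scope change of 13 pages or more the penalty reaches its cap of 10, so larger scope changes do not reduce the normalized score further.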
Include these fields in generated score artifacts for reproducibility:
scoring:
  model: web-severity-scoring-v2
  profile: balanced
  calibrationVersion: 2026-q2
  confidenceSources:
    - axe-core
    - agent-review
    - playwright
  failThresholds:
    critical: 1
    score: 75
This metadata allows deterministic re-runs and audit-to-audit comparisons.
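The `failThresholds` values could drive a release gate along these lines. The source does not define their semantics, so this sketch assumes the gate fails when the Critical count reaches `critical` or when the raw score falls below `score`; `release_gate` is a hypothetical name.

```python
# Mirrors the failThresholds fields in the metadata example.
FAIL_THRESHOLDS = {"critical": 1, "score": 75}


def release_gate(raw_score, critical_count, thresholds=FAIL_THRESHOLDS):
    """Return (passed, reasons) for a release gate based on the raw score."""
    reasons = []
    # Assumption: the gate fails once the Critical count reaches the threshold.
    if critical_count >= thresholds["critical"]:
        reasons.append(f"{critical_count} critical finding(s)")
    # Assumption: the gate fails when the raw score is below the threshold.
    if raw_score < thresholds["score"]:
        reasons.append(f"score {raw_score} is below {thresholds['score']}")
    return (not reasons, reasons)


print(release_gate(82, 0))  # (True, [])
```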