analyze-vtt-docs
Generates EXHAUSTIVE WebVTT specification summary from web sources with complete rule coverage, all tags/settings/entities, and self-validation.
| name | analyze-vtt-docs |
| description | Generates EXHAUSTIVE WebVTT specification summary from web sources with complete rule coverage, all tags/settings/entities, and self-validation. |
Generates a comprehensive, exhaustive WebVTT specification summary (vtt_specs_summary.md) that serves as the single source of truth for compliance checking.
Outputs:
Key: Ensures NO requirements missed - exhaustive coverage from W3C spec + MDN + web search.
Pre-flight: Read .claude/skills/gotchas.md before generating specs. Pay special attention to gotcha #3 (W3C license attribution required).
Post-run: If you discover a new gotcha during spec generation (a copyright/licensing trap, a W3C attribution pattern that should be avoided, a web source that returns misleading data, or a spec structure issue that could cause downstream compliance check failures), append it to .claude/skills/gotchas.md with the same numbered format.
Usage:
/analyze-vtt-docs
Single command - fetches web sources, performs comprehensive analysis, generates complete spec.
Read existing documentation:
# Check what we already have
ls -la ai_artifacts/specs/vtt/
cat ai_artifacts/specs/vtt/vtt_web_sources.md
If vtt_specs_summary.md exists:
IMPORTANT: This step requires the WebFetch tool to be loaded first.
Check if WebFetch is available, load if needed:
# WebFetch is a deferred tool - load it before use
# Use ToolSearch to load WebFetch
Read URLs from ai_artifacts/specs/vtt/vtt_web_sources.md:
import re
with open("ai_artifacts/specs/vtt/vtt_web_sources.md") as _f:
    sources_content = _f.read()
# Extract URLs from markdown links: [Text](URL)
url_pattern = r'\[([^\]]+)\]\(([^)]+)\)'
existing_sources = []
for match in re.findall(url_pattern, sources_content):
    title, url = match
    existing_sources.append({'title': title, 'url': url})
print(f"Found {len(existing_sources)} existing sources")
for s in existing_sources:
    print(f" - {s['title']}")
Fetch W3C WebVTT Specification (Primary Source):
# Fetch W3C spec - most authoritative source
w3c_url = 'https://www.w3.org/TR/webvtt1/'
print("Fetching W3C WebVTT Specification...")
# Use the WebFetch tool to fetch w3c_url
# Store result in a variable for processing
# w3c_content = <result from WebFetch tool>
Fetch MDN Documentation (Supplementary):
# MDN provides practical examples and browser compatibility info
mdn_url = 'https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API'
print("Fetching MDN WebVTT Documentation...")
# Use the WebFetch tool to fetch mdn_url
# mdn_content = <result from WebFetch tool>
Context optimization:
Perform targeted web searches to fill gaps:
# Define search queries for comprehensive coverage
search_queries = [
    "WebVTT specification complete W3C",
    "WebVTT cue settings all options",
    "WebVTT markup tags complete list",
    "WebVTT HTML entities supported",
    "WebVTT REGION block specification",
    "WebVTT STYLE block CSS",
    "WebVTT NOTE comment syntax",
    "WebVTT timestamp format validation",
    "WebVTT best practices implementation",
    "WebVTT validation rules MUST SHOULD",
]
# Execute searches and collect results
search_results = []
for query in search_queries:
    print(f"Searching: {query}")
    # Use the WebSearch tool for each query
    results = []  # populated by WebSearch tool
    search_results.append({
        'query': query,
        'results': results
    })
    # Brief delay between queries to avoid rate limiting
Identify high-value sources from search results:
import re
# Re-read existing sources (each block is independent)
with open("ai_artifacts/specs/vtt/vtt_web_sources.md") as _f:
    _sources_content = _f.read()
existing_sources = [
    {'title': m[0], 'url': m[1]}
    for m in re.findall(r'\[([^\]]+)\]\(([^)]+)\)', _sources_content)
]
# Agent: for each URL found in the search step above, check if it is
# authoritative (w3.org, developer.mozilla.org, github.com/w3c) and not
# already in existing_sources. Collect matches into new_sources list:
_existing_urls = {s['url'] for s in existing_sources}
new_sources = [] # Agent fills this from search results
# new_sources.append({'title': <title>, 'url': <url>, 'query': <query>})
print(f"\nFound {len(new_sources)} new authoritative sources")
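The filtering step above is left to the agent. As a minimal sketch of one way to apply it, assuming each search hit is a dict with `title`, `url`, and `query` keys as in the commented `append` call, the helper names `is_authoritative` and `filter_new_sources` below are hypothetical and not part of the skill:

```python
from urllib.parse import urlparse

# Hosts treated as authoritative, taken from the filtering note above.
AUTHORITATIVE_HOSTS = ("w3.org", "developer.mozilla.org")


def is_authoritative(url: str) -> bool:
    """Return True for w3.org, developer.mozilla.org, or github.com/w3c URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in AUTHORITATIVE_HOSTS):
        return True
    # github.com only counts when the repository lives under the w3c organisation.
    return host == "github.com" and parsed.path.lower().startswith("/w3c/")


def filter_new_sources(candidates, existing_urls):
    """Keep authoritative candidates whose URL is not already known."""
    seen = set(existing_urls)
    kept = []
    for cand in candidates:
        url = cand["url"]
        if url not in seen and is_authoritative(url):
            kept.append(cand)
            seen.add(url)  # also de-duplicates repeated search hits
    return kept
```

Matching on the parsed hostname rather than a substring avoids accepting look-alike domains that merely contain `w3.org` somewhere in the URL.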
Fetch new sources:
# Agent: for each source in new_sources (up to 5), use WebFetch to
# retrieve the content. new_sources was built in the filtering step above.
# for source in new_sources[:5]:
#     print(f"Fetching: {source['title']}")
#     # Use the WebFetch tool with url=source['url']
CRITICAL: Verify ALL these areas covered in fetched content (100% coverage required):
File Format:
Timestamp Format:
- `[HH:]MM:SS.mmm` (hours optional if < 1 hour)
- `-->` (spaces required)
Cue Structure:
Cue Settings:
Tags (Markup):
- `<c.classname>text</c>` (multiple classes: `<c.class1.class2>`)
- `<i>text</i>`
- `<b>text</b>`
- `<u>text</u>`
- `<ruby>base<rt>annotation</rt></ruby>`
- `<v Speaker>text</v>` (optional annotation)
- `<lang code>text</lang>`
- `<00:01:23.456>` (karaoke-style)
Regions (Optional Feature):
- `region:id` setting
Special Blocks:
Validation Requirements:
Edge Cases & Common Pitfalls:
Implementation Requirements:
Browser Compatibility:
Completeness Checklist (MUST achieve 100%):
# TEMPLATE: All values start as False. Update each to True as you confirm
# coverage during spec generation. Re-run this block to check progress.
completeness_check = {
    'file_format': {
        'header': False,  # WEBVTT signature
        'encoding': False,  # UTF-8
        'bom': False,  # BOM handling
        'line_endings': False,  # CR/LF/CRLF
        'blank_line': False,  # After header
    },
    'timestamps': {
        'format': False,  # [HH:]MM:SS.mmm
        'validation': False,  # Start <= end
        'ranges': False,  # MM/SS 00-59
        'milliseconds': False,  # Exactly 3 digits
        'separator': False,  # ` --> `
    },
    'cue_settings': {
        'vertical': False,  # rl/lr
        'line': False,  # N or N%
        'position': False,  # N%
        'size': False,  # N%
        'align': False,  # start/center/end/left/right
        'region': False,  # region_id
    },
    'markup_tags': {
        'class_span': False,  # <c>
        'italics': False,  # <i>
        'bold': False,  # <b>
        'underline': False,  # <u>
        'voice': False,  # <v>
        'language': False,  # <lang>
        'ruby': False,  # <ruby><rt>
        'timestamp': False,  # <00:01:23.456>
    },
    'html_entities': {
        'required': False,  # &amp; &lt; &gt; &nbsp; &lrm; &rlm;
        'escaping': False,  # Escape rules
    },
    'regions': {
        'region_block': False,  # REGION definition
        'properties': False,  # id/width/lines/anchors/scroll
    },
    'special_blocks': {
        'note': False,  # NOTE comments
        'style': False,  # STYLE CSS
    },
    'validation': {
        'must_rules': False,  # All MUST requirements
        'should_rules': False,  # All SHOULD requirements
        'error_handling': False,  # Error strategies
    },
}
# Calculate completeness percentage
total_items = sum(len(v) for v in completeness_check.values())
covered_items = sum(sum(v.values()) for v in completeness_check.values())
completeness = (covered_items / total_items) * 100
print(f"Completeness: {completeness:.1f}% ({covered_items}/{total_items} items)")
if completeness < 100:
    print("Missing items - additional web search required")
    # List what's missing
    for category, items in completeness_check.items():
        missing = [k for k, v in items.items() if not v]
        if missing:
            print(f" {category}: {', '.join(missing)}")
If new sources found during search, update vtt_web_sources.md:
# Agent: if you discovered new sources during the search/filter steps,
# append them to vtt_web_sources.md now. For each new source URL not
# already in the file, add a markdown link line.
import re as _re, os
_sources_path = "ai_artifacts/specs/vtt/vtt_web_sources.md"
if os.path.exists(_sources_path):
    with open(_sources_path) as _f:
        _current = _f.read()
    _known_urls = {m[1] for m in _re.findall(r'\[([^\]]+)\]\(([^)]+)\)', _current)}
    # Agent: for each new source discovered above, if url not in _known_urls:
    #     _current += f"- [{title}]({url})\n"
    # Then write back:
    # with open(_sources_path, "w") as _f:
    #     _f.write(_current)
    print("Source file update complete")
else:
    print(f"WARNING: {_sources_path} not found, skipping source update")
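The append-and-write-back step is only described in comments above. A minimal sketch of it as a reusable function follows. The name `append_new_sources` and the `title`/`url` dict shape are assumptions for illustration, and the function appends to the file rather than rewriting it:

```python
import os
import re

# Same markdown-link pattern used elsewhere in this skill: [Text](URL)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def append_new_sources(path, new_sources):
    """Append unseen sources as markdown links; return how many were added."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        current = f.read()
    known = {url for _title, url in LINK_RE.findall(current)}
    lines = []
    for src in new_sources:
        if src["url"] not in known:
            lines.append(f"- [{src['title']}]({src['url']})\n")
            known.add(src["url"])
    if lines:
        with open(path, "a", encoding="utf-8") as f:
            # Make sure the first new entry starts on its own line.
            if current and not current.endswith("\n"):
                f.write("\n")
            f.writelines(lines)
    return len(lines)
```

Returning the number of appended links lets the caller report whether the source list actually changed.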
Create ai_artifacts/specs/vtt/vtt_specs_summary.md using the rule format below.
Key differences from old approach:
Rule Format:
**[RULE-XXX-###]** Brief requirement
- **Requirement:** What must be true
- **Level:** MUST | SHOULD | MAY | MUST NOT
- **Validation:** How to check
- **Test Pattern:** Regex or algorithm
- **Sources:** [Attribution]
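To illustrate what a Test Pattern entry might contain, here is a sketch for a timestamp rule covering the `[HH:]MM:SS.mmm` format, the 00-59 ranges, and the three-digit milliseconds listed in the checklist. The names `TIMESTAMP_RE` and `parse_timestamp` are hypothetical and are not taken from pycaption or the W3C text:

```python
import re

# Optional hours (two or more digits), then MM and SS in 00-59, then exactly 3 ms digits.
TIMESTAMP_RE = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_timestamp(text):
    """Return milliseconds for a [HH:]MM:SS.mmm timestamp, or None if invalid."""
    m = TIMESTAMP_RE.fullmatch(text)
    if m is None:
        return None
    hours = int(m.group(1) or 0)
    minutes, seconds, millis = (int(g) for g in m.groups()[1:])
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


print(parse_timestamp("00:01:23.456"))  # 83456
print(parse_timestamp("01:23.456"))     # 83456 (hours omitted)
print(parse_timestamp("00:60:00.000"))  # None (minutes out of range)
```

Converting to milliseconds makes the start/end ordering check a plain integer comparison between the two parsed values.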
Implementation Rule Format (GENERIC):
**[IMPL-XXX-###]** Component MUST do X
- **Spec Rule:** RULE-XXX-###
- **Component:** Parser | Writer | Validator
- **Implementation Requirement:** What ANY compliant implementation must do
- **Expected Behavior:** Input → Output examples
- **Validation Criteria:** What to verify
- **Common Patterns:** Correct vs incorrect (generic)
- **Test Coverage:** Required test scenarios
Critical requirements (must be included as rules):
Part 1 (File Format): Header format, UTF-8, BOM handling, blank line after header
Part 2 (Timestamps): Format [HH:]MM:SS.mmm, ranges, start<=end, sequential
Part 3 (Cue Structure): Identifier restrictions, --> separator, blank line terminator
Part 4 (Cue Settings): vertical, line, position, size, align, region (6 settings)
Part 5 (Tags): c, i, b, u, v, lang, ruby, timestamp (8 tags), closing rules, escaping
Part 6 (Regions): REGION block, id/width/lines/regionanchor/viewportanchor/scroll
Part 7 (Special Blocks): NOTE (comments), STYLE (CSS)
Part 8 (Implementation): Generic IMPL-* rules for Parser/Writer/Validator
Part 9 (Validation Summary): Rule counts, self-validation report
Part 10 (Quick Reference): Tables for settings and tags
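As a rough illustration of the kind of check the Part 4 rules imply, the sketch below validates a cue settings string against the six settings and the value shapes listed in this document. It is deliberately simplified: the optional alignment suffixes that the W3C spec allows on `line` and `position` are not handled, and the names `SETTING_PATTERNS` and `parse_cue_settings` are hypothetical:

```python
import re

# Value shapes as listed in this skill (simplified; no alignment suffixes).
SETTING_PATTERNS = {
    "vertical": re.compile(r"rl|lr"),
    "line": re.compile(r"-?\d+|\d+(?:\.\d+)?%"),
    "position": re.compile(r"\d+(?:\.\d+)?%"),
    "size": re.compile(r"\d+(?:\.\d+)?%"),
    "align": re.compile(r"start|center|end|left|right"),
    "region": re.compile(r"\S+"),
}


def parse_cue_settings(text):
    """Split 'name:value' tokens; return (valid_settings, problems)."""
    settings, problems = {}, []
    for token in text.split():
        name, sep, value = token.partition(":")
        pattern = SETTING_PATTERNS.get(name)
        if not sep or pattern is None:
            problems.append(f"unknown setting: {token}")
        elif not pattern.fullmatch(value):
            problems.append(f"invalid value for {name}: {value}")
        else:
            settings[name] = value
    return settings, problems


print(parse_cue_settings("align:start position:10% line:-1"))
# ({'align': 'start', 'position': '10%', 'line': '-1'}, [])
```

Collecting problems instead of raising keeps the sketch usable for a compliance report, where every issue in a cue should be listed rather than only the first.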
Target Rule Counts (Exhaustive):
Level Distribution (Exhaustive):
Critical Inclusions (MUST be documented):
All 8 Markup Tags (Individual Rules):
- `<c>` / `<c.class>` - Class spans (RULE-TAG-001)
- `<i>` - Italics (RULE-TAG-002)
- `<b>` - Bold (RULE-TAG-003)
- `<u>` - Underline (RULE-TAG-004)
- `<v>` - Voice/speaker (RULE-TAG-005)
- `<lang>` - Language (RULE-TAG-006)
- `<ruby><rt>` - Ruby text (RULE-TAG-007)
- `<HH:MM:SS.mmm>` - Internal timestamp (RULE-TAG-008)
All 6 Cue Settings (Individual Rules):
All Required HTML Entities (Individual Rules):
REGION Properties (Individual Rules):
Generate spec with incremental writing (context-efficient):
from datetime import datetime
import os
os.makedirs("ai_artifacts/specs/vtt", exist_ok=True)
spec_path = "ai_artifacts/specs/vtt/vtt_specs_summary.md"
# Write spec header
spec_content = f"""# WebVTT Specification - Complete Reference
**Generated**: {datetime.now().strftime("%Y-%m-%d")}
**Sources**: W3C WebVTT Specification (https://www.w3.org/TR/webvtt1/), MDN Web Docs
**Version**: W3C Candidate Recommendation
**Total Rules**: [TO BE CALCULATED]
---
"""
with open(spec_path, "w") as _f:
    _f.write(spec_content)
# Then generate and append each part section by section:
# Part 1: File Format rules
# Part 2: Timestamp rules
# ... continue for all parts (Parts 1-10)
# Append each part with: with open(spec_path, "a") as _f: _f.write(part)
Structure checks:
Content checks (Exhaustive - 100% required):
Generate exhaustive validation report in spec file:
## Part 9: Exhaustive Validation Summary
### Rule Counts by Category
- RULE-FMT-###: X file format rules (Target: 5-7)
- RULE-TIME-###: X timestamp rules (Target: 7-10)
- RULE-CUE-###: X cue structure rules (Target: 5-8)
- RULE-SET-###: X cue setting rules (Target: 8 - ALL settings)
- RULE-TAG-###: X tag/markup rules (Target: 11-15 - ALL 8 tags + rules)
- RULE-ENT-###: X HTML entity rules (Target: 3-5 - ALL 6 entities)
- RULE-REG-###: X region rules (Target: 5-8 - ALL 6 properties)
- RULE-BLK-###: X special block rules (Target: 3-5)
- RULE-VAL-###: X validation rules (Target: 5-8)
- IMPL-###: X implementation requirements (Target: 12-15)
- **Total: Y rules** (Target: 60-80 for exhaustive coverage)
### By Level (Exhaustive Distribution)
- MUST: X rules (Target: 30-40)
- SHOULD: X rules (Target: 15-20)
- MAY: X rules (Target: 5-10)
- MUST NOT: X rules (Target: 3-5)
### Coverage Verification (100% Required)
**Markup Tags (8 total - ALL must be documented):**
- ✅/❌ `<c>` class spans (RULE-TAG-001)
- ✅/❌ `<i>` italics (RULE-TAG-002)
- ✅/❌ `<b>` bold (RULE-TAG-003)
- ✅/❌ `<u>` underline (RULE-TAG-004)
- ✅/❌ `<v>` voice (RULE-TAG-005)
- ✅/❌ `<lang>` language (RULE-TAG-006)
- ✅/❌ `<ruby><rt>` ruby text (RULE-TAG-007)
- ✅/❌ `<HH:MM:SS.mmm>` timestamp (RULE-TAG-008)
**Status: X/8 tags documented**
**Cue Settings (6 total - ALL must be documented):**
- ✅/❌ vertical: rl|lr (RULE-SET-001)
- ✅/❌ line: N|N% (RULE-SET-002)
- ✅/❌ position: N% (RULE-SET-003)
- ✅/❌ size: N% (RULE-SET-004)
- ✅/❌ align: start|center|end|left|right (RULE-SET-005)
- ✅/❌ region: id (RULE-SET-006)
**Status: X/6 settings documented**
**HTML Entities (6 required - ALL must be documented):**
- ✅/❌ `&amp;` ampersand (RULE-ENT-001)
- ✅/❌ `&lt;` less than (RULE-ENT-002)
- ✅/❌ `&gt;` greater than (RULE-ENT-003)
- ✅/❌ `&nbsp;` non-breaking space (RULE-ENT-004)
- ✅/❌ `&lrm;` left-to-right mark (RULE-ENT-005)
- ✅/❌ `&rlm;` right-to-left mark (RULE-ENT-006)
**Status: X/6 entities documented**
**REGION Properties (6 total - ALL must be documented):**
- ✅/❌ id (required) (RULE-REG-001)
- ✅/❌ width: N% (RULE-REG-002)
- ✅/❌ lines: N (RULE-REG-003)
- ✅/❌ regionanchor: X%,Y% (RULE-REG-004)
- ✅/❌ viewportanchor: X%,Y% (RULE-REG-005)
- ✅/❌ scroll: up|none (RULE-REG-006)
**Status: X/6 properties documented**
### Self-Validation Checklist
- ✅/❌ All rule IDs unique
- ✅/❌ Sequential numbering within categories
- ✅/❌ All 8 markup tags individually documented
- ✅/❌ All 6 cue settings individually documented
- ✅/❌ All 6 HTML entities individually documented
- ✅/❌ All 6 REGION properties individually documented
- ✅/❌ Generic IMPL rules (no pycaption-specific code)
- ✅/❌ Test patterns present for all rules
- ✅/❌ Source attribution present
- ✅/❌ 60-80 total rules (exhaustive coverage target)
- ✅/❌ 30-40 MUST rules documented
### Overall Status
- **Completeness**: X% (100% required)
- **Status**: ✅ PASS | ❌ FAIL (requires fixes)
**If FAIL**: Missing items listed above must be added before spec is complete.
If validation FAILS:
Track sources for each rule:
Document conflicts and resolutions.
Append new URLs (if any) to ai_artifacts/specs/vtt/vtt_web_sources.md:
- [New Source Title](https://url.example.com)
CRITICAL: After generating the spec, run this validation script. If it reports FAIL, fix the spec and re-run until PASS.
import re
print("=" * 60)
print("POST-GENERATION VALIDATION: WebVTT")
print("Checking vtt_specs_summary.md against master_checklist.md")
print("=" * 60)
with open('ai_artifacts/specs/vtt/master_checklist.md') as _f:
    checklist = _f.read()
with open('ai_artifacts/specs/vtt/vtt_specs_summary.md') as _f:
    spec = _f.read()
failures = []
warnings = []
# 1. Check all required rule IDs
rule_ids = re.findall(r'^- ((?:RULE|IMPL)-[A-Z]+-\d{3})', checklist, re.M)
for rid in rule_ids:
    if rid not in spec:
        failures.append(f"MISSING RULE: {rid}")
found_rules = len(rule_ids) - len([f for f in failures if 'MISSING RULE' in f])
print(f"[1/6] Rule IDs: {found_rules}/{len(rule_ids)}")
# 2. Check required tags
tags_section = re.search(r'## Required Tags.*?\n((?:- .+\n)+)', checklist)
if tags_section:
    tags = re.findall(r'^- `(.+?)`', tags_section.group(1), re.M)
    for tag in tags:
        # Search for the tag in spec (handle angle brackets)
        tag_clean = tag.replace('<', '').replace('>', '').split('/')[0].split('.')[0]
        if not re.search(rf'<{re.escape(tag_clean)}[>\s./]', spec):
            if not re.search(re.escape(tag_clean), spec, re.I):
                failures.append(f"MISSING TAG: {tag}")
    print(f"[2/6] Tags: {len(tags) - len([f for f in failures if 'TAG' in f])}/{len(tags)}")
# 3. Check required settings
settings_section = re.search(r'## Required Cue Settings.*?\n((?:- .+\n)+)', checklist)
if settings_section:
    settings = re.findall(r'^- (\w+):', settings_section.group(1), re.M)
    for setting in settings:
        if not re.search(rf'\b{re.escape(setting)}\b', spec):
            failures.append(f"MISSING SETTING: {setting}")
    print(f"[3/6] Settings: {len(settings) - len([f for f in failures if 'SETTING' in f])}/{len(settings)}")
# 4. Check required entities
entities_section = re.search(r'## Required HTML Entities.*?\n((?:- .+\n)+)', checklist)
if entities_section:
    entities = re.findall(r'^- (.+?)$', entities_section.group(1), re.M)
    for entity in entities:
        entity_clean = entity.strip().split(' ')[0]
        if entity_clean not in spec:
            warnings.append(f"MISSING ENTITY: {entity_clean}")
    print(f"[4/6] Entities: checked {len(entities)}")
# 5. Check required enum values
enum_sections = re.findall(r'### (.+?)\n((?:- .+\n)+)', checklist)
missing_enums = 0
total_enums = 0
for section_name, values_block in enum_sections:
    values = re.findall(r'^- (.+)$', values_block, re.M)
    for val in values:
        val_clean = val.strip()
        total_enums += 1
        if val_clean not in spec:
            if not re.search(re.escape(val_clean), spec, re.I):
                missing_enums += 1
                warnings.append(f"MISSING ENUM [{section_name}]: {val_clean}")
print(f"[5/6] Enum values: {total_enums - missing_enums}/{total_enums}")
# 6. Check severity distribution
severity_section = re.search(r'## Required Severity Distribution\n((?:.*\n)*)', checklist)
if severity_section:
    for match in re.finditer(r'- (MUST|SHOULD|MAY|MUST NOT): (\d+)', severity_section.group(1)):
        level, minimum = match.group(1), int(match.group(2))
        # The lookahead keeps "MUST NOT" rules from being counted as "MUST"
        actual = len(re.findall(rf'Level:\*\*\s*{re.escape(level)}\b(?!\s+NOT)', spec))
        if actual < minimum:
            failures.append(f"SEVERITY {level}: found {actual}, need >= {minimum}")
        print(f"[6/6] {level}: {actual} (min {minimum}) {'PASS' if actual >= minimum else 'FAIL'}")
# Report
print("\n" + "=" * 60)
if failures:
    print(f"FAIL: {len(failures)} failures, {len(warnings)} warnings\n")
    for f in failures:
        print(f" FAIL: {f}")
    for w in warnings[:10]:
        print(f" WARN: {w}")
    if len(warnings) > 10:
        print(f" ... and {len(warnings) - 10} more warnings")
    print("\nFix the spec and re-run this validation.")
else:
    print(f"PASS: All checks passed ({len(warnings)} warnings)")
    for w in warnings[:10]:
        print(f" WARN: {w}")
print("=" * 60)
If FAIL: Fix the missing items in the spec, then re-run the validation script. Repeat until PASS.
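The self-validation checklist also asks for unique rule IDs and sequential numbering within categories, which the script above does not verify. A minimal sketch of such a check follows, assuming rules appear in the spec as `**[RULE-XXX-###]**` or `**[IMPL-XXX-###]**` at the start of a line, as in the rule formats. The function name `check_rule_ids` is hypothetical:

```python
import re
from collections import Counter, defaultdict

# Matches rule headings such as **[RULE-TAG-001]** at the start of a line.
RULE_HEADING_RE = re.compile(r"^\*\*\[((?:RULE|IMPL)-[A-Z]+-(\d{3}))\]\*\*", re.M)


def check_rule_ids(spec_text):
    """Return a list of problems: duplicate IDs and numbering gaps per category."""
    problems = []
    ids = [m.group(1) for m in RULE_HEADING_RE.finditer(spec_text)]
    for rid, count in Counter(ids).items():
        if count > 1:
            problems.append(f"DUPLICATE: {rid} defined {count} times")
    numbers = defaultdict(set)
    for rid in ids:
        prefix, _, num = rid.rpartition("-")
        numbers[prefix].add(int(num))
    for prefix in sorted(numbers):
        seen = numbers[prefix]
        # Numbering is expected to run 001..max with no holes.
        missing = sorted(set(range(1, max(seen) + 1)) - seen)
        if missing:
            gaps = ", ".join(f"{prefix}-{n:03d}" for n in missing)
            problems.append(f"GAP in {prefix}: missing {gaps}")
    return problems
```

An empty list means both checklist items pass. Any returned strings could be appended to `failures` before the final report is printed.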
- ai_artifacts/specs/vtt/vtt_specs_summary.md - Complete specification with 60-80 rules
- ai_artifacts/specs/vtt/vtt_web_sources.md - Updated URL list (if new sources found)
Master Checklist Validation (CRITICAL - must PASS):
- master_checklist.md present in generated spec
Completeness:
Quality:
Web Sources:
Token usage target: < 50K per invocation
Strategies:
Estimated token usage: