analyze-vtt-docs

Name: Analyze Vtt Docs
Author: pbs

// Generates EXHAUSTIVE WebVTT specification summary from web sources with complete rule coverage, all tags/settings/entities, and self-validation.

updated:April 30, 2026 at 10:47

package.json

"author": "pbs"

"repository": "pbs/pycaption"

name	analyze-vtt-docs
description	Generates EXHAUSTIVE WebVTT specification summary from web sources with complete rule coverage, all tags/settings/entities, and self-validation.

analyze-vtt-docs

What this skill does

Generates a comprehensive, exhaustive WebVTT specification summary (vtt_specs_summary.md) that serves as the single source of truth for compliance checking.

Outputs:

50+ RULE-XXX specifications with unique IDs and test patterns
12+ IMPL-XXX requirements (generic, no pycaption references)
All 8 markup tags individually documented (c, i, b, u, v, lang, ruby, timestamp)
All 6 cue settings individually documented (vertical, line, position, size, align, region)
All required HTML entities (&amp;, &lt;, &gt;, &nbsp;, &lrm;, &rlm;)
Region specifications complete (REGION block properties)
STYLE/NOTE blocks documented
Self-validation report (rule counts, completeness check)
Source attribution per rule

Key: Ensures NO requirements missed - exhaustive coverage from W3C spec + MDN + web search.

Pre-flight: Read .claude/skills/gotchas.md before generating specs. Pay special attention to gotcha #3 (W3C license attribution required).

Post-run: If you discover a new gotcha during spec generation (a copyright/licensing trap, a W3C attribution pattern that should be avoided, a web source that returns misleading data, or a spec structure issue that could cause downstream compliance check failures), append it to .claude/skills/gotchas.md with the same numbered format.

Usage:

/analyze-vtt-docs

Single command - fetches web sources, performs comprehensive analysis, generates complete spec.

Implementation

Step 0: Check Existing Sources

Read existing documentation:

# Check what we already have
ls -la ai_artifacts/specs/vtt/
cat ai_artifacts/specs/vtt/vtt_web_sources.md

If vtt_specs_summary.md exists:

Read it to assess completeness
Identify gaps using completeness checklist (Step 3)
Only fetch new sources if gaps exist

Step 1: Fetch Known Web Sources (WebFetch Tool Required)

IMPORTANT: This step requires the WebFetch tool to be loaded first.

Check if WebFetch is available, load if needed:

# WebFetch is a deferred tool - load it before use
# Use ToolSearch to load WebFetch

Read URLs from ai_artifacts/specs/vtt/vtt_web_sources.md:

import re

with open("ai_artifacts/specs/vtt/vtt_web_sources.md") as _f:
    sources_content = _f.read()

# Extract URLs from markdown links: [Text](URL)
url_pattern = r'\[([^\]]+)\]\(([^)]+)\)'
existing_sources = []

for match in re.findall(url_pattern, sources_content):
    title, url = match
    existing_sources.append({'title': title, 'url': url})

print(f"Found {len(existing_sources)} existing sources")
for s in existing_sources:
    print(f"   - {s['title']}")

Fetch W3C WebVTT Specification (Primary Source):

# Fetch W3C spec - most authoritative source
w3c_url = 'https://www.w3.org/TR/webvtt1/'
print("Fetching W3C WebVTT Specification...")

# Use the WebFetch tool to fetch w3c_url
# Store result in a variable for processing
# w3c_content = <result from WebFetch tool>

Fetch MDN Documentation (Supplementary):

# MDN provides practical examples and browser compatibility info
mdn_url = 'https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API'
print("Fetching MDN WebVTT Documentation...")

# Use the WebFetch tool to fetch mdn_url
# mdn_content = <result from WebFetch tool>

Context optimization:

Fetch sources sequentially, not in parallel (avoid context overflow)
Extract text content only, discard HTML tags
Focus on specification sections
Save to temp files, don't hold in memory
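The context-optimization points above can be sketched roughly as follows. This is only an illustration that assumes the WebFetch result arrives as an HTML string; `TextExtractor`, `html_to_text`, and `save_to_temp` are hypothetical helper names, not part of the skill or of any tool API.

```python
import tempfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_to_temp(text, prefix="vtt_source_"):
    """Write text to a temp file and return its path, so it need not stay in memory."""
    with tempfile.NamedTemporaryFile(
        "w", prefix=prefix, suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(text)
        return f.name
```

Each fetched source could be passed through `html_to_text` and `save_to_temp` in turn, which keeps only one source in context at a time.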

Step 2: Comprehensive Web Search for Missing Details

Perform targeted web searches to fill gaps:

# Define search queries for comprehensive coverage
search_queries = [
    "WebVTT specification complete W3C",
    "WebVTT cue settings all options",
    "WebVTT markup tags complete list",
    "WebVTT HTML entities supported",
    "WebVTT REGION block specification",
    "WebVTT STYLE block CSS",
    "WebVTT NOTE comment syntax",
    "WebVTT timestamp format validation",
    "WebVTT best practices implementation",
    "WebVTT validation rules MUST SHOULD",
]

# Execute searches and collect results
search_results = []
for query in search_queries:
    print(f"Searching: {query}")
    # Use the WebSearch tool for each query
    results = []  # populated by WebSearch tool
    search_results.append({
        'query': query,
        'results': results
    })
    # Brief delay to avoid rate limiting

Identify high-value sources from search results:

import re

# Re-read existing sources (each block is independent)
with open("ai_artifacts/specs/vtt/vtt_web_sources.md") as _f:
    _sources_content = _f.read()
existing_sources = [
    {'title': m[0], 'url': m[1]}
    for m in re.findall(r'\[([^\]]+)\]\(([^)]+)\)', _sources_content)
]

# Agent: for each URL found in the search step above, check if it is
# authoritative (w3.org, developer.mozilla.org, github.com/w3c) and not
# already in existing_sources. Collect matches into new_sources list:
_existing_urls = {s['url'] for s in existing_sources}
new_sources = []  # Agent fills this from search results
# new_sources.append({'title': <title>, 'url': <url>, 'query': <query>})

print(f"\nFound {len(new_sources)} new authoritative sources")

Fetch new sources:

# Agent: for each source in new_sources (up to 5), use WebFetch to
# retrieve the content. new_sources was built in the filtering step above.
# for source in new_sources[:5]:
#     print(f"Fetching: {source['title']}")
#     # Use the WebFetch tool with url=source['url']

Step 3: Exhaustive Completeness Verification

CRITICAL: Verify that ALL of these areas are covered in the fetched content (100% coverage required):

File Format:

Header: "WEBVTT" exact match (case-sensitive), optional space + comment
UTF-8 encoding requirement (MUST)
Optional UTF-8 BOM handling
Line endings: CR, LF, CRLF all valid
Blank line after header before first cue
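As a rough illustration of the file-format items above, a header check might look like the sketch below. The function name `check_header` is hypothetical, and the sketch covers only encoding, BOM, line endings, and the signature line; it does not check the blank line after the header or restrictions on the header text.

```python
import re

# "WEBVTT", optionally followed by a space or tab and free text
_HEADER_RE = re.compile(r"WEBVTT(?:[ \t].*)?")


def check_header(raw: bytes):
    """Return (ok, message) for the first line of a WebVTT file given as bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False, "not valid UTF-8"
    # An optional UTF-8 BOM decodes to U+FEFF; skip it if present
    if text.startswith("\ufeff"):
        text = text[1:]
    # CR, LF, and CRLF are all accepted as line terminators
    first_line = re.split(r"\r\n|\r|\n", text, maxsplit=1)[0]
    if not _HEADER_RE.fullmatch(first_line):
        return False, "first line must be WEBVTT, optionally followed by a space or tab and text"
    return True, "ok"
```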

Timestamp Format:

Format: [HH:]MM:SS.mmm (hours optional if < 1 hour)
Milliseconds required (3 digits)
Separator: --> (spaces required)
Start time <= end time (MUST)
Sequential ordering (SHOULD)
Valid ranges: HH (00-99), MM (00-59), SS (00-59), mmm (000-999)
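The timestamp items above could be turned into a test pattern along these lines. This is a sketch, not the spec's parsing algorithm: `parse_timestamp` and `check_timing_line` are hypothetical names, the hours field is assumed to allow two or more digits, and the ordering check uses the start <= end rule exactly as listed in this checklist.

```python
import re

# [HH:]MM:SS.mmm with MM and SS limited to 00-59 and exactly three millisecond digits
TIMESTAMP_RE = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})", re.ASCII)
# start --> end, with whitespace required on both sides of the arrow
TIMING_LINE_RE = re.compile(r"(\S+)[ \t]+-->[ \t]+(\S+)(?:[ \t]+(.*))?")


def parse_timestamp(value):
    """Return the timestamp in milliseconds, or None if it is malformed."""
    m = TIMESTAMP_RE.fullmatch(value)
    if not m:
        return None
    hours, minutes, seconds, millis = m.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def check_timing_line(line):
    """Return True if the line is 'start --> end [settings]' with start <= end."""
    m = TIMING_LINE_RE.fullmatch(line)
    if not m:
        return False
    start, end = parse_timestamp(m.group(1)), parse_timestamp(m.group(2))
    return start is not None and end is not None and start <= end
```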

Cue Structure:

Optional cue identifier (any text except "-->", "NOTE", or looks like timestamp)
Required: start --> end [optional settings]
Cue payload (can span multiple lines)
Blank line terminates cue

Cue Settings:

vertical: rl, lr (text direction)
line: N or N% (vertical position, can be negative)
position: N% (horizontal position 0-100)
size: N% (cue box width 0-100)
align: start, center, end, left, right
region: region_id (reference to defined region)
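A minimal validator for the six settings listed above might look like this. It is a simplified sketch: it covers only the value forms named in this checklist, and the names `VALIDATORS` and `check_cue_settings` are hypothetical.

```python
import re


def _is_percentage(value):
    """True for N% with N between 0 and 100."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)%", value, re.ASCII)
    return bool(m) and 0 <= float(m.group(1)) <= 100


VALIDATORS = {
    "vertical": lambda v: v in ("rl", "lr"),
    # line accepts a percentage or a (possibly negative) line number
    "line": lambda v: _is_percentage(v) or re.fullmatch(r"-?\d+", v, re.ASCII) is not None,
    "position": _is_percentage,
    "size": _is_percentage,
    "align": lambda v: v in ("start", "center", "end", "left", "right"),
    "region": lambda v: bool(v) and "-->" not in v,
}


def check_cue_settings(settings):
    """Return a list of problems; an empty list means every setting passed."""
    problems = []
    for token in settings.split():
        name, sep, value = token.partition(":")
        if not sep or not value:
            problems.append(f"malformed setting: {token}")
        elif name not in VALIDATORS:
            problems.append(f"unknown setting: {name}")
        elif not VALIDATORS[name](value):
            problems.append(f"invalid value for {name}: {value}")
    return problems
```

Returning a list of problems rather than raising lets a compliance report collect every invalid setting on a line in one pass.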

Tags (Markup):

Class spans: <c.classname>text</c> (multiple classes: <c.class1.class2>)
Italics: <i>text</i>
Bold: <b>text</b>
Underline: <u>text</u>
Ruby: <ruby>base<rt>annotation</rt></ruby>
Voice: <v Speaker>text</v> (optional annotation)
Language: <lang code>text</lang>
Internal timestamps: <00:01:23.456> (karaoke-style)
Tag nesting rules and restrictions
Escape sequences: &amp; &lt; &gt; &nbsp; &lrm; &rlm;
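For the escape sequences in the list above, a writer-side sketch could look like the following. The helper names are hypothetical, and escaping all six characters is an assumption made for symmetry; a real writer may choose to escape fewer.

```python
# Character -> entity, in the order the entities are listed in this checklist
ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u00a0": "&nbsp;",
    "\u200e": "&lrm;",
    "\u200f": "&rlm;",
}


def escape_cue_text(text):
    """Replace each special character with its entity in a single pass."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape_cue_text(text):
    """Reverse escape_cue_text."""
    # &amp; is handled last so that "&amp;lt;" becomes "&lt;" rather than "<"
    for ch, entity in reversed(list(ESCAPES.items())):
        text = text.replace(entity, ch)
    return text
```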

Regions (Optional Feature):

REGION block definition before cues
Properties: id, width, lines, regionanchor, viewportanchor, scroll
Association with cues via region:id setting
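The region properties above could be checked with a sketch like the one below. It assumes one or more `name:value` settings separated by whitespace after a `REGION` line, and it accepts the scroll values exactly as this checklist lists them; `parse_region` and the validator table are hypothetical names.

```python
import re


def _is_pct(value):
    """True for N% with N no greater than 100."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)%", value, re.ASCII)
    return m is not None and float(m.group(1)) <= 100


def _is_anchor(value):
    """True for a percentage pair such as 0%,100%."""
    x, sep, y = value.partition(",")
    return bool(sep) and _is_pct(x) and _is_pct(y)


REGION_VALIDATORS = {
    "id": lambda v: bool(v) and "-->" not in v,
    "width": _is_pct,
    "lines": lambda v: re.fullmatch(r"\d+", v, re.ASCII) is not None,
    "regionanchor": _is_anchor,
    "viewportanchor": _is_anchor,
    "scroll": lambda v: v in ("up", "none"),  # values as listed in the checklist
}


def parse_region(block):
    """Return (properties, problems) for a REGION block given as text."""
    lines = block.strip().splitlines()
    if not lines or lines[0].strip() != "REGION":
        return {}, ["block does not start with REGION"]
    props, problems = {}, []
    for token in " ".join(lines[1:]).split():
        name, sep, value = token.partition(":")
        if not sep or name not in REGION_VALIDATORS:
            problems.append(f"unknown or malformed setting: {token}")
        elif not REGION_VALIDATORS[name](value):
            problems.append(f"invalid value for {name}: {value}")
        else:
            props[name] = value
    if "id" not in props:
        problems.append("missing required id")
    return props, problems
```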

Special Blocks:

NOTE blocks (comments, ignored by parser)
STYLE blocks (CSS for cue pseudo-elements)
Syntax and placement rules

Validation Requirements:

All MUST requirements from W3C spec
All SHOULD requirements
All MAY optional features
All MUST NOT forbidden patterns
Error handling strategies

Edge Cases & Common Pitfalls:

Extra text on first line after "WEBVTT"
Missing milliseconds in timestamps
Missing spaces around -->
Invalid cue settings
Unclosed tags
Un-escaped special characters
Percentage out of range (0-100)
Start > end time
Invalid UTF-8 sequences

Implementation Requirements:

Parser requirements (UTF-8 decoder, timestamp parser, tag parser, settings parser)
Writer requirements (UTF-8 encoder, escaping, formatting)
Error handling strategies
Performance considerations

Browser Compatibility:

Feature support across browsers
Cue settings support
Region support (limited)
STYLE block support (varies)
Graceful degradation

Completeness Checklist (MUST achieve 100%):

# TEMPLATE: All values start as False. Update each to True as you confirm
# coverage during spec generation. Re-run this block to check progress.
completeness_check = {
    'file_format': {
        'header': False,  # WEBVTT signature
        'encoding': False,  # UTF-8
        'bom': False,  # BOM handling
        'line_endings': False,  # CR/LF/CRLF
        'blank_line': False,  # After header
    },
    'timestamps': {
        'format': False,  # [HH:]MM:SS.mmm
        'validation': False,  # Start <= end
        'ranges': False,  # MM/SS 00-59
        'milliseconds': False,  # Exactly 3 digits
        'separator': False,  # ` --> `
    },
    'cue_settings': {
        'vertical': False,  # rl/lr
        'line': False,  # N or N%
        'position': False,  # N%
        'size': False,  # N%
        'align': False,  # start/center/end/left/right
        'region': False,  # region_id
    },
    'markup_tags': {
        'class_span': False,  # <c>
        'italics': False,  # <i>
        'bold': False,  # <b>
        'underline': False,  # <u>
        'voice': False,  # <v>
        'language': False,  # <lang>
        'ruby': False,  # <ruby><rt>
        'timestamp': False,  # <00:01:23.456>
    },
    'html_entities': {
        'required': False,  # &amp; &lt; &gt; &nbsp; &lrm; &rlm;
        'escaping': False,  # Escape rules
    },
    'regions': {
        'region_block': False,  # REGION definition
        'properties': False,  # id/width/lines/anchors/scroll
    },
    'special_blocks': {
        'note': False,  # NOTE comments
        'style': False,  # STYLE CSS
    },
    'validation': {
        'must_rules': False,  # All MUST requirements
        'should_rules': False,  # All SHOULD requirements
        'error_handling': False,  # Error strategies
    },
}

# Calculate completeness percentage
total_items = sum(len(v) for v in completeness_check.values())
covered_items = sum(sum(v.values()) for v in completeness_check.values())
completeness = (covered_items / total_items) * 100

print(f"Completeness: {completeness:.1f}% ({covered_items}/{total_items} items)")

if completeness < 100:
    print("Missing items - additional web search required")
    # List what's missing
    for category, items in completeness_check.items():
        missing = [k for k, v in items.items() if not v]
        if missing:
            print(f"   {category}: {', '.join(missing)}")

If new sources found during search, update vtt_web_sources.md:

# Agent: if you discovered new sources during the search/filter steps,
# append them to vtt_web_sources.md now. For each new source URL not
# already in the file, add a markdown link line.
import re as _re, os
_sources_path = "ai_artifacts/specs/vtt/vtt_web_sources.md"
if os.path.exists(_sources_path):
    with open(_sources_path) as _f:
        _current = _f.read()
    _known_urls = {m[1] for m in _re.findall(r'\[([^\]]+)\]\(([^)]+)\)', _current)}
    # Agent: for each new source discovered above, if url not in _known_urls:
    #   _current += f"- [{title}]({url})\n"
    # Then write back:
    # with open(_sources_path, "w") as _f:
    #     _f.write(_current)
    print("Source file update complete")
else:
    print(f"WARNING: {_sources_path} not found — skipping source update")

Step 4: Generate Exhaustive Specification

Create ai_artifacts/specs/vtt/vtt_specs_summary.md using the rule format below.

Key differences from old approach:

Rule-based format with unique IDs (RULE-FMT-###, RULE-TIME-###, etc.)
Generic IMPL-### rules (no pycaption-specific code references)
Test patterns for automated validation
Level indicators (MUST/SHOULD/MAY/MUST NOT)
Source attribution per rule

Rule Format:

**[RULE-XXX-###]** Brief requirement
- **Requirement:** What must be true
- **Level:** MUST | SHOULD | MAY | MUST NOT
- **Validation:** How to check
- **Test Pattern:** Regex or algorithm
- **Sources:** [Attribution]
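One way to keep every rule in this exact shape is to render it from a small record type. The sketch below is only an illustration; the `Rule` class and its field names are hypothetical and simply mirror the bullets of the rule format.

```python
from dataclasses import dataclass

LEVELS = ("MUST", "SHOULD", "MAY", "MUST NOT")


@dataclass
class Rule:
    rule_id: str       # e.g. RULE-FMT-001
    summary: str       # brief requirement shown next to the ID
    requirement: str
    level: str         # one of LEVELS
    validation: str
    test_pattern: str
    sources: str

    def to_markdown(self):
        """Render the rule in the rule format, rejecting unknown levels."""
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        return (
            f"**[{self.rule_id}]** {self.summary}\n"
            f"- **Requirement:** {self.requirement}\n"
            f"- **Level:** {self.level}\n"
            f"- **Validation:** {self.validation}\n"
            f"- **Test Pattern:** {self.test_pattern}\n"
            f"- **Sources:** {self.sources}\n"
        )
```

Rendering from a record makes it harder to forget a field, and the level check catches typos before they reach the severity counts.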

Implementation Rule Format (GENERIC):

**[IMPL-XXX-###]** Component MUST do X
- **Spec Rule:** RULE-XXX-###
- **Component:** Parser | Writer | Validator
- **Implementation Requirement:** What ANY compliant implementation must do
- **Expected Behavior:** Input → Output examples
- **Validation Criteria:** What to verify
- **Common Patterns:** Correct vs incorrect (generic)
- **Test Coverage:** Required test scenarios

Critical requirements (must be included as rules):

Part 1 (File Format): Header format, UTF-8, BOM handling, blank line after header
Part 2 (Timestamps): Format [HH:]MM:SS.mmm, ranges, start<=end, sequential
Part 3 (Cue Structure): Identifier restrictions, --> separator, blank line terminator
Part 4 (Cue Settings): vertical, line, position, size, align, region (6 settings)
Part 5 (Tags): c, i, b, u, v, lang, ruby, timestamp (8 tags), closing rules, escaping
Part 6 (Regions): REGION block, id/width/lines/regionanchor/viewportanchor/scroll
Part 7 (Special Blocks): NOTE (comments), STYLE (CSS)
Part 8 (Implementation): Generic IMPL-* rules for Parser/Writer/Validator
Part 9 (Validation Summary): Rule counts, self-validation report
Part 10 (Quick Reference): Tables for settings and tags

Target Rule Counts (Exhaustive):

RULE-FMT-###: 5-7 file format rules (header, encoding, BOM, line endings, blank line)
RULE-TIME-###: 7-10 timestamp rules (format, validation, ranges, separator, sequential)
RULE-CUE-###: 5-8 cue structure rules (identifier, timing line, payload, blank line)
RULE-SET-###: 8 cue setting rules (vertical, line, position, size, align, region, + constraints)
RULE-TAG-###: 11-15 tag/markup rules (all 8 tags + closing rules + nesting + escaping)
RULE-ENT-###: 3-5 HTML entity rules (&amp;, &lt;, &gt;, &nbsp;, &lrm;, &rlm;)
RULE-REG-###: 5-8 region rules (REGION block, all properties, association)
RULE-BLK-###: 3-5 special block rules (NOTE, STYLE, metadata)
RULE-VAL-###: 5-8 validation rules (error handling, recovery, strict vs. lenient)
IMPL-###: 12-15 implementation requirements (parser, writer, validator)
Total: 60-80 rules (comprehensive coverage)

Level Distribution (Exhaustive):

MUST: 30-40 rules (critical requirements)
SHOULD: 15-20 rules (recommended practices)
MAY: 5-10 rules (optional features)
MUST NOT: 3-5 rules (forbidden patterns)

Critical Inclusions (MUST be documented):

All 8 Markup Tags (Individual Rules):

<c> / <c.class> - Class spans (RULE-TAG-001)
<i> - Italics (RULE-TAG-002)
<b> - Bold (RULE-TAG-003)
<u> - Underline (RULE-TAG-004)
<v> - Voice/speaker (RULE-TAG-005)
<lang> - Language (RULE-TAG-006)
<ruby><rt> - Ruby text (RULE-TAG-007)
<HH:MM:SS.mmm> - Internal timestamp (RULE-TAG-008)

All 6 Cue Settings (Individual Rules):

vertical: rl | lr (RULE-SET-001)
line: N | N% (RULE-SET-002)
position: N% (RULE-SET-003)
size: N% (RULE-SET-004)
align: start|center|end|left|right (RULE-SET-005)
region: id (RULE-SET-006)

All Required HTML Entities (Individual Rules):

&amp; (ampersand) - RULE-ENT-001
&lt; (less than) - RULE-ENT-002
&gt; (greater than) - RULE-ENT-003
&nbsp; (non-breaking space) - RULE-ENT-004
&lrm; (left-to-right mark) - RULE-ENT-005
&rlm; (right-to-left mark) - RULE-ENT-006

REGION Properties (Individual Rules):

id (required) - RULE-REG-001
width (percentage) - RULE-REG-002
lines (integer) - RULE-REG-003
regionanchor (percentage pair) - RULE-REG-004
viewportanchor (percentage pair) - RULE-REG-005
scroll (up/none) - RULE-REG-006

Generate spec with incremental writing (context-efficient):

from datetime import datetime
import os

os.makedirs("ai_artifacts/specs/vtt", exist_ok=True)
spec_path = "ai_artifacts/specs/vtt/vtt_specs_summary.md"

# Write spec header
spec_content = f"""# WebVTT Specification - Complete Reference

**Generated**: {datetime.now().strftime("%Y-%m-%d")}
**Sources**: W3C WebVTT Specification (https://www.w3.org/TR/webvtt1/), MDN Web Docs
**Version**: W3C Candidate Recommendation
**Total Rules**: [TO BE CALCULATED]

---

"""

with open(spec_path, "w") as _f:
    _f.write(spec_content)

# Then generate and append each part section by section:
# Part 1: File Format rules
# Part 2: Timestamp rules
# ... continue for all parts (Parts 1-10)
# Append each part with: with open(spec_path, "a") as _f: _f.write(part)

Step 5: Exhaustive Quality Validation

Structure checks:

All rule IDs unique
Sequential numbering within each category
Valid test patterns
Level indicators present (MUST/SHOULD/MAY/MUST NOT)
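The first two structure checks above (unique IDs, sequential numbering) can be automated with a sketch like this. It assumes rule headings follow the `**[RULE-XXX-###]**` shape from the rule format; `check_rule_ids` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# Matches headings such as **[RULE-FMT-001]** or **[IMPL-PARSE-002]**
RULE_ID_RE = re.compile(r"\*\*\[((?:RULE|IMPL)-[A-Z]+-\d{3})\]\*\*")


def check_rule_ids(spec_text):
    """Return a list of problems: duplicate IDs and numbering gaps per category."""
    ids = RULE_ID_RE.findall(spec_text)
    problems = [f"duplicate ID: {rid}" for rid, n in Counter(ids).items() if n > 1]

    # Group the numeric suffixes by category prefix, e.g. "RULE-FMT"
    by_category = defaultdict(set)
    for rid in ids:
        category, _, number = rid.rpartition("-")
        by_category[category].add(int(number))

    for category, numbers in sorted(by_category.items()):
        missing = sorted(set(range(1, max(numbers) + 1)) - numbers)
        if missing:
            gaps = ", ".join(f"{n:03d}" for n in missing)
            problems.append(f"{category}: gap at {gaps}")
    return problems
```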

Content checks (Exhaustive - 100% required):

✅ 60-80 total rules documented (RULE-* + IMPL-*)
✅ 30-40 MUST rules (all critical requirements)
✅ 15-20 SHOULD rules (best practices)
✅ 5-10 MAY rules (optional features)
✅ 12-15 IMPL-* rules (generic, no pycaption references)
✅ All 8 markup tags individually documented (c, i, b, u, v, lang, ruby, timestamp)
✅ All 6 cue settings individually documented (vertical, line, position, size, align, region)
✅ All 6 HTML entities individually documented (&amp;, &lt;, &gt;, &nbsp;, &lrm;, &rlm;)
✅ All 6 REGION properties individually documented (id, width, lines, regionanchor, viewportanchor, scroll)
✅ STYLE block specification complete
✅ NOTE block specification complete
✅ Timestamp validation rules complete (format, ranges, start<=end, sequential)
✅ Validation rules complete (error handling, recovery strategies)
✅ Best practices documented (interoperability, browser compatibility)

Generate exhaustive validation report in spec file:

## Part 9: Exhaustive Validation Summary

### Rule Counts by Category
- RULE-FMT-###: X file format rules (Target: 5-7)
- RULE-TIME-###: X timestamp rules (Target: 7-10)
- RULE-CUE-###: X cue structure rules (Target: 5-8)
- RULE-SET-###: X cue setting rules (Target: 8 - ALL settings)
- RULE-TAG-###: X tag/markup rules (Target: 11-15 - ALL 8 tags + rules)
- RULE-ENT-###: X HTML entity rules (Target: 3-5 - ALL 6 entities)
- RULE-REG-###: X region rules (Target: 5-8 - ALL 6 properties)
- RULE-BLK-###: X special block rules (Target: 3-5)
- RULE-VAL-###: X validation rules (Target: 5-8)
- IMPL-###: X implementation requirements (Target: 12-15)
- **Total: Y rules** (Target: 60-80 for exhaustive coverage)

### By Level (Exhaustive Distribution)
- MUST: X rules (Target: 30-40)
- SHOULD: X rules (Target: 15-20)
- MAY: X rules (Target: 5-10)
- MUST NOT: X rules (Target: 3-5)

### Coverage Verification (100% Required)

**Markup Tags (8 total - ALL must be documented):**
- ✅/❌ `<c>` class spans (RULE-TAG-001)
- ✅/❌ `<i>` italics (RULE-TAG-002)
- ✅/❌ `<b>` bold (RULE-TAG-003)
- ✅/❌ `<u>` underline (RULE-TAG-004)
- ✅/❌ `<v>` voice (RULE-TAG-005)
- ✅/❌ `<lang>` language (RULE-TAG-006)
- ✅/❌ `<ruby><rt>` ruby text (RULE-TAG-007)
- ✅/❌ `<HH:MM:SS.mmm>` timestamp (RULE-TAG-008)
**Status: X/8 tags documented**

**Cue Settings (6 total - ALL must be documented):**
- ✅/❌ vertical: rl|lr (RULE-SET-001)
- ✅/❌ line: N|N% (RULE-SET-002)
- ✅/❌ position: N% (RULE-SET-003)
- ✅/❌ size: N% (RULE-SET-004)
- ✅/❌ align: start|center|end|left|right (RULE-SET-005)
- ✅/❌ region: id (RULE-SET-006)
**Status: X/6 settings documented**

**HTML Entities (6 required - ALL must be documented):**
- ✅/❌ &amp; ampersand (RULE-ENT-001)
- ✅/❌ &lt; less than (RULE-ENT-002)
- ✅/❌ &gt; greater than (RULE-ENT-003)
- ✅/❌ &nbsp; non-breaking space (RULE-ENT-004)
- ✅/❌ &lrm; left-to-right mark (RULE-ENT-005)
- ✅/❌ &rlm; right-to-left mark (RULE-ENT-006)
**Status: X/6 entities documented**

**REGION Properties (6 total - ALL must be documented):**
- ✅/❌ id (required) (RULE-REG-001)
- ✅/❌ width: N% (RULE-REG-002)
- ✅/❌ lines: N (RULE-REG-003)
- ✅/❌ regionanchor: X%,Y% (RULE-REG-004)
- ✅/❌ viewportanchor: X%,Y% (RULE-REG-005)
- ✅/❌ scroll: up|none (RULE-REG-006)
**Status: X/6 properties documented**

### Self-Validation Checklist
- ✅/❌ All rule IDs unique
- ✅/❌ Sequential numbering within categories
- ✅/❌ All 8 markup tags individually documented
- ✅/❌ All 6 cue settings individually documented
- ✅/❌ All 6 HTML entities individually documented
- ✅/❌ All 6 REGION properties individually documented
- ✅/❌ Generic IMPL rules (no pycaption-specific code)
- ✅/❌ Test patterns present for all rules
- ✅/❌ Source attribution present
- ✅/❌ 60-80 total rules (exhaustive coverage target)
- ✅/❌ 30-40 MUST rules documented

### Overall Status
- **Completeness**: X% (100% required)
- **Status**: ✅ PASS | ❌ FAIL (requires fixes)

**If FAIL**: Missing items listed above must be added before spec is complete.

If validation FAILS:

Identify missing rules/categories
Search additional sources for missing details
Add missing rules
Re-validate until PASS

Step 6: Source Attribution

Track sources for each rule:

W3C WebVTT spec section (Primary)
MDN docs (Confirms)
Confidence: High/Medium/Low

Document conflicts and resolutions.

Step 7: Update Web Sources

Append new URLs (if any) to ai_artifacts/specs/vtt/vtt_web_sources.md:

- [New Source Title](https://url.example.com)

Step 8: Post-Generation Validation Against Master Checklist

CRITICAL: After generating the spec, run this validation script. If it reports FAIL, fix the spec and re-run until PASS.

import re

print("=" * 60)
print("POST-GENERATION VALIDATION: WebVTT")
print("Checking vtt_specs_summary.md against master_checklist.md")
print("=" * 60)

with open('ai_artifacts/specs/vtt/master_checklist.md') as _f:
    checklist = _f.read()
with open('ai_artifacts/specs/vtt/vtt_specs_summary.md') as _f:
    spec = _f.read()

failures = []
warnings = []

# 1. Check all required rule IDs
rule_ids = re.findall(r'^- ((?:RULE|IMPL)-[A-Z]+-\d{3})', checklist, re.M)
for rid in rule_ids:
    if rid not in spec:
        failures.append(f"MISSING RULE: {rid}")
found_rules = len(rule_ids) - len([f for f in failures if 'MISSING RULE' in f])
print(f"[1/6] Rule IDs: {found_rules}/{len(rule_ids)}")

# 2. Check required tags
tags_section = re.search(r'## Required Tags.*?\n((?:- .+\n)+)', checklist)
if tags_section:
    tags = re.findall(r'^- `(.+?)`', tags_section.group(1), re.M)
    for tag in tags:
        # Search for the tag in spec (handle angle brackets)
        tag_clean = tag.replace('<', '').replace('>', '').split('/')[0].split('.')[0]
        if not re.search(rf'<{re.escape(tag_clean)}[>\s./]', spec):
            if not re.search(re.escape(tag_clean), spec, re.I):
                failures.append(f"MISSING TAG: {tag}")
    # Match 'MISSING TAG' exactly so missing RULE-TAG-### IDs are not counted as missing tags
    print(f"[2/6] Tags: {len(tags) - len([f for f in failures if 'MISSING TAG' in f])}/{len(tags)}")

# 3. Check required settings
settings_section = re.search(r'## Required Cue Settings.*?\n((?:- .+\n)+)', checklist)
if settings_section:
    settings = re.findall(r'^- (\w+):', settings_section.group(1), re.M)
    for setting in settings:
        if not re.search(rf'\b{re.escape(setting)}\b', spec):
            failures.append(f"MISSING SETTING: {setting}")
    print(f"[3/6] Settings: {len(settings) - len([f for f in failures if 'SETTING' in f])}/{len(settings)}")

# 4. Check required entities
entities_section = re.search(r'## Required HTML Entities.*?\n((?:- .+\n)+)', checklist)
if entities_section:
    entities = re.findall(r'^- (.+?)$', entities_section.group(1), re.M)
    for entity in entities:
        entity_clean = entity.strip().split(' ')[0]
        if entity_clean not in spec:
            if not re.search(re.escape(entity_clean), spec):
                warnings.append(f"MISSING ENTITY: {entity_clean}")
    print(f"[4/6] Entities: checked {len(entities)}")

# 5. Check required enum values
enum_sections = re.findall(r'### (.+?)\n((?:- .+\n)+)', checklist)
missing_enums = 0
total_enums = 0
for section_name, values_block in enum_sections:
    values = re.findall(r'^- (.+)$', values_block, re.M)
    for val in values:
        val_clean = val.strip()
        total_enums += 1
        if val_clean not in spec:
            if not re.search(re.escape(val_clean), spec, re.I):
                missing_enums += 1
                warnings.append(f"MISSING ENUM [{section_name}]: {val_clean}")
print(f"[5/6] Enum values: {total_enums - missing_enums}/{total_enums}")

# 6. Check severity distribution
severity_section = re.search(r'## Required Severity Distribution\n((?:.*\n)*)', checklist)
if severity_section:
    for match in re.finditer(r'- (MUST|SHOULD|MAY|MUST NOT): (\d+)', severity_section.group(1)):
        level, minimum = match.group(1), int(match.group(2))
        # The lookahead keeps "MUST NOT" rules out of the plain "MUST" count
        actual = len(re.findall(rf'Level:\*\*\s*{re.escape(level)}\b(?! NOT)', spec))
        if actual < minimum:
            failures.append(f"SEVERITY {level}: found {actual}, need >= {minimum}")
        print(f"[6/6] {level}: {actual} (min {minimum}) {'PASS' if actual >= minimum else 'FAIL'}")

# Report
print("\n" + "=" * 60)
if failures:
    print(f"FAIL: {len(failures)} failures, {len(warnings)} warnings\n")
    for f in failures:
        print(f"  FAIL: {f}")
    for w in warnings[:10]:
        print(f"  WARN: {w}")
    if len(warnings) > 10:
        print(f"  ... and {len(warnings) - 10} more warnings")
    print("\nFix the spec and re-run this validation.")
else:
    print(f"PASS: All checks passed ({len(warnings)} warnings)")
    for w in warnings[:10]:
        print(f"  WARN: {w}")
print("=" * 60)

If FAIL: Fix the missing items in the spec, then re-run the validation script. Repeat until PASS.

Output Files

ai_artifacts/specs/vtt/vtt_specs_summary.md - Complete specification with 60-80 rules
ai_artifacts/specs/vtt/vtt_web_sources.md - Updated URL list (if new sources found)

Success Criteria (Exhaustive - 100% Required)

Master Checklist Validation (CRITICAL - must PASS):

All rule IDs from master_checklist.md present in generated spec
All 8 tags present
All 6 settings present
All 6 entities present
All enum values present
Severity distribution meets minimums

Completeness:

60-80 total rules documented (RULE-* + IMPL-*)
All 8 markup tags individually documented with examples
All 6 cue settings individually documented with validation
All 6 HTML entities individually documented
All 6 REGION properties individually documented
Header, timestamp, cue structure, special blocks rules
12-15 IMPL rules (generic, no pycaption-specific code)

Quality:

Unique rule IDs (no duplicates)
Sequential numbering within categories
Valid test patterns for all rules
Source attribution (W3C section references)
Generic IMPL rules (no pycaption-specific references)

Web Sources:

W3C WebVTT spec fetched
MDN documentation fetched
All new sources added to vtt_web_sources.md

Context Window Optimization

Token usage target: < 50K per invocation

Strategies:

Targeted web fetch - Extract text only, not full HTML
Incremental writing - Save spec file as rules are generated, not at end
On-demand web search - Only if completeness check finds gaps
Section-by-section - Process file format → timestamps → cues → tags → etc.
Rule metadata first - Extract rule IDs/levels, fetch details on-demand

Estimated token usage:

Web source fetches: 10-15K tokens
Rule generation (60-80 rules): 15-20K tokens
Validation & tables: 5K tokens
Total: ~35K tokens (30% safety margin)

Error Handling

vtt_web_sources.md not found: Create it with W3C spec URL
No URLs in file: Proceed with web search
Web fetch fails: Continue with available sources + web search
Web search fails: Use built-in W3C WebVTT knowledge
Cannot write output: Report error with path
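The first fallback above (creating a missing vtt_web_sources.md with the W3C spec URL) might be sketched as follows. `ensure_sources_file` is a hypothetical helper, and the link title it writes is an assumption.

```python
import os

W3C_URL = "https://www.w3.org/TR/webvtt1/"


def ensure_sources_file(path):
    """Create the sources file with the W3C URL if missing; return True if created."""
    if os.path.exists(path):
        return False
    # Create the parent directory as well, in case the specs folder is new
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"- [W3C WebVTT Specification]({W3C_URL})\n")
    return True
```

Calling `ensure_sources_file("ai_artifacts/specs/vtt/vtt_web_sources.md")` before Step 1 would let the URL-extraction block run without a missing-file error.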

name	analyze-vtt-docs
description	Generates EXHAUSTIVE WebVTT specification summary from web sources with complete rule coverage, all tags/settings/entities, and self-validation.

analyze-vtt-docs

What this skill does

Generates comprehensive, exhaustive WebVTT specification (vtt_specs_summary.md) as single source of truth for compliance checking.

Outputs:

50+ RULE-XXX specifications with unique IDs and test patterns
12+ IMPL-XXX requirements (generic, no pycaption references)
All 8 markup tags individually documented (c, i, b, u, v, lang, ruby, timestamp)
All 8 cue settings individually documented (vertical, line, position, size, align, region, etc.)
All required HTML entities (&, <, >, , ‎, ‏)
Region specifications complete (REGION block properties)
STYLE/NOTE blocks documented
Self-validation report (rule counts, completeness check)
Source attribution per rule

Key: Ensures NO requirements missed - exhaustive coverage from W3C spec + MDN + web search.

Pre-flight: Read .claude/skills/gotchas.md before generating specs. Pay special attention to gotcha #3 (W3C license attribution required).

Usage:

/analyze-vtt-docs

Single command - fetches web sources, performs comprehensive analysis, generates complete spec.

Implementation

Step 0: Check Existing Sources

Read existing documentation:

# Check what we already have
ls -la ai_artifacts/specs/vtt/
cat ai_artifacts/specs/vtt/vtt_web_sources.md

If vtt_specs_summary.md exists:

Read it to assess completeness
Identify gaps using completeness checklist (Step 2)
Only fetch new sources if gaps exist

Step 1: Fetch Known Web Sources (WebFetch Tool Required)

IMPORTANT: This step requires the WebFetch tool to be loaded first.

Check if WebFetch is available, load if needed:

# WebFetch is a deferred tool - load it before use
# Use ToolSearch to load WebFetch

Read URLs from ai_artifacts/specs/vtt/vtt_web_sources.md:

import re

with open("ai_artifacts/specs/vtt/vtt_web_sources.md") as _f:
    sources_content = _f.read()

# Extract URLs from markdown links: [Text](URL)
url_pattern = r'\[([^\]]+)\]\(([^)]+)\)'
existing_sources = []

for match in re.findall(url_pattern, sources_content):
    title, url = match
    existing_sources.append({'title': title, 'url': url})

print(f"Found {len(existing_sources)} existing sources")
for s in existing_sources:
    print(f"   - {s['title']}")

Fetch W3C WebVTT Specification (Primary Source):

# Fetch W3C spec - most authoritative source
w3c_url = 'https://www.w3.org/TR/webvtt1/'
print("Fetching W3C WebVTT Specification...")

# Use the WebFetch tool to fetch w3c_url
# Store result in a variable for processing
# w3c_content = <result from WebFetch tool>

Fetch MDN Documentation (Supplementary):

# MDN provides practical examples and browser compatibility info
mdn_url = 'https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API'
print("Fetching MDN WebVTT Documentation...")

# Use the WebFetch tool to fetch mdn_url
# mdn_content = <result from WebFetch tool>

Context optimization:

Fetch sources sequentially, not in parallel (avoid context overflow)
Extract text content only, discard HTML tags
Focus on specification sections
Save to temp files, don't hold in memory
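
The "text only, temp files" strategy above could be sketched as follows. This assumes the fetched page arrives as a raw HTML string; WebFetch may already return plain text or markdown, in which case the stripping step is unnecessary. The names `html_to_text` and `save_source_text` are hypothetical helpers, not part of any tool.

```python
import tempfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_source_text(text, prefix="vtt_source_"):
    """Write text to a temp file and return its path (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(
        "w", prefix=prefix, suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(text)
        return f.name
```

Writing each source to its own file means later steps can re-read one section at a time instead of keeping every fetched page in context.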

Step 2: Comprehensive Web Search for Missing Details

Perform targeted web searches to fill gaps:

# Define search queries for comprehensive coverage
search_queries = [
    "WebVTT specification complete W3C",
    "WebVTT cue settings all options",
    "WebVTT markup tags complete list",
    "WebVTT HTML entities supported",
    "WebVTT REGION block specification",
    "WebVTT STYLE block CSS",
    "WebVTT NOTE comment syntax",
    "WebVTT timestamp format validation",
    "WebVTT best practices implementation",
    "WebVTT validation rules MUST SHOULD",
]

# Execute searches and collect results
search_results = []
for query in search_queries:
    print(f"Searching: {query}")
    # Use the WebSearch tool for each query
    results = []  # populated by WebSearch tool
    search_results.append({
        'query': query,
        'results': results
    })
    # Brief delay to avoid rate limiting

Identify high-value sources from search results:

import re

# Re-read existing sources (each block is independent)
with open("ai_artifacts/specs/vtt/vtt_web_sources.md") as _f:
    _sources_content = _f.read()
existing_sources = [
    {'title': m[0], 'url': m[1]}
    for m in re.findall(r'\[([^\]]+)\]\(([^)]+)\)', _sources_content)
]

# Agent: for each URL found in the search step above, check if it is
# authoritative (w3.org, developer.mozilla.org, github.com/w3c) and not
# already in existing_sources. Collect matches into new_sources list:
_existing_urls = {s['url'] for s in existing_sources}
new_sources = []  # Agent fills this from search results
# new_sources.append({'title': <title>, 'url': <url>, 'query': <query>})

print(f"\nFound {len(new_sources)} new authoritative sources")

Fetch new sources:

# Agent: for each source in new_sources (up to 5), use WebFetch to
# retrieve the content. new_sources was built in the filtering step above.
# for source in new_sources[:5]:
#     print(f"Fetching: {source['title']}")
#     # Use the WebFetch tool with url=source['url']

Step 3: Exhaustive Completeness Verification

CRITICAL: Verify ALL these areas covered in fetched content (100% coverage required):

File Format:

Header: "WEBVTT" exact match (case-sensitive), optional space + comment
UTF-8 encoding requirement (MUST)
Optional UTF-8 BOM handling
Line endings: CR, LF, CRLF all valid
Blank line after header before first cue
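
As an illustration of how the file format items above might turn into test patterns, here is a minimal sketch. `check_header` and `split_lines` are hypothetical names, and the blank-line check applies the stricter authoring rule from the list above; a lenient parser may accept more.

```python
import re

# "WEBVTT", optionally followed by a space or tab and arbitrary text
_HEADER_RE = re.compile(r"WEBVTT(?:[ \t][^\r\n]*)?")


def split_lines(text):
    """Split on CRLF, LF, or CR (all valid WebVTT line terminators)."""
    return re.split(r"\r\n|\n|\r", text)


def check_header(data: bytes):
    """Return a list of problems found in the file header (empty if none)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    if text.startswith("\ufeff"):  # optional UTF-8 BOM
        text = text[1:]
    lines = split_lines(text)
    problems = []
    if not _HEADER_RE.fullmatch(lines[0]):
        problems.append("first line must be 'WEBVTT', optionally followed by space/tab and text")
    if len(lines) > 1 and lines[1] != "":
        problems.append("expected a blank line after the header")
    return problems
```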

Timestamp Format:

Format: [HH:]MM:SS.mmm (hours optional if < 1 hour)
Milliseconds required (3 digits)
Separator: --> (spaces required)
Start time < end time (MUST; the W3C spec requires the end timestamp to be greater than the start)
Sequential ordering (SHOULD)
Valid ranges: HH (two or more digits), MM (00-59), SS (00-59), mmm (000-999)
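
The timestamp items above could be exercised with a sketch like the one below. The function names are hypothetical. It accepts two or more hour digits and rejects `end <= start`, following the W3C wording that the end must be later than the start; change the comparison to `end < start` if zero-length cues should pass.

```python
import re

# [HH:]MM:SS.mmm with minutes/seconds limited to 00-59 and exactly 3 ms digits
_TS_RE = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")
# start --> end [settings], with whitespace required around the arrow
_TIMING_RE = re.compile(r"(\S+)[ \t]+-->[ \t]+(\S+)(?:[ \t]+(.*))?")


def parse_timestamp(ts):
    """Return the timestamp in milliseconds, or None if it is malformed."""
    m = _TS_RE.fullmatch(ts)
    if not m:
        return None
    hours, minutes, seconds, millis = m.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_timing_line(line):
    """Return (start_ms, end_ms, settings_text) or raise ValueError."""
    m = _TIMING_RE.fullmatch(line)
    if not m:
        raise ValueError("expected 'start --> end' with whitespace around the arrow")
    start, end = parse_timestamp(m.group(1)), parse_timestamp(m.group(2))
    if start is None or end is None:
        raise ValueError("malformed timestamp")
    if end <= start:
        raise ValueError("end time must be after start time")
    return start, end, m.group(3) or ""
```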

Cue Structure:

Optional cue identifier (any text except "-->", "NOTE", or looks like timestamp)
Required: start --> end [optional settings]
Cue payload (can span multiple lines)
Blank line terminates cue

Cue Settings:

vertical: rl, lr (text direction)
line: N or N% (vertical position, can be negative)
position: N% (horizontal position 0-100)
size: N% (cue box width 0-100)
align: start, center, end, left, right
region: region_id (reference to defined region)
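
A minimal sketch of a settings parser covering the six settings above. `parse_cue_settings` is a hypothetical name. The optional alignment suffixes on `line` and `position` are not listed above; they are included here on the understanding that the W3C grammar allows them, so treat that part as an assumption to verify against the fetched spec.

```python
import re

_PERCENT = r"\d+(?:\.\d+)?%"
_SETTING_PATTERNS = {
    "vertical": re.compile(r"rl|lr"),
    # Assumption: optional ",start|center|end" suffix per the W3C grammar
    "line": re.compile(rf"(?:-?\d+|{_PERCENT})(?:,(?:start|center|end))?"),
    # Assumption: optional ",line-left|center|line-right" suffix
    "position": re.compile(rf"{_PERCENT}(?:,(?:line-left|center|line-right))?"),
    "size": re.compile(_PERCENT),
    "align": re.compile(r"start|center|end|left|right"),
    "region": re.compile(r"\S+"),
}
_PERCENT_SETTINGS = ("line", "position", "size")


def parse_cue_settings(text):
    """Return (settings, problems) for the settings part of a timing line."""
    settings, problems = {}, []
    for token in text.split():
        name, sep, value = token.partition(":")
        pattern = _SETTING_PATTERNS.get(name)
        if not sep or not value or pattern is None:
            problems.append(f"unknown or malformed setting: {token!r}")
            continue
        if not pattern.fullmatch(value):
            problems.append(f"invalid value for {name}: {value!r}")
            continue
        pct = re.match(r"(\d+(?:\.\d+)?)%", value)
        if name in _PERCENT_SETTINGS and pct and float(pct.group(1)) > 100:
            problems.append(f"percentage out of range for {name}: {value!r}")
            continue
        settings[name] = value
    return settings, problems
```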

Tags (Markup):

Class spans: <c.classname>text</c> (multiple classes: <c.class1.class2>)
Italics: <i>text</i>
Bold: <b>text</b>
Underline: <u>text</u>
Ruby: <ruby>base<rt>annotation</rt></ruby>
Voice: <v Speaker>text</v> (optional annotation)
Language: <lang code>text</lang>
Internal timestamps: <00:01:23.456> (karaoke-style)
Tag nesting rules and restrictions
Escape sequences: &amp; &lt; &gt; &nbsp; &lrm; &rlm;
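
For the escaping item, a writer and reader might round-trip cue text as sketched below. The helper names are hypothetical, and only the six escapes this document requires (`&amp;`, `&lt;`, `&gt;`, `&nbsp;`, `&lrm;`, `&rlm;`) are handled; a real parser may recognize more character references.

```python
import re

_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
_NAMED = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&nbsp;": "\u00a0",  # non-breaking space
    "&lrm;": "\u200e",   # left-to-right mark
    "&rlm;": "\u200f",   # right-to-left mark
}
_ENTITY_RE = re.compile(r"&(?:amp|lt|gt|nbsp|lrm|rlm);")


def escape_cue_text(text):
    """Escape the characters that would otherwise start a tag or an escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape_cue_text(text):
    """Replace the six supported escapes in a single pass."""
    return _ENTITY_RE.sub(lambda m: _NAMED[m.group(0)], text)
```

Doing the replacement in one regex pass avoids double unescaping: `&amp;lt;` becomes the literal text `&lt;`, not `<`.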

Regions (Optional Feature):

REGION block definition before cues
Properties: id, width, lines, regionanchor, viewportanchor, scroll
Association with cues via region:id setting
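
A sketch of how the REGION properties above might be checked. `parse_region_block` is a hypothetical name, and the accepted `scroll` values are taken from this document's list (up/none); verify them against the fetched W3C text before turning this into a test pattern.

```python
import re

_PCT = r"\d+(?:\.\d+)?%"
_REGION_PATTERNS = {
    "id": re.compile(r"\S+"),
    "width": re.compile(_PCT),
    "lines": re.compile(r"\d+"),
    "regionanchor": re.compile(rf"{_PCT},{_PCT}"),
    "viewportanchor": re.compile(rf"{_PCT},{_PCT}"),
    "scroll": re.compile(r"up|none"),  # values as listed in this document
}


def parse_region_block(block):
    """Parse the text of one REGION block into (properties, problems)."""
    lines = block.strip().splitlines()
    if not lines or lines[0].strip() != "REGION":
        return {}, ["block does not start with REGION"]
    props, problems = {}, []
    # Settings may share a line or sit on separate lines
    for token in " ".join(lines[1:]).split():
        name, sep, value = token.partition(":")
        pattern = _REGION_PATTERNS.get(name)
        if not sep or pattern is None:
            problems.append(f"unknown or malformed property: {token!r}")
        elif not pattern.fullmatch(value):
            problems.append(f"invalid value for {name}: {value!r}")
        else:
            props[name] = value
    if "id" not in props:
        problems.append("region has no id")
    return props, problems
```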

Special Blocks:

NOTE blocks (comments, ignored by parser)
STYLE blocks (CSS for cue pseudo-elements)
Syntax and placement rules
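
One way to check the placement rules is to classify each blank-line-separated block and flag STYLE or REGION blocks that follow the first cue. This is a sketch under that assumption; the function names are hypothetical and `body` is the file text after the header line.

```python
import re


def classify_block(block):
    """Classify one blank-line-separated block by its first line."""
    first = block.split("\n", 1)[0]
    if "-->" in first:
        return "CUE"  # timing line with no identifier
    if re.fullmatch(r"NOTE(?:[ \t].*)?", first):
        return "NOTE"
    if re.fullmatch(r"(?:STYLE|REGION)[ \t]*", first):
        return first.strip()
    return "CUE"  # first line is a cue identifier


def check_block_order(body):
    """Return (kinds, problems) for the blocks that follow the header."""
    normalized = body.replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", normalized.strip("\n")) if b]
    kinds = [classify_block(b) for b in blocks]
    problems = []
    seen_cue = False
    for index, kind in enumerate(kinds):
        if kind == "CUE":
            seen_cue = True
        elif kind in ("STYLE", "REGION") and seen_cue:
            problems.append(f"block {index}: {kind} appears after the first cue")
    return kinds, problems
```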

Validation Requirements:

All MUST requirements from W3C spec
All SHOULD requirements
All MAY optional features
All MUST NOT forbidden patterns
Error handling strategies

Edge Cases & Common Pitfalls:

Extra text on first line after "WEBVTT"
Missing milliseconds in timestamps
Missing spaces around -->
Invalid cue settings
Unclosed tags
Un-escaped special characters
Percentage out of range (0-100)
Start > end time
Invalid UTF-8 sequences

Implementation Requirements:

Parser requirements (UTF-8 decoder, timestamp parser, tag parser, settings parser)
Writer requirements (UTF-8 encoder, escaping, formatting)
Error handling strategies
Performance considerations

Browser Compatibility:

Feature support across browsers
Cue settings support
Region support (limited)
STYLE block support (varies)
Graceful degradation

Completeness Checklist (MUST achieve 100%):

# TEMPLATE: All values start as False. Update each to True as you confirm
# coverage during spec generation. Re-run this block to check progress.
completeness_check = {
    'file_format': {
        'header': False,  # WEBVTT signature
        'encoding': False,  # UTF-8
        'bom': False,  # BOM handling
        'line_endings': False,  # CR/LF/CRLF
        'blank_line': False,  # After header
    },
    'timestamps': {
        'format': False,  # [HH:]MM:SS.mmm
        'validation': False,  # Start < end
        'ranges': False,  # MM/SS 00-59
        'milliseconds': False,  # Exactly 3 digits
        'separator': False,  # ` --> `
    },
    'cue_settings': {
        'vertical': False,  # rl/lr
        'line': False,  # N or N%
        'position': False,  # N%
        'size': False,  # N%
        'align': False,  # start/center/end/left/right
        'region': False,  # region_id
    },
    'markup_tags': {
        'class_span': False,  # <c>
        'italics': False,  # <i>
        'bold': False,  # <b>
        'underline': False,  # <u>
        'voice': False,  # <v>
        'language': False,  # <lang>
        'ruby': False,  # <ruby><rt>
        'timestamp': False,  # <00:01:23.456>
    },
    'html_entities': {
        'required': False,  # &amp; &lt; &gt; &nbsp; &lrm; &rlm;
        'escaping': False,  # Escape rules
    },
    'regions': {
        'region_block': False,  # REGION definition
        'properties': False,  # id/width/lines/anchors/scroll
    },
    'special_blocks': {
        'note': False,  # NOTE comments
        'style': False,  # STYLE CSS
    },
    'validation': {
        'must_rules': False,  # All MUST requirements
        'should_rules': False,  # All SHOULD requirements
        'error_handling': False,  # Error strategies
    },
}

# Calculate completeness percentage
total_items = sum(len(v) for v in completeness_check.values())
covered_items = sum(sum(v.values()) for v in completeness_check.values())
completeness = (covered_items / total_items) * 100

print(f"Completeness: {completeness:.1f}% ({covered_items}/{total_items} items)")

if completeness < 100:
    print("Missing items - additional web search required")
    # List what's missing
    for category, items in completeness_check.items():
        missing = [k for k, v in items.items() if not v]
        if missing:
            print(f"   {category}: {', '.join(missing)}")

If new sources found during search, update vtt_web_sources.md:

# Agent: if you discovered new sources during the search/filter steps,
# append them to vtt_web_sources.md now. For each new source URL not
# already in the file, add a markdown link line.
import re as _re, os
_sources_path = "ai_artifacts/specs/vtt/vtt_web_sources.md"
if os.path.exists(_sources_path):
    with open(_sources_path) as _f:
        _current = _f.read()
    _known_urls = {m[1] for m in _re.findall(r'\[([^\]]+)\]\(([^)]+)\)', _current)}
    # Agent: for each new source discovered above, if url not in _known_urls:
    #   _current += f"- [{title}]({url})\n"
    # Then write back:
    # with open(_sources_path, "w") as _f:
    #     _f.write(_current)
    print("Source file update complete")
else:
    print(f"WARNING: {_sources_path} not found — skipping source update")

Step 4: Generate Exhaustive Specification

Create ai_artifacts/specs/vtt/vtt_specs_summary.md using the rule format below.

Key differences from old approach:

Rule-based format with unique IDs (RULE-FMT-###, RULE-TIME-###, etc.)
Generic IMPL-### rules (no pycaption-specific code references)
Test patterns for automated validation
Level indicators (MUST/SHOULD/MAY/MUST NOT)
Source attribution per rule

Rule Format:

**[RULE-XXX-###]** Brief requirement
- **Requirement:** What must be true
- **Level:** MUST | SHOULD | MAY | MUST NOT
- **Validation:** How to check
- **Test Pattern:** Regex or algorithm
- **Sources:** [Attribution]

Implementation Rule Format (GENERIC):

**[IMPL-XXX-###]** Component MUST do X
- **Spec Rule:** RULE-XXX-###
- **Component:** Parser | Writer | Validator
- **Implementation Requirement:** What ANY compliant implementation must do
- **Expected Behavior:** Input → Output examples
- **Validation Criteria:** What to verify
- **Common Patterns:** Correct vs incorrect (generic)
- **Test Coverage:** Required test scenarios
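
To keep generated rules uniform, the rule format could be rendered from a small data structure. The sketch below is one possible approach; the `Rule` class and its field names are hypothetical, and the example values in the usage note are illustrative only.

```python
from dataclasses import dataclass

LEVELS = ("MUST", "SHOULD", "MAY", "MUST NOT")


@dataclass
class Rule:
    rule_id: str
    summary: str
    requirement: str
    level: str
    validation: str
    test_pattern: str
    sources: str

    def to_markdown(self):
        """Render the rule in the spec rule format, rejecting unknown levels."""
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        return (
            f"**[{self.rule_id}]** {self.summary}\n"
            f"- **Requirement:** {self.requirement}\n"
            f"- **Level:** {self.level}\n"
            f"- **Validation:** {self.validation}\n"
            f"- **Test Pattern:** {self.test_pattern}\n"
            f"- **Sources:** {self.sources}\n"
        )
```

For example, `Rule("RULE-FMT-001", "File starts with WEBVTT", "First line begins with the string WEBVTT", "MUST", "Inspect the first line", "`^WEBVTT`", "W3C WebVTT specification").to_markdown()` yields one rule entry ready to append to the spec file.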

Critical requirements (must be included as rules):

Target Rule Counts (Exhaustive):

RULE-FMT-###: 5-7 file format rules (header, encoding, BOM, line endings, blank line)
RULE-TIME-###: 7-10 timestamp rules (format, validation, ranges, separator, sequential)
RULE-CUE-###: 5-8 cue structure rules (identifier, timing line, payload, blank line)
RULE-SET-###: 8 cue setting rules (vertical, line, position, size, align, region, + constraints)
RULE-TAG-###: 11-15 tag/markup rules (all 8 tags + closing rules + nesting + escaping)
RULE-ENT-###: 3-5 HTML entity rules (&amp;, &lt;, &gt;, &nbsp;, &lrm;, &rlm;)
RULE-REG-###: 5-8 region rules (REGION block, all properties, association)
RULE-BLK-###: 3-5 special block rules (NOTE, STYLE, metadata)
RULE-VAL-###: 5-8 validation rules (error handling, recovery, strict vs. lenient)
IMPL-###: 12-15 implementation requirements (parser, writer, validator)
Total: 60-80 rules (comprehensive coverage)

Level Distribution (Exhaustive):

MUST: 30-40 rules (critical requirements)
SHOULD: 15-20 rules (recommended practices)
MAY: 5-10 rules (optional features)
MUST NOT: 3-5 rules (forbidden patterns)
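
The target counts above can be checked mechanically once the spec is written. The sketch below assumes rule headings look like `**[RULE-FMT-001]**` and level lines like `- **Level:** MUST`, as in the rule format; the function names are hypothetical and the ranges are copied from the two lists above.

```python
import re
from collections import Counter

CATEGORY_TARGETS = {
    "RULE-FMT": (5, 7), "RULE-TIME": (7, 10), "RULE-CUE": (5, 8),
    "RULE-SET": (8, 8), "RULE-TAG": (11, 15), "RULE-ENT": (3, 5),
    "RULE-REG": (5, 8), "RULE-BLK": (3, 5), "RULE-VAL": (5, 8),
    "IMPL": (12, 15),
}
LEVEL_TARGETS = {"MUST": (30, 40), "SHOULD": (15, 20), "MAY": (5, 10), "MUST NOT": (3, 5)}

# Matches both IMPL-001 and IMPL-XXX-001 style IDs
_RULE_RE = re.compile(r"\*\*\[(RULE-[A-Z]+|IMPL)(?:-[A-Z]+)?-\d{3}\]\*\*")
# "MUST NOT" is listed first so it is not counted as "MUST"
_LEVEL_RE = re.compile(r"\*\*Level:\*\* (MUST NOT|MUST|SHOULD|MAY)\b")


def count_categories(spec_text):
    """Count rule headings per category prefix."""
    return Counter(_RULE_RE.findall(spec_text))


def count_levels(spec_text):
    """Count Level lines per requirement level."""
    return Counter(_LEVEL_RE.findall(spec_text))


def out_of_range(counts, targets):
    """Return {name: (count, low, high)} for every target that is missed."""
    return {
        name: (counts[name], low, high)
        for name, (low, high) in targets.items()
        if not low <= counts[name] <= high
    }
```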

Critical Inclusions (MUST be documented):

All 8 Markup Tags (Individual Rules):

<c> / <c.class> - Class spans (RULE-TAG-001)
<i> - Italics (RULE-TAG-002)
<b> - Bold (RULE-TAG-003)
<u> - Underline (RULE-TAG-004)
<v> - Voice/speaker (RULE-TAG-005)
<lang> - Language (RULE-TAG-006)
<ruby><rt> - Ruby text (RULE-TAG-007)
<HH:MM:SS.mmm> - Internal timestamp (RULE-TAG-008)

All 6 Cue Settings (Individual Rules):

vertical: rl | lr (RULE-SET-001)
line: N | N% (RULE-SET-002)
position: N% (RULE-SET-003)
size: N% (RULE-SET-004)
align: start|center|end|left|right (RULE-SET-005)
region: id (RULE-SET-006)

All Required HTML Entities (Individual Rules):

&amp; (ampersand) - RULE-ENT-001
&lt; (less than) - RULE-ENT-002
&gt; (greater than) - RULE-ENT-003
&nbsp; (non-breaking space) - RULE-ENT-004
&lrm; (left-to-right mark) - RULE-ENT-005
&rlm; (right-to-left mark) - RULE-ENT-006

REGION Properties (Individual Rules):

id (required) - RULE-REG-001
width (percentage) - RULE-REG-002
lines (integer) - RULE-REG-003
regionanchor (percentage pair) - RULE-REG-004
viewportanchor (percentage pair) - RULE-REG-005
scroll (up/none) - RULE-REG-006

Generate spec with incremental writing (context-efficient):

from datetime import datetime
import os

os.makedirs("ai_artifacts/specs/vtt", exist_ok=True)
spec_path = "ai_artifacts/specs/vtt/vtt_specs_summary.md"

# Write spec header
spec_content = f"""# WebVTT Specification - Complete Reference

**Generated**: {datetime.now().strftime("%Y-%m-%d")}
**Sources**: W3C WebVTT Specification (https://www.w3.org/TR/webvtt1/), MDN Web Docs
**Version**: W3C Candidate Recommendation
**Total Rules**: [TO BE CALCULATED]

---

"""

with open(spec_path, "w") as _f:
    _f.write(spec_content)

# Then generate and append each part section by section:
# Part 1: File Format rules
# Part 2: Timestamp rules
# ... continue for all parts (Parts 1-10)
# Append each part with: with open(spec_path, "a") as _f: _f.write(part)

Step 5: Exhaustive Quality Validation

Structure checks:

All rule IDs unique
Sequential numbering within each category
Valid test patterns
Level indicators present (MUST/SHOULD/MAY/MUST NOT)

Content checks (Exhaustive - 100% required):

✅ 60-80 total rules documented (RULE-* + IMPL-*)
✅ 30-40 MUST rules (all critical requirements)
✅ 15-20 SHOULD rules (best practices)
✅ 5-10 MAY rules (optional features)
✅ 12-15 IMPL-* rules (generic, no pycaption references)
✅ All 8 markup tags individually documented (c, i, b, u, v, lang, ruby, timestamp)
✅ All 6 cue settings individually documented (vertical, line, position, size, align, region)
✅ All 6 HTML entities individually documented (&amp;, &lt;, &gt;, &nbsp;, &lrm;, &rlm;)
✅ All 6 REGION properties individually documented (id, width, lines, regionanchor, viewportanchor, scroll)
✅ STYLE block specification complete
✅ NOTE block specification complete
✅ Timestamp validation rules complete (format, ranges, start<end, sequential)
✅ Validation rules complete (error handling, recovery strategies)
✅ Best practices documented (interoperability, browser compatibility)

Generate exhaustive validation report in spec file:

## Part 10: Exhaustive Validation Summary

### Rule Counts by Category
- RULE-FMT-###: X file format rules (Target: 5-7)
- RULE-TIME-###: X timestamp rules (Target: 7-10)
- RULE-CUE-###: X cue structure rules (Target: 5-8)
- RULE-SET-###: X cue setting rules (Target: 8 - ALL settings)
- RULE-TAG-###: X tag/markup rules (Target: 11-15 - ALL 8 tags + rules)
- RULE-ENT-###: X HTML entity rules (Target: 3-5 - ALL 6 entities)
- RULE-REG-###: X region rules (Target: 5-8 - ALL 6 properties)
- RULE-BLK-###: X special block rules (Target: 3-5)
- RULE-VAL-###: X validation rules (Target: 5-8)
- IMPL-###: X implementation requirements (Target: 12-15)
- **Total: Y rules** (Target: 60-80 for exhaustive coverage)

### By Level (Exhaustive Distribution)
- MUST: X rules (Target: 30-40)
- SHOULD: X rules (Target: 15-20)
- MAY: X rules (Target: 5-10)
- MUST NOT: X rules (Target: 3-5)

### Coverage Verification (100% Required)

**Markup Tags (8 total - ALL must be documented):**
- ✅/❌ `<c>` class spans (RULE-TAG-001)
- ✅/❌ `<i>` italics (RULE-TAG-002)
- ✅/❌ `<b>` bold (RULE-TAG-003)
- ✅/❌ `<u>` underline (RULE-TAG-004)
- ✅/❌ `<v>` voice (RULE-TAG-005)
- ✅/❌ `<lang>` language (RULE-TAG-006)
- ✅/❌ `<ruby><rt>` ruby text (RULE-TAG-007)
- ✅/❌ `<HH:MM:SS.mmm>` timestamp (RULE-TAG-008)
**Status: X/8 tags documented**

**Cue Settings (6 total - ALL must be documented):**
- ✅/❌ vertical: rl|lr (RULE-SET-001)
- ✅/❌ line: N|N% (RULE-SET-002)
- ✅/❌ position: N% (RULE-SET-003)
- ✅/❌ size: N% (RULE-SET-004)
- ✅/❌ align: start|center|end|left|right (RULE-SET-005)
- ✅/❌ region: id (RULE-SET-006)
**Status: X/6 settings documented**

**HTML Entities (6 required - ALL must be documented):**
- ✅/❌ &amp; ampersand (RULE-ENT-001)
- ✅/❌ &lt; less than (RULE-ENT-002)
- ✅/❌ &gt; greater than (RULE-ENT-003)
- ✅/❌ &nbsp; non-breaking space (RULE-ENT-004)
- ✅/❌ &lrm; left-to-right mark (RULE-ENT-005)
- ✅/❌ &rlm; right-to-left mark (RULE-ENT-006)
**Status: X/6 entities documented**

**REGION Properties (6 total - ALL must be documented):**
- ✅/❌ id (required) (RULE-REG-001)
- ✅/❌ width: N% (RULE-REG-002)
- ✅/❌ lines: N (RULE-REG-003)
- ✅/❌ regionanchor: X%,Y% (RULE-REG-004)
- ✅/❌ viewportanchor: X%,Y% (RULE-REG-005)
- ✅/❌ scroll: up|none (RULE-REG-006)
**Status: X/6 properties documented**

### Self-Validation Checklist
- ✅/❌ All rule IDs unique
- ✅/❌ Sequential numbering within categories
- ✅/❌ All 8 markup tags individually documented
- ✅/❌ All 6 cue settings individually documented
- ✅/❌ All 6 HTML entities individually documented
- ✅/❌ All 6 REGION properties individually documented
- ✅/❌ Generic IMPL rules (no pycaption-specific code)
- ✅/❌ Test patterns present for all rules
- ✅/❌ Source attribution present
- ✅/❌ 60-80 total rules (exhaustive coverage target)
- ✅/❌ 30-40 MUST rules documented

### Overall Status
- **Completeness**: X% (100% required)
- **Status**: ✅ PASS | ❌ FAIL (requires fixes)

**If FAIL**: Missing items listed above must be added before spec is complete.

If validation FAILS:

Identify missing rules/categories
Search additional sources for missing details
Add missing rules
Re-validate until PASS

Step 6: Source Attribution

Track sources for each rule:

W3C WebVTT spec section (Primary)
MDN docs (Confirms)
Confidence: High/Medium/Low

Document conflicts and resolutions.

Step 7: Update Web Sources

Append new URLs (if any) to ai_artifacts/specs/vtt/vtt_web_sources.md:

- [New Source Title](https://url.example.com)

Step 8: Post-Generation Validation Against Master Checklist

CRITICAL: After generating the spec, run this validation script. If it reports FAIL, fix the spec and re-run until PASS.

import re

print("=" * 60)
print("POST-GENERATION VALIDATION: WebVTT")
print("Checking vtt_specs_summary.md against master_checklist.md")
print("=" * 60)

with open('ai_artifacts/specs/vtt/master_checklist.md') as _f:
    checklist = _f.read()
with open('ai_artifacts/specs/vtt/vtt_specs_summary.md') as _f:
    spec = _f.read()

failures = []
warnings = []

# 1. Check all required rule IDs
rule_ids = re.findall(r'^- ((?:RULE|IMPL)-[A-Z]+-\d{3})', checklist, re.M)
for rid in rule_ids:
    if rid not in spec:
        failures.append(f"MISSING RULE: {rid}")
found_rules = len(rule_ids) - len([f for f in failures if 'MISSING RULE' in f])
print(f"[1/6] Rule IDs: {found_rules}/{len(rule_ids)}")

# 2. Check required tags
tags_section = re.search(r'## Required Tags.*?\n((?:- .+\n)+)', checklist)
if tags_section:
    tags = re.findall(r'^- `(.+?)`', tags_section.group(1), re.M)
    for tag in tags:
        # Search for the tag in spec (handle angle brackets)
        tag_clean = tag.replace('<', '').replace('>', '').split('/')[0].split('.')[0]
        if not re.search(rf'<{re.escape(tag_clean)}[>\s./]', spec):
            if not re.search(re.escape(tag_clean), spec, re.I):
                failures.append(f"MISSING TAG: {tag}")
    print(f"[2/6] Tags: {len(tags) - len([f for f in failures if 'TAG' in f])}/{len(tags)}")

# 3. Check required settings
settings_section = re.search(r'## Required Cue Settings.*?\n((?:- .+\n)+)', checklist)
if settings_section:
    settings = re.findall(r'^- (\w+):', settings_section.group(1), re.M)
    for setting in settings:
        if not re.search(rf'\b{re.escape(setting)}\b', spec):
            failures.append(f"MISSING SETTING: {setting}")
    print(f"[3/6] Settings: {len(settings) - len([f for f in failures if 'SETTING' in f])}/{len(settings)}")

# 4. Check required entities
entities_section = re.search(r'## Required HTML Entities.*?\n((?:- .+\n)+)', checklist)
if entities_section:
    entities = re.findall(r'^- (.+?)$', entities_section.group(1), re.M)
    for entity in entities:
        entity_clean = entity.strip().split(' ')[0]
        if entity_clean not in spec:
            warnings.append(f"MISSING ENTITY: {entity_clean}")
    print(f"[4/6] Entities: checked {len(entities)}")

# 5. Check required enum values
enum_sections = re.findall(r'### (.+?)\n((?:- .+\n)+)', checklist)
missing_enums = 0
total_enums = 0
for section_name, values_block in enum_sections:
    values = re.findall(r'^- (.+)$', values_block, re.M)
    for val in values:
        val_clean = val.strip()
        total_enums += 1
        if val_clean not in spec:
            if not re.search(re.escape(val_clean), spec, re.I):
                missing_enums += 1
                warnings.append(f"MISSING ENUM [{section_name}]: {val_clean}")
print(f"[5/6] Enum values: {total_enums - missing_enums}/{total_enums}")

# 6. Check severity distribution
severity_section = re.search(r'## Required Severity Distribution\n((?:.*\n)*)', checklist)
if severity_section:
    for match in re.finditer(r'- (MUST|SHOULD|MAY|MUST NOT): (\d+)', severity_section.group(1)):
        level, minimum = match.group(1), int(match.group(2))
        # The lookahead keeps "MUST NOT" lines out of the plain "MUST" count
        actual = len(re.findall(rf'Level:\*\*\s*{re.escape(level)}\b(?!\s+NOT\b)', spec))
        if actual < minimum:
            failures.append(f"SEVERITY {level}: found {actual}, need >= {minimum}")
        print(f"[6/6] {level}: {actual} (min {minimum}) {'PASS' if actual >= minimum else 'FAIL'}")

# Report
print("\n" + "=" * 60)
if failures:
    print(f"FAIL: {len(failures)} failures, {len(warnings)} warnings\n")
    for f in failures:
        print(f"  FAIL: {f}")
    for w in warnings[:10]:
        print(f"  WARN: {w}")
    if len(warnings) > 10:
        print(f"  ... and {len(warnings) - 10} more warnings")
    print("\nFix the spec and re-run this validation.")
else:
    print(f"PASS: All checks passed ({len(warnings)} warnings)")
    for w in warnings[:10]:
        print(f"  WARN: {w}")
print("=" * 60)

If FAIL: Fix the missing items in the spec, then re-run the validation script. Repeat until PASS.

Output Files

ai_artifacts/specs/vtt/vtt_specs_summary.md - Complete specification with 60-80 rules
ai_artifacts/specs/vtt/vtt_web_sources.md - Updated URL list (if new sources found)

Success Criteria (Exhaustive - 100% Required)

Master Checklist Validation (CRITICAL - must PASS):

All rule IDs from master_checklist.md present in generated spec
All 8 tags present
All 6 settings present
All 6 entities present
All enum values present
Severity distribution meets minimums

Completeness:

60-80 total rules documented (RULE-* + IMPL-*)
All 8 markup tags individually documented with examples
All 6 cue settings individually documented with validation
All 6 HTML entities individually documented
All 6 REGION properties individually documented
Header, timestamp, cue structure, special blocks rules
12-15 IMPL rules (generic, no pycaption-specific code)

Quality:

Unique rule IDs (no duplicates)
Sequential numbering within categories
Valid test patterns for all rules
Source attribution (W3C section references)
Generic IMPL rules (no pycaption-specific references)

Web Sources:

W3C WebVTT spec fetched
MDN documentation fetched
All new sources added to vtt_web_sources.md

Context Window Optimization

Token usage target: < 50K per invocation

Strategies:

Targeted web fetch - Extract text only, not full HTML
Incremental writing - Save spec file as rules are generated, not at end
On-demand web search - Only if completeness check finds gaps
Section-by-section - Process file format → timestamps → cues → tags → etc.
Rule metadata first - Extract rule IDs/levels, fetch details on-demand

Estimated token usage:

Web source fetches: 10-15K tokens
Rule generation (40-50 rules): 15-20K tokens
Validation & tables: 5K tokens
Total: ~35K tokens (30% safety margin)

Error Handling

vtt_web_sources.md not found: Create it with W3C spec URL
No URLs in file: Proceed with web search
Web fetch fails: Continue with available sources + web search
Web search fails: Use built-in W3C WebVTT knowledge
Cannot write output: Report error with path
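
The first two fallbacks above might look like this in practice. It is a sketch: `load_sources` is a hypothetical helper, and the seeded link title is an arbitrary choice.

```python
import os
import re

W3C_URL = "https://www.w3.org/TR/webvtt1/"


def load_sources(path="ai_artifacts/specs/vtt/vtt_web_sources.md"):
    """Return (title, url) pairs, creating the file with the W3C URL if missing."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"- [W3C WebVTT Specification]({W3C_URL})\n")
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # An empty list means "no URLs in file": the caller proceeds with web search
    return re.findall(r"\[([^\]]+)\]\(([^)]+)\)", content)
```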
