| name | check-scc-compliance |
| description | Generates EXHAUSTIVE compliance report checking all 44 SCC rules (34 RULE + 10 IMPL) individually + 704 control codes with 12 deep validations (cross-mode EDM, zero-value truthiness, silent error suppression, read-only styling, position fallback) to identify ALL issues in pycaption code. |
check-scc-compliance
What this skill does
Generates a TRUE EXHAUSTIVE compliance report with:
- Deep Validation Analysis: Critical rules checked at function level (detect vs validate)
- Systematic Coverage: All 44 rules (34 RULE + 10 IMPL) individually checked with per-rule patterns
- Control Code Coverage: All code categories analyzed
- Test Coverage: Identifies missing tests
- Key Findings: Narrative summary of most important issues
Output: Single comprehensive report with ALL issues found
Usage:
/check-scc-compliance
Implementation
Run this Python script:
import os, re, glob
from datetime import datetime
print("=" * 60)
print("EXHAUSTIVE SCC COMPLIANCE CHECK")
print("=" * 60)
spec_files = glob.glob('ai_artifacts/specs/scc/scc_specs_summary*.md')
if not spec_files:
print("ERROR: No scc_specs_summary.md found")
raise SystemExit(1)
latest_spec = max(spec_files, key=os.path.getmtime)
with open(latest_spec) as _f: spec = _f.read()
main_file = 'pycaption/scc/__init__.py'
const_file = 'pycaption/scc/constants.py'
with open(main_file) as _f: main_content = _f.read()
with open(const_file) as _f: constants_content = _f.read()
all_code = main_content + "\n" + constants_content
extra_files = [
'pycaption/scc/specialized_collections.py',
'pycaption/scc/state_machines.py',
]
for f in extra_files:
if os.path.exists(f):
with open(f) as _fh: all_code += "\n" + _fh.read()
print(f"[INIT] Spec: {latest_spec}")
print(f"[INIT] Code: {len(all_code)} chars")
rule_index = {}
for match in re.finditer(r'\*\*\[(RULE-[A-Z]+-\d{3}|IMPL-(?:[A-Z]+-)?\d{3})\]\*\*\s*(.+?)(?:\n|$)', spec):
rule_id = match.group(1)
rule_name = match.group(2).strip()
rule_start = match.start()
next_rule = re.search(r'\*\*\[(?:RULE-[A-Z]+-\d{3}|IMPL-(?:[A-Z]+-)?\d{3})\]\*\*', spec[rule_start + 1:])
rule_block = spec[rule_start:rule_start + 1 + next_rule.start()] if next_rule else spec[rule_start:]
level_match = re.search(r'\*\*Level:\*\*\s*(MUST NOT|MUST|SHOULD|MAY)', rule_block)
level = level_match.group(1) if level_match else 'UNKNOWN'
rule_index[rule_id] = {'name': rule_name, 'level': level}
print(f"[INIT] Extracted {len(rule_index)} rules from spec")
landmarks = {
'class SCCReader': ('pycaption/scc/__init__.py', r'class\s+SCCReader\b'),
'class SCCWriter': ('pycaption/scc/__init__.py', r'class\s+SCCWriter\b'),
'def detect (SCCReader)': ('pycaption/scc/__init__.py', r'def\s+detect\b'),
'def read (SCCReader)': ('pycaption/scc/__init__.py', r'def\s+read\b'),
'COMMANDS dict': ('pycaption/scc/constants.py', r'COMMANDS\s*='),
'CHARACTERS dict': ('pycaption/scc/constants.py', r'CHARACTERS\s*='),
}
stale_warnings = []
for name, (expected_file, pattern) in landmarks.items():
try:
with open(expected_file) as _fh:
if not re.search(pattern, _fh.read()):
stale_warnings.append(f"{name} not found in {expected_file}")
except FileNotFoundError:
stale_warnings.append(f"{expected_file} does not exist")
if stale_warnings:
print(f"[SANITY] WARNING: {len(stale_warnings)} landmark(s) not found — patterns may be stale:")
for w in stale_warnings:
print(f" - {w}")
else:
print("[SANITY] All code landmarks found")
issues = {
'validation_gaps': [],
'partial_validation': [],
'missing': [],
'test_gaps': [],
}
print("\n" + "=" * 60)
print("PHASE 1: DEEP VALIDATION ANALYSIS")
print("=" * 60)
deep_results = {}
has_detect = bool(re.search(r'def detect', main_content))
has_header_check = bool(re.search(r'lines\[0\]\s*==\s*HEADER|HEADER\s*==\s*lines\[0\]', main_content))
deep_results['RULE-FMT-001'] = {
'name': 'SCC header validation',
'detected': has_detect,
'validated': has_header_check,
'note': 'detect() checks lines[0] == HEADER (exact match)',
}
print(f" RULE-FMT-001: {'PASS' if has_header_check else 'FAIL'}")
has_tc_regex = bool(re.search(r're\.match.*\\d\{2\}.*:\\d\{2\}.*:\\d\{2\}.*[:;].*\\d', main_content))
has_tc_error = bool(re.search(r'raise CaptionReadTimingError.*Timestamps should follow', main_content))
deep_results['RULE-TMC-001'] = {
'name': 'Timecode format validation',
'detected': has_tc_regex,
'validated': has_tc_error,
'note': 'Validates HH:MM:SS:FF/HH:MM:SS;FF via regex, raises CaptionReadTimingError',
}
print(f" RULE-TMC-001: {'PASS' if has_tc_error else 'FAIL'}")
has_frame_parse = bool(re.search(r'time_split\[3\].*30\.0|int.*time_split\[3\]', main_content))
has_frame_validate = bool(re.search(r'int\(time_split\[3\]\)\s*[><=]+\s*\d+|frame.*[><=]+.*rate|raise.*frame.*range', main_content))
deep_results['RULE-TMC-002'] = {
'name': 'Frame rate boundary validation',
'detected': has_frame_parse,
'validated': has_frame_validate,
'note': 'Divides frame by 30.0 without range check. Frame 45 produces garbage, no error.',
}
if has_frame_parse and not has_frame_validate:
issues['validation_gaps'].append({
'rule_id': 'RULE-TMC-002', 'name': 'Frame rate boundary validation',
'status': 'DETECTED_NOT_VALIDATED', 'severity': 'MUST',
'note': 'Code parses frame number (int(time_split[3]) / 30.0) but never checks frame < 30',
})
print(f" RULE-TMC-002: {'PASS' if has_frame_validate else 'VALIDATION GAP'}")
has_monotonic_check = bool(re.search(r'prev.*time|last.*time|time.*<.*prev|time.*decreas', main_content, re.I))
has_monotonic_error = bool(re.search(r'raise.*monotonic|raise.*decreas|raise.*backward', main_content, re.I))
deep_results['RULE-TMC-003'] = {
'name': 'Monotonic timecode validation',
'detected': False,
'validated': False,
'note': 'No explicit monotonicity check. TimingCorrectingCaptionList adjusts end times silently.',
}
if not has_monotonic_error:
issues['validation_gaps'].append({
'rule_id': 'RULE-TMC-003', 'name': 'Monotonic timecode validation',
'status': 'NOT_IMPLEMENTED', 'severity': 'MUST',
'note': 'No code checks that timecodes increase. Silent timing adjustment is not validation.',
})
print(f" RULE-TMC-003: NOT_IMPLEMENTED")
has_df_detect = bool(re.search(r'";" in stamp|semicolon', main_content))
has_df_validate = bool(re.search(r'minute\s*%\s*10|frame.*[01].*non.*10|skip.*frame.*0.*1', main_content, re.I))
deep_results['RULE-TMC-004'] = {
'name': 'Drop-frame timecode validation',
'detected': has_df_detect,
'validated': has_df_validate,
'note': 'Detects ";" for drop-frame time math, but does NOT validate the drop-frame invariant (frames 0,1 skipped at non-10th minutes).',
}
if has_df_detect and not has_df_validate:
issues['validation_gaps'].append({
'rule_id': 'RULE-TMC-004', 'name': 'Drop-frame timecode validation',
'status': 'DETECTED_NOT_VALIDATED', 'severity': 'MUST',
'note': 'Distinguishes DF/NDF via ";" for time math, but 00:01:00;00 (invalid DF) accepted silently',
})
print(f" RULE-TMC-004: {'PASS' if has_df_validate else 'VALIDATION GAP'}")
has_32_detect = bool(re.search(r'CaptionLineLengthError|textwrap\.fill.*32|len\(line\)\s*>\s*32', main_content))
has_32_error = bool(re.search(r'CaptionLineLengthError', main_content))
has_32_writer = bool(re.search(r'textwrap\.fill.*32', main_content))
deep_results['RULE-LAY-002'] = {
'name': '32-character line limit',
'detected': has_32_detect,
'validated': has_32_error and has_32_writer,
'note': 'FULLY VALIDATED: Reader raises CaptionLineLengthError, writer wraps at 32 via textwrap.fill',
}
print(f" RULE-LAY-002: {'PASS' if has_32_error else 'FAIL'}")
has_15_row = bool(re.search(r'row.*15|15.*row|PAC_BYTES_TO_POSITIONING_MAP', all_code))
has_15_validate = bool(re.search(r'raise.*row.*15|raise.*too.*many.*row|row.*[>]=\s*15', main_content, re.I))
deep_results['RULE-LAY-003'] = {
'name': '15-row maximum',
'detected': has_15_row,
'validated': has_15_validate,
'note': 'PAC map inherently limits to rows 1-15, but no explicit validation that >15 rows not displayed simultaneously.',
}
if has_15_row and not has_15_validate:
issues['validation_gaps'].append({
'rule_id': 'RULE-LAY-003', 'name': '15-row maximum',
'status': 'INHERENT_NOT_EXPLICIT', 'severity': 'SHOULD',
'note': 'PAC map limits positioning to rows 1-15, but no explicit count of simultaneous rows',
})
print(f" RULE-LAY-003: {'INHERENT' if has_15_row else 'MISSING'}")
has_rollup_depth = bool(re.search(r'roll_rows_expected', main_content))
has_base_row_validate = bool(re.search(r'base.*row.*[<>]=?.*depth|row.*[<>]=?.*roll_rows|raise.*base.*row', main_content, re.I))
deep_results['RULE-ROLLUP-002'] = {
'name': 'Roll-up base row validation',
'detected': has_rollup_depth,
'validated': has_base_row_validate,
'note': 'Sets roll_rows_expected to 2/3/4 and limits roll_rows list, but does NOT check that PAC base row has enough rows above it.',
}
if has_rollup_depth and not has_base_row_validate:
issues['validation_gaps'].append({
'rule_id': 'RULE-ROLLUP-002', 'name': 'Roll-up base row validation',
'status': 'DETECTED_NOT_VALIDATED', 'severity': 'MUST',
'note': 'RU4 at row 2 only has 2 rows above, not 4. No error raised.',
})
print(f" RULE-ROLLUP-002: {'PASS' if has_base_row_validate else 'VALIDATION GAP'}")
edm_handler = re.search(r'elif\s+word\s*==\s*["\']942c["\'](.+?)(?=elif\s+word|else:)', main_content, re.DOTALL)
edm_handler_code = edm_handler.group(0) if edm_handler else ''
edm_pop_only = bool(re.search(r'942c.*and\s+self\.pop_ons_queue', main_content))
edm_handles_paint = bool(re.search(r'942c.*paint|paint.*942c', main_content)) or (
'buffer_dict' in edm_handler_code and 'paint' in edm_handler_code)
edm_handles_roll = bool(re.search(r'942c.*roll|roll.*942c', main_content)) or (
'buffer_dict' in edm_handler_code and 'roll' in edm_handler_code)
edm_flushes_active = 'self.buffer' in edm_handler_code or 'create_and_store' in edm_handler_code
edm_all_modes = (edm_handles_paint and edm_handles_roll) or (edm_flushes_active and not edm_pop_only)
deep_results['RULE-EDM-001'] = {
'name': 'EDM in all caption modes',
'detected': bool(re.search(r'"942c"', main_content)),
'validated': edm_all_modes,
'note': f'pop-on-only guard: {edm_pop_only}, handles paint: {edm_handles_paint}, handles roll: {edm_handles_roll}, generic flush: {edm_flushes_active}',
}
if not edm_all_modes:
severity_detail = []
if edm_pop_only:
severity_detail.append('guarded by pop_ons_queue (pop-on only)')
if not edm_handles_paint:
severity_detail.append('paint-on EDM ignored')
if not edm_handles_roll:
severity_detail.append('roll-up EDM ignored')
issues['validation_gaps'].append({
'rule_id': 'RULE-EDM-001', 'name': 'EDM ignored in paint-on and roll-up modes',
'status': 'MODE_RESTRICTED', 'severity': 'MUST',
'note': f'EDM (942c) handler only fires for pop-on: {"; ".join(severity_detail)}. '
'Per CEA-608, EDM is a global command that clears displayed memory in ALL modes.',
})
print(f" RULE-EDM-001: {'PASS' if edm_all_modes else 'MODE_RESTRICTED — pop-on only'}")
global_commands = {'942c': 'EDM', '94ae': 'ENM', '9421': 'BS'}
mode_guards = re.findall(r'elif word == "([0-9a-f]{4})" and (self\.\w+)', main_content)
for cmd_code, guard in mode_guards:
if cmd_code in global_commands:
print(f" WARNING: Global command {global_commands[cmd_code]} ({cmd_code}) has mode guard: {guard}")
has_end_truthiness = bool(re.search(r'if caption\.end:', main_content))
has_end_none_check = bool(re.search(r'if caption\.end is not None:', main_content))
deep_results['IMPL-ZERO-001'] = {
'name': 'caption.end zero-value truthiness',
'detected': has_end_truthiness,
'validated': has_end_none_check,
'note': '`if caption.end:` treats end=0 as missing. Should be `if caption.end is not None:`.',
}
if has_end_truthiness and not has_end_none_check:
issues['validation_gaps'].append({
'rule_id': 'IMPL-ZERO-001', 'name': 'caption.end zero-value truthiness bug',
'status': 'TRUTHINESS_BUG', 'severity': 'MUST',
'note': '_force_default_timing uses `if caption.end:` — a caption starting at time 0 with end=0 would be overwritten silently',
})
print(f" IMPL-ZERO-001: {'PASS' if has_end_none_check else 'TRUTHINESS BUG'}")
has_type_error_pass = bool(re.search(r'@buffer\.setter.*?except TypeError:\s*\n\s+pass', main_content, re.DOTALL))
deep_results['IMPL-ERR-001'] = {
'name': 'TypeError suppression in buffer.setter',
'detected': has_type_error_pass,
'validated': False,
'note': 'buffer.setter catches TypeError with bare `pass`. If active_key is None (no mode set), buffer writes are silently dropped.',
}
if has_type_error_pass:
issues['validation_gaps'].append({
'rule_id': 'IMPL-ERR-001', 'name': 'TypeError suppression in buffer.setter',
'status': 'SILENT_ERROR_SUPPRESSION', 'severity': 'SHOULD',
'note': 'buffer.setter: except TypeError: pass — data loss if mode not initialized before caption data arrives',
})
print(f" IMPL-ERR-001: {'PASS' if not has_type_error_pass else 'SILENT ERROR SUPPRESSION'}")
spec_collections = ''
for f in extra_files:
if os.path.exists(f) and 'specialized_collections' in f:
with open(f) as _fh: spec_collections = _fh.read()
has_attr_error_suppress = bool(re.search(r'except AttributeError:\s*\n\s+pass|except AttributeError:\s*\n\s+return', spec_collections))
deep_results['IMPL-ERR-002'] = {
'name': 'AttributeError suppression in InstructionNodeCreator',
'detected': has_attr_error_suppress,
'validated': False,
'note': 'InstructionNodeCreator catches AttributeError silently when position_tracker is None.',
}
if has_attr_error_suppress:
issues['validation_gaps'].append({
'rule_id': 'IMPL-ERR-002', 'name': 'AttributeError suppression in InstructionNodeCreator',
'status': 'SILENT_ERROR_SUPPRESSION', 'severity': 'SHOULD',
'note': 'Position tracking silently fails if position_tracker is None — captions get no positioning data',
})
print(f" IMPL-ERR-002: {'SILENT ERROR' if has_attr_error_suppress else 'OK'}")
writer_section = main_content.split('class SCCWriter')[1] if 'class SCCWriter' in main_content else ''
has_writer_midrow = bool(re.search(r'MID_ROW_CODES|STYLE_SETTING_COMMANDS|italic|underline|color', writer_section, re.I))
has_reader_midrow = bool(re.search(r'MID_ROW_CODES|STYLE_SETTING_COMMANDS|interpret_command', main_content))
deep_results['IMPL-RO-001'] = {
'name': 'Writer drops all styling (read-only)',
'detected': has_reader_midrow,
'validated': has_writer_midrow,
'note': 'Reader parses mid-row codes (italics, underline, colors) via interpret_command. Writer _text_to_code outputs only PAC + characters — all styling is lost on round-trip.',
}
if has_reader_midrow and not has_writer_midrow:
issues['partial_validation'].append({
'rule_id': 'IMPL-RO-001', 'name': 'Writer drops all styling',
'status': 'READ_ONLY', 'severity': 'SHOULD',
'note': 'Reader parses mid-row codes (italics, colors, underline) but writer outputs only PAC + character data. Round-trip loses all styling.',
})
print(f" IMPL-RO-001: {'PASS' if has_writer_midrow else 'READ-ONLY — writer drops styling'}")
has_default_pos = bool(re.search(r'default\s*=\s*\(14,\s*0\)', all_code))
has_pos_warning = bool(re.search(r'warn.*position.*default|warn.*fallback.*14|log.*default.*position', all_code, re.I))
deep_results['IMPL-POS-001'] = {
'name': 'Silent position fallback to (14, 0)',
'detected': has_default_pos,
'validated': has_pos_warning,
'note': 'DefaultProvidingPositionTracker falls back to (14, 0) silently when no PAC received. No warning logged.',
}
if has_default_pos and not has_pos_warning:
issues['partial_validation'].append({
'rule_id': 'IMPL-POS-001', 'name': 'Silent position fallback to (14, 0)',
'status': 'SILENT_FALLBACK', 'severity': 'SHOULD',
'note': 'Captions without PAC commands silently land on row 14, col 0. No warning that positioning data is missing.',
})
print(f" IMPL-POS-001: {'PASS' if has_pos_warning else 'SILENT FALLBACK (14, 0)'}")
print("\n" + "=" * 60)
print("PHASE 2: ALL RULES CHECK")
print("=" * 60)
specific_patterns = {
'RULE-FMT-001': [r'def detect|HEADER'],
'RULE-TMC-001': [r're\.match.*\\d\{2\}.*:.*\\d\{2\}.*:.*\\d\{2\}|CaptionReadTimingError.*Timestamps'],
'RULE-TMC-002': [r'time_split\[3\].*30|int.*time_split\[3\]'],
'RULE-TMC-003': [r'monotonic|prev.*time.*>|time.*<.*prev|decreas'],
'RULE-TMC-004': [r'";" in stamp|drop.*frame|seconds_per_timestamp_second'],
'RULE-HEX-001': [r'len\(word\)\s*==\s*4|word\[:2\].*word\[2:\]'],
'RULE-HEX-002': [r'split\(" "\)|split\(\).*word_list|space.separated'],
'RULE-HEX-003': [r'_handle_double_command|doubled_types|last_command'],
'RULE-CHAR-001': [r'\bCHARACTERS\b'],
'RULE-CHAR-002': [r'\bSPECIAL_CHARS\b'],
'RULE-CHAR-003': [r'\bEXTENDED_CHARS\b'],
'RULE-POPON-001': [r'word == "9420"|set_active\("pop"\)|pop_ons_queue'],
'RULE-ROLLUP-001': [r'"9425"|"9426"|"94a7".*roll|buffer_dict.*set_active.*"roll"'],
'RULE-ROLLUP-002': [r'roll_rows_expected'],
'RULE-PAINTON-001': [r'word == "9429"|set_active\("paint"\)|Resume Direct Captioning'],
'RULE-EDM-001': [r'"942c"'],
'RULE-LAY-001': [r'PAC_BYTES_TO_POSITIONING_MAP|row.*1.*15|32.*column'],
'RULE-LAY-002': [r'CaptionLineLengthError|len\(line\)\s*>\s*32|textwrap\.fill.*32'],
'RULE-LAY-003': [r'PAC_BYTES_TO_POSITIONING_MAP|row.*15'],
'RULE-PAC-001': [r'PAC_BYTES_TO_POSITIONING_MAP|_is_pac_command'],
'RULE-PAC-002': [r'PAC_LOW_BYTE_BY_ROW_RESTRICTED|PAC_LOW_BYTE_BY_ROW|indent.*0.*4.*8'],
'RULE-TAB-001': [r'PAC_TAB_OFFSET_COMMANDS|97a1|97a2|9723|TO1|TO2|TO3'],
'RULE-FPS-001': [r'23\.976|film.*pulldown'],
'RULE-FPS-002': [r'\b24\s*fps|24\.0\s*fps'],
'RULE-FPS-003': [r'\b25\s*fps|PAL'],
'RULE-FPS-004': [r'29\.97|1001.*1000|NTSC.*non.*drop|seconds_per_timestamp_second'],
'RULE-FPS-005': [r'29\.97.*drop|drop.*frame|";" in stamp|seconds_per_timestamp_second\s*=\s*1\.0'],
'RULE-FPS-006': [r'\b30\.0\b|30\s*fps|/ 30\.0'],
'RULE-ENC-001': [r'parity_check|verify_parity|& 0x7f|0x7F'],
'RULE-ENC-002': [r'bit.*7|high.*bit|0x80'],
'RULE-MID-001': [r'MID_ROW_CODES|STYLE_SETTING_COMMANDS|interpret_command'],
'RULE-COLOR-001': [r'BACKGROUND_COLOR_CODES|STYLE_SETTING_COMMANDS|color.*attr'],
'RULE-COLOR-002': [r'BACKGROUND_COLOR_CODES'],
'RULE-XDS-001': [r'XDS|[Ff]ield\s*2'],
'IMPL-FMT-001': [r'def detect.*\n.*HEADER'],
'IMPL-TMC-001': [r're\.match.*\\d\{2\}|CaptionReadTimingError'],
'IMPL-TMC-003': [r'monotonic|prev.*time'],
'IMPL-HEX-003': [r'_handle_double_command'],
'IMPL-POPON-001': [r'"9420".*pop|pop_ons_queue'],
'IMPL-ROLLUP-001': [r'roll_rows_expected|roll_rows.*pop'],
'IMPL-PAINTON-001': [r'"9429".*paint|create_and_store'],
'IMPL-EDM-001': [r'"942c".*pop_ons_queue|"942c".*buffer'],
'IMPL-FPS-001': [r'30\.0|MICROSECONDS_PER_CODEWORD'],
'IMPL-ENC-001': [r'parity_check|verify_parity|& 0x7f|0x7F'],
}
missing_rules = []
found_rules = []
for rule_id, meta in sorted(rule_index.items()):
if rule_id in deep_results:
if deep_results[rule_id]['detected']:
found_rules.append(rule_id)
else:
if not any(i['rule_id'] == rule_id for i in issues['validation_gaps']):
missing_rules.append({
'rule_id': rule_id, 'name': meta['name'],
'level': meta['level'], 'status': 'MISSING',
})
continue
patterns = specific_patterns.get(rule_id, [])
if not patterns:
missing_rules.append({
'rule_id': rule_id, 'name': meta['name'],
'level': meta['level'], 'status': 'NO_PATTERN',
})
continue
found = any(re.search(p, all_code, re.I) for p in patterns)
if found:
found_rules.append(rule_id)
else:
missing_rules.append({
'rule_id': rule_id, 'name': meta['name'],
'level': meta['level'], 'status': 'MISSING',
})
issues['missing'] = missing_rules
must_missing = [r for r in missing_rules if r['level'] == 'MUST']
print(f" Found: {len(found_rules)}/{len(rule_index)}, Missing: {len(missing_rules)} (MUST: {len(must_missing)})")
print("\n" + "=" * 60)
print("PHASE 3: CONTROL CODE COVERAGE")
print("=" * 60)
all_hex_keys = set(re.findall(r"'([0-9a-fA-F]{4})'(?:\s*:|\s*\))", constants_content))
misc_ctrl = set()
for code in ['9420', '9421', '9422', '9423', '9424', '9425', '9426', '94a7',
'9428', '9429', '942a', '942b', '942c', '94ad', '942e', '942f',
'97a1', '97a2', '9723']:
if code in all_hex_keys or code.lower() in constants_content.lower():
misc_ctrl.add(code)
pac_count = 0
pac_section = re.search(r'PAC_BYTES_TO_POSITIONING_MAP\s*=\s*\{(.*?)\n\}', constants_content, re.DOTALL)
if pac_section:
pac_count = len(re.findall(r"'[0-9a-fA-F]{2}'", pac_section.group(1)))
special_count = len(re.findall(r"'[0-9a-fA-F]{4}'",
re.search(r'SPECIAL_CHARS\s*=\s*\{(.*?)\n\}', constants_content, re.DOTALL).group(1) if re.search(r'SPECIAL_CHARS\s*=\s*\{(.*?)\n\}', constants_content, re.DOTALL) else ''))
extended_count = len(re.findall(r"'[0-9a-fA-F]{4}'",
re.search(r'EXTENDED_CHARS\s*=\s*\{(.*?)\n\}', constants_content, re.DOTALL).group(1) if re.search(r'EXTENDED_CHARS\s*=\s*\{(.*?)\n\}', constants_content, re.DOTALL) else ''))
print(f" Misc control codes: {len(misc_ctrl)}/19")
print(f" PAC low-byte entries: {pac_count}")
print(f" Special characters: {special_count}")
print(f" Extended characters: {extended_count}")
print(f" Total hex keys: {len(all_hex_keys)}")
print("\n Frame rate support:")
has_2997_ndf = bool(re.search(r'1001.*1000|seconds_per_timestamp_second', main_content))
has_2997_df = bool(re.search(r'";" in stamp|seconds_per_timestamp_second\s*=\s*1\.0', main_content))
has_30_hardcode = bool(re.search(r'/ 30\.0|30\.0\b', main_content))
print(f" 29.97 NDF: {'YES' if has_2997_ndf else 'NO'}")
print(f" 29.97 DF: {'YES' if has_2997_df else 'NO'}")
print(f" 30fps hardcoded: {'YES' if has_30_hardcode else 'NO'}")
print(f" 23.976/24/25/30: NOT SUPPORTED (hardcoded to 30fps frame division)")
print("\n" + "=" * 60)
print("PHASE 4: TEST COVERAGE")
print("=" * 60)
test_files = glob.glob('tests/*scc*.py')
all_tests = ""
for tf in test_files:
if os.path.exists(tf):
with open(tf) as _fh: all_tests += _fh.read()
print(f" Test files: {len(test_files)} ({len(all_tests)} chars)")
test_checks = {
'RULE-FMT-001': [r'def test.*detect|def test.*header|Scenarist_SCC'],
'RULE-TMC-001': [r'def test.*timecode|def test.*timestamp|def test.*timing'],
'RULE-TMC-004': [r'def test.*drop.*frame|def test.*semicolon'],
'RULE-LAY-002': [r'def test.*length|def test.*32|CaptionLineLengthError'],
'RULE-ROLLUP-001': [r'def test.*roll.*up|def test.*RU'],
'RULE-POPON-001': [r'def test.*pop.*on|def test.*EOC'],
'RULE-PAINTON-001': [r'def test.*paint.*on|def test.*RDC'],
'RULE-EDM-001': [r'def test.*edm.*paint|def test.*942c.*paint|def test.*erase.*paint'],
}
for rid, patterns in test_checks.items():
if not any(re.search(p, all_tests, re.I) for p in patterns):
name = rule_index.get(rid, {}).get('name', rid)
issues['test_gaps'].append({'rule_id': rid, 'name': name, 'status': 'NO_TEST'})
print(f" {rid}: NO TEST")
else:
print(f" {rid}: HAS TEST")
print("\n" + "=" * 60)
print("PHASE 5: GENERATE REPORT")
print("=" * 60)
os.makedirs("ai_artifacts/compliance_checks/scc", exist_ok=True)
date = datetime.now().strftime("%Y-%m-%d")
path = f"ai_artifacts/compliance_checks/scc/compliance_report_{date}.md"
total_issues = sum(len(v) for v in issues.values())
must_issues = (len([i for i in issues['validation_gaps'] if i.get('severity') == 'MUST']) +
len([i for i in issues['partial_validation'] if i.get('severity') == 'MUST']) +
len(must_missing))
sanity_section = ""
if stale_warnings:
sanity_section = "\n**STALE PATTERN WARNING**: The following expected code landmarks were not found. Some findings below may report features as 'missing' when they have actually been renamed or moved:\n"
for w in stale_warnings:
sanity_section += f"- {w}\n"
sanity_section += "\n"
report = f"""# SCC EXHAUSTIVE Compliance Report
**Generated**: {date}
**Spec**: {latest_spec}
**Analysis**: Deep Validation + Systematic Rules + Control Codes + Tests
**Implementation**: {main_file}, {const_file}
{sanity_section}
---
## Executive Summary
**Rules checked**: {len(rule_index)}/{len(rule_index)} (100%)
**Total issues**: {total_issues}
**MUST violations**: {must_issues}
| Category | Count |
|----------|-------|
| Validation gaps | {len(issues['validation_gaps'])} |
| Implementation caveats | {len(issues['partial_validation'])} |
| Missing rules | {len(issues['missing'])} (MUST: {len(must_missing)}) |
| Test gaps | {len(issues['test_gaps'])} |
---
## 1. Validation Gaps ({len(issues['validation_gaps'])})
Rules where the concept is detected but not properly validated.
"""
for g in issues['validation_gaps']:
report += f"### {g['rule_id']}: {g['name']}\n"
report += f"- **Status**: {g['status']}\n"
report += f"- **Severity**: {g['severity']}\n"
report += f"- **Note**: {g['note']}\n\n"
report += f"""---
## 2. Implementation Caveats ({len(issues['partial_validation'])})
Rules implemented but with significant limitations.
"""
for p in issues['partial_validation']:
report += f"### {p['rule_id']}: {p['name']}\n"
report += f"- **Status**: {p['status']}\n"
report += f"- **Note**: {p['note']}\n\n"
report += f"""---
## 3. Missing Rules ({len(issues['missing'])})
### MUST Rules ({len(must_missing)})
"""
for r in must_missing:
report += f"- **{r['rule_id']}**: {r['name']} ({r['status']})\n"
should_missing = [r for r in issues['missing'] if r['level'] == 'SHOULD']
may_missing = [r for r in issues['missing'] if r['level'] in ('MAY', 'MUST NOT')]
report += f"\n### SHOULD Rules ({len(should_missing)})\n\n"
for r in should_missing:
report += f"- **{r['rule_id']}**: {r['name']} ({r['status']})\n"
report += f"\n### MAY/MUST NOT Rules ({len(may_missing)})\n\n"
for r in may_missing:
report += f"- **{r['rule_id']}**: {r['name']} ({r['status']})\n"
report += f"""
---
## 4. Control Code Coverage
| Category | Found | Note |
|----------|-------|------|
| Misc control codes | {len(misc_ctrl)}/19 | RCL, BS, EDM, CR, EOC, RU2/3/4, etc. |
| PAC entries | {pac_count} | Positioning (rows 1-15, indents, colors) |
| Special characters | {special_count} | Two-byte special chars |
| Extended characters | {extended_count} | Spanish, French, German, Portuguese |
| Total hex keys | {len(all_hex_keys)} | All codes in constants.py |
## 5. Frame Rate Support
| Rate | Supported | How |
|------|-----------|-----|
| 23.976 fps | No | Not implemented |
| 24 fps | No | Not implemented |
| 25 fps | No | Not implemented |
| 29.97 NDF | **Yes** | Via `:` separator, 1001/1000 time factor |
| 29.97 DF | **Yes** | Via `;` separator, 1.0 time factor |
| 30 fps | Hardcoded | Frame division always uses `/ 30.0` |
**Note**: SCC is an NTSC format, so 29.97 DF/NDF is the primary use case. Missing support for other frame rates may be intentional.
---
## 6. Test Gaps ({len(issues['test_gaps'])})
"""
for t in issues['test_gaps']:
report += f"- **{t['rule_id']}**: {t['name']}\n"
report += f"""
---
## 7. Key Findings
1. **Timecode format is validated**: Regex checks HH:MM:SS:FF/HH:MM:SS;FF format, raises `CaptionReadTimingError` on bad format.
2. **Frame numbers NOT range-checked**: `int(time_split[3]) / 30.0` accepts any number. Frame 45 produces garbage time, no error.
3. **Monotonic timecodes NOT checked**: No code compares current timecode to previous. `TimingCorrectingCaptionList` silently adjusts end times — that's correction, not validation.
4. **Drop-frame invariant NOT validated**: Code distinguishes DF vs NDF via `;` for time math, but accepts `00:01:00;00` (invalid DF — frames 0,1 should be skipped at non-10th minutes).
5. **32-char line limit IS validated**: Reader raises `CaptionLineLengthError`, writer wraps at 32 via `textwrap.fill`. Both directions covered.
6. **Roll-up base row NOT validated**: `roll_rows_expected` is set to 2/3/4, but no check that PAC base row has enough rows above it.
7. **Frame rate is 29.97 only**: Hardcoded `/ 30.0` for frame division, `1001/1000` for NDF factor. No support for 23.976, 24, 25, or true 30fps.
8. **Control code doubling IS handled**: `_handle_double_command` correctly skips redundant doubled commands.
9. **RU4 hex code `94a7` is CORRECT**: Per CEA-608 odd-parity encoding, `94a7` (not `9427`) is the correct RU4 code.
10. **EDM (942c) is pop-on only**: The Erase Displayed Memory handler is guarded by `and self.pop_ons_queue`, so it only fires in pop-on mode. In paint-on and roll-up, EDM is silently discarded. Per CEA-608, EDM is a global command that clears the screen in ALL modes.
---
**Generated**: {datetime.now().strftime('%Y-%m-%d %H:%M')}
**Rules**: {len(rule_index)} | **Found**: {len(found_rules)} | **Missing**: {len(issues['missing'])}
**Validation gaps**: {len(issues['validation_gaps'])} | **Test gaps**: {len(issues['test_gaps'])}
"""
with open(path, 'w') as _f: _f.write(report)
print(f"\n Report: {path}")
print(f" Total issues: {total_issues} ({must_issues} MUST)")
Key improvements over previous version
- Removed false CTRL-008 bug:
94a7 for RU4 is correct per CEA-608 odd-parity encoding
- RULE-LAY-002 correctly assessed: Reader raises
CaptionLineLengthError, writer wraps at 32. Both validated.
- RULE-TMC-003 correctly assessed: No explicit monotonicity validation. Silent timing adjustment is NOT validation.
- Per-rule patterns: Matches actual function names (
_handle_double_command, CaptionLineLengthError) not broad keywords
- Frame rate analysis: Clearly reports which rates are supported (29.97 DF/NDF only)
- Expanded file scope: Also reads specialized_collections.py and state_machines.py
- Key findings section: Narrative summary with accurate assessments
- No inflated control code counts: Reports Field 1 codes only (SCC standard)
Success Criteria
- All spec rules individually checked with per-rule patterns
- Deep validation for 7 critical rules at function level
- Control code coverage by category (not inflated counts)
- Frame rate support clearly documented
- No false bug reports (94a7 is correct)
- Key findings narrative for actionable summary