name	bulk-rule-reviewer
description	Execute agent-centric reviews on all rules in rules/ directory and generate prioritized improvement report
version	2.2.0

Bulk Rule Reviewer

Overview

Execute comprehensive agent-centric reviews on all rule files in rules/ directory, then generate consolidated priority report. Designed for periodic quality audits, pre-release validation, and technical debt tracking.

When to Use

Periodic quality audits (quarterly/monthly)
Pre-release validation before major version releases
Technical debt tracking and prioritization
Baseline quality measurement for improvement initiatives

Inputs

Required:

review_date: YYYY-MM-DD (default: today)
review_mode: FULL | FOCUSED | STALENESS (default: FULL)
model: Lowercase-hyphenated slug (default: claude-sonnet-45)

Optional:

filter_pattern: Glob pattern (default: rules/*.md)
- Examples: rules/100-*.md (Snowflake only), rules/*-core.md (cores only)
skip_existing: Boolean (default: true) - Resume capability
overwrite: Boolean (default: false) - If true, overwrite existing review files. If false, use sequential numbering (-01, -02, etc.) for conflicts
max_parallel: Integer 1-10 (default: 5) - Concurrent sub-agent workers. Set to 1 for sequential execution (legacy behavior)
output_root: Root directory for output files (default: reviews/). Subdirectories rule-reviews/ and summaries/ appended automatically. Supports relative paths including ../.
timing_enabled: true | false (default: false) - Enable execution timing

Outputs

Individual reviews: {output_root}/rule-reviews/<rule-name>-<model>-<date>.md (up to 187 files)

Master summary: {output_root}/summaries/_bulk-review-<model>-<date>.md with sections:

(Default output_root: reviews/. With output_root: mytest/ → mytest/rule-reviews/... and mytest/summaries/...)

Executive Summary (score distribution, dimension analysis)
Priority 1: Urgent (score <50, NOT_EXECUTABLE)
Priority 2: High (score 50-74, NEEDS_REFINEMENT)
Priority 3: Medium (score 75-89, EXECUTABLE_WITH_REFINEMENTS)
Priority 4: Excellent (score 90-100, EXECUTABLE)
Failed Reviews (execution errors)
Top 10 Recommendations (impact × effort prioritization)
Next Steps (immediate/short-term/long-term)
Appendix: All Rules by Score (sorted table)

Critical Execution Protocol

CRITICAL: Anti-Optimization Protocol

FOUNDATIONAL PRINCIPLE: This skill prioritizes ACCURACY over efficiency.

Forbidden Optimization Thoughts:

"This will take too long"
"I can save time by..."
"Token costs are high"
"The user won't notice if I..."
"I should ask about time constraints"
"Let me create a faster approach"
"Let me batch these rules together"
"I can validate multiple rules at once"
"Let me process rules 4-8 efficiently"
"I'll read several rules before reviewing"

Required Mindset:

"I will complete the full process for each rule"
"Quality signals are worth the cost"
"The user authorized ACT knowing the scope"
"Previous runs completed successfully at this pace"

If you find yourself thinking about optimizing the review process itself:

STOP
Re-read this section
Return to the comprehensive process
Do NOT ask the user about time/complexity

IMPORTANT DISTINCTION:

Optimizing the SKILL EXECUTION (forbidden)
- "I can review faster by skipping rubric consultation"
- "I'll batch these reviews to save time"
- "Let me create a template"
Evaluating RULE OPTIMIZATION (required)
- Scoring Token Efficiency dimension
- Identifying verbose patterns in rules
- Recommending rule consolidation

The user has already considered timing and scope. Proceed with the work.

MANDATORY: One Rule At A Time

NEVER batch multiple rules. Each rule MUST be processed individually:

Read ONE rule file completely
Run schema validator for THAT rule
Apply ALL 6 scored rubrics to THAT rule
Write review file for THAT rule
THEN move to next rule

Forbidden patterns:

for f in rules/*.md; do (batch shell loops for validation)
Reading multiple rules before writing any reviews
"Let me process rules 4-8 together"
Creating templates to speed up reviews
Parallelizing any review steps
Combining schema validation for multiple files
"Efficient batch processing"

Why batching fails:

Context overflow loses rubric details mid-batch
Template-based reviews miss rule-specific issues
Shortcut thinking violates rubric requirements
Evidence requirements (15+ line refs) impossible without focused reading

Canary Check - Batching Detection:

Before EACH rule review, verify:

I am reviewing exactly ONE rule
I have not combined multiple rules in this tool call
My previous tool call was for ONE rule only
I am NOT creating a "faster approach"

If ANY check fails: STOP, re-read Anti-Optimization Protocol, resume with ONE rule.

Required Progress Output

After EACH rule review, output exactly:

[N/<total_rules>] Complete: {filename} → {score}/100

This output MUST appear after writing EACH review file. If you see yourself planning to output multiple completion messages at once, you are batching - STOP immediately.

MANDATORY ENFORCEMENT

This skill MUST execute the complete rule-reviewer workflow for each rule file.

Context Anchor Protocol (CRITICAL)

These sections MUST remain in active context throughout execution. NEVER summarize or drop them:

Anti-Optimization Protocol (this section, lines 52-86)
Skills vs Rules distinction (lines 188-218)
Evidence Requirements (lines 309-345)

When context pressure occurs:

Summarize completed rule file contents FIRST
Summarize completed review contents SECOND
NEVER summarize anchor sections

Structural Enforcement (MANDATORY): Re-read TEMPLATE.md every 5 rules:

if rule_number % 5 == 0:
    read_file("skills/rule-reviewer/examples/TEMPLATE.md")

Drift Detection: If any review file is <2500 bytes OR format deviates, immediately re-read:

skills/rule-reviewer/examples/TEMPLATE.md (output format specification)
SKILL.md (full protocol)

Format Deviation Check: After EVERY review, verify the output matches TEMPLATE.md structure:

Executive Summary table with exact headers: | Dimension | Raw (0-10) | Weight | Points | Notes |
All 7 required sections present in order
If deviation detected: re-read TEMPLATE.md and regenerate the review

See: workflows/context-anchor.md and workflows/inter-rule-gate.md for full protocol

How Skills Work Together

IMPORTANT: Skills cannot "invoke" other skills programmatically. Skills are documentation that guides agent behavior, not callable subroutines.

Correct pattern:

Load skills/rule-reviewer/SKILL.md to understand the review workflow
Load skills/rule-reviewer/rubrics/*.md as needed for each dimension
Execute the review workflow for each rule file
Write review to reviews/rule-reviews/ following rule-reviewer's output format
Continue to next rule

Why this matters:

The rule-reviewer skill documents a proven, high-quality review process
Following its workflow ensures consistency and completeness
Progressive disclosure (loading rubrics as needed) manages context efficiently
Each rule deserves full evaluation per the documented process

Protocol Violations (FORBIDDEN)

Agents commonly attempt these shortcuts. ALL ARE FORBIDDEN:

Skipping rubric consultation - Scoring without reading dimension rubrics
Batch optimization - Aggregating multiple rules into single review
Parallel shortcuts - Running concurrently unless max_parallel set
Token-saving shortcuts - Generating abbreviated reviews
Time-saving shortcuts - Estimating scores without proper analysis
Template-based reviews - Using examples/ as templates without actual analysis
Skipping schema validation - Not running ai-rules validate

Required Actions

Load rule-reviewer/SKILL.md to understand complete workflow
Load rubrics/*.md files as needed for each dimension being scored
Run ai-rules validate for each rule
Perform Agent Execution Test (count blocking issues)
Score all dimensions according to review_mode (FULL/FOCUSED/STALENESS)
Generate specific recommendations with line numbers
Write complete review to reviews/rule-reviews/ with proper formatting
Follow workflows sequentially (discovery → review-execution → aggregation → summary-report)
Show progress every 10 reviews (not more frequently)

Progress Display Protocol

CRITICAL: Minimize console output during execution.

Canary checks, dimension scoring, and evidence gathering are INTERNAL PROCESSING. Do NOT display intermediate analysis to the user.

Starting/Complete examples:

[45/<total_rules>] Starting: 310-zsh-scripting-core.md
[45/<total_rules>] Complete: 310-zsh-scripting-core.md → 95.5/100
Every 10 rules: brief aggregate summary (e.g., "Progress: 50/<total_rules>, avg 87.2")

DO NOT DISPLAY (internal processing only):

Pre-Rule Canary check questions/answers
Post-Read Canary verification details
Agent Execution Test blocking issue scans
Dimension score calculations and breakdowns
Evidence citations during processing
Rubric consultation notes
"3 unique things" verification output

Rationale: Canary checks ensure quality but are self-verification, not user communication. The review FILE contains all evidence and analysis. Console output should show only progress.

If canary check FAILS: Log failure reason briefly, then proceed with reset protocol.

Execution Acknowledgment

Before proceeding, agent must confirm:

If cannot commit to ALL items: STOP and report error.

Why This Process Cannot Be Shortened

CRITICAL UNDERSTANDING:

This skill is designed for quality assurance, not efficiency. Short-circuiting defeats its purpose.

Agent Behavior Pattern Observed: Agents frequently attempt to "optimize" by:

Creating template-based reviews without analysis
Asking about time constraints mid-execution
Proposing "streamlined" approaches
Calculating projected completion times
Expressing concern about token costs

ALL OF THESE ARE FORBIDDEN.

Why Agents Do This:

Pattern matching to "helpful" behavior (offering options)
Misapplying token-efficiency principles from rules to skills
Anthropomorphizing user patience/time constraints
Attempting to be "considerate" of resources

Why This Is Wrong:

User has already authorized: "ACT" means proceed with full scope
Scope is acceptable: Comprehensive review of all rules is expected
Quality cannot be compromised: One bad rule costs more than comprehensive review
Skills ≠ Rules: Token efficiency principles don't apply here

Correct Behavior:

Execute the full workflow for each rule
Do NOT calculate or mention projected completion times
Do NOT ask about time constraints
Do NOT propose "faster" alternatives
Trust that the user understands the scope

Detection Test: If you're thinking "Should I ask the user about..." - the answer is NO. The user authorized ACT. Proceed with the work.

Skills vs. Rules - Different Optimization Goals:

Rules (e.g., 100-snowflake-core.md):

Usage Frequency: Loaded repeatedly (100s-1000s of times)
Token Efficiency: CRITICAL (repeated cost multiplier)
Optimization Goal: Minimize tokens while preserving quality
Acceptable Size: Minimize (5K-8K tokens ideal)
Design Priority: Token budget discipline

Skills (e.g., bulk-rule-reviewer):

Usage Frequency: Used occasionally (quarterly/monthly)
Token Efficiency: IRRELEVANT (one-time QA cost)
Optimization Goal: Maximize quality regardless of tokens
Acceptable Size: Whatever it takes (50K-100K acceptable)
Design Priority: Comprehensive coverage

Why This Matters:

You may have noticed rule files emphasize token efficiency (TokenBudget metadata, optimization guidelines). That's correct for rules because they're loaded frequently by many agents.

Skills are different. They're used for specialized tasks a few times per year. Token efficiency is NOT a design goal for skills.

Example:

Rule (100-snowflake-core.md): Loaded 100+ times per month → 5K tokens × 100 = 500K monthly cost → Token efficiency matters
Skill (bulk-rule-reviewer): Used 4 times per year → 50K tokens × 4 = 200K annual cost → Token efficiency irrelevant

For this skill specifically:

Cost: ~50K tokens × 4 reviews/year = 200K tokens/year ≈ $1.80 annually
Value: Comprehensive quality assurance for the full current rule set
ROI: One prevented bad rule saves 10-100× the token cost

DO NOT apply rule token-efficiency principles to skill execution.

Common Efficiency Instincts (ALL WRONG):

"I can create streamlined reviews to save time"
- Reality: Streamlined reviews miss critical issues
- Impact: False confidence in rule quality, undetected blocking issues
- Consequence: Agents fail in production with "streamlined" rules
"Template-based reviews are consistent"
- Reality: Templates skip actual analysis
- Impact: Miss rule-specific issues, score drift, no improvement signal
- Consequence: Repository degrades over time with passing scores
"Batch processing multiple rules is efficient"
- Reality: Aggregation loses per-rule detail
- Impact: Cannot track individual rule improvements
- Consequence: Actionable recommendations impossible
"This will take too long"
- Reality: Comprehensive review completes efficiently with measured execution
- Impact: Premature optimization based on incorrect estimate
- Consequence: Unnecessary shortcuts for non-existent problem
"Token costs are too high"
- Reality: 50K tokens ≈ $0.45 for repository-wide quality audit (quarterly = $1.80/year)
- Impact: False economy—one bad rule costs more in debugging
- Consequence: Penny-wise, pound-foolish optimization
- Category Error: Applying rule token-efficiency principles to skill execution (wrong context)

The Real Cost:

Short-circuited review: 5 minutes, $0.05, ZERO quality signal
Comprehensive review: Produces actionable improvements across the current repository
Debugging one bad rule in production: 2+ hours, frustrated users

Annual Economic Reality:

Annual skill usage: 4 bulk reviews
Cost per review: $0.45 (50K tokens)
Annual cost: $1.80
One bad rule in production: Debug time 2-4 hours, token cost 50K-100K ($0.45-$0.90), opportunity cost from delayed features
Cost to prevent: $0.45 per review
ROI: 10-100× return

Token Efficiency Category Error:

Skills are NOT rules. Do not apply rule optimization principles here:

"Skills should be token-efficient like rules" → WRONG context
"Skills should be comprehensive regardless of tokens" → CORRECT context

When Shortcuts Are Acceptable: NEVER. If time/tokens are constraints, use these instead:

Set filter_pattern to review subset (e.g., rules/100-*.md)
Set review_mode: STALENESS for quick check (1 dimension)
Split into multiple sessions with skip_existing: true

DO NOT create a "fast mode" that compromises quality.

Time Expectations (Measured - DO NOT RECALCULATE):

Actual Performance (2026-01-06 run):

Current repository rule count reviewed comprehensively
Execution completes efficiently

DO NOT:

Recalculate these numbers during execution
Project "time remaining"
Express concern about duration
Ask if this is acceptable

These numbers are provided for INFORMATION only. Your job is to execute, not to optimize or question timing.

If you find yourself calculating time:

STOP
You're optimizing instead of executing
Return to the workflow

Verification

Each review must contain:

Executive Summary with scores table (all 6 dimensions for FULL mode)
Schema Validation Results (from ai-rules validate output)
Agent Executability Verdict (based on Agent Execution Test)
Dimension Analysis sections (detailed scoring rationale)
Critical Issues list (specific line numbers)
Recommendations (prioritized with expected score improvements)
Post-Review Checklist
Conclusion

CRITICAL: Evidence-Based Verification (MANDATORY)

Every review MUST include evidence proving the file was actually read:

Requirement	Minimum	Example
Line references	≥15 distinct	"line 47", "lines 120-135"
Direct quotes	≥3 with line numbers	`Line 156: "Use pd.to_datetime() with explicit format"`
Metadata citation	TokenBudget value	"TokenBudget ~6550 declared at line 12"
Pattern names	≥2 exact names	"Anti-Pattern 2: Using Deprecated datetime.utcnow()"
Code references	≥1 function/class name	"The `ensure_python_datetime()` helper at lines 330-350"

Zero-Recommendation Rule:

Reviews with "No recommendations" or "None required" MUST justify with:

At least 3 specific attempts to find issues (with line references)
Explicit statement: "Searched for [X, Y, Z] issues at lines [A, B, C] - none found"

If a review contains zero recommendations AND zero line references = AUTOMATIC REJECTION

Even excellent rules (95-100 score) should have at least LOW severity suggestions:

Minor terminology consistency opportunities
Potential future-proofing considerations
Cross-reference enhancement possibilities
Example expansion opportunities

Violation consequences:

Invalid reviews rejected from summary
Execution halted with protocol violation error
User notified of shortcut attempt

Quality gate:

Review file size 3000-8000 bytes (typical for FULL mode)
< 2000 bytes = too abbreviated (VIOLATION)
All required sections present (VIOLATION if missing)
≥15 line references present (VIOLATION if missing)
≥3 direct quotes with line numbers (VIOLATION if missing)

Shortcut Detection and Prevention

During execution, HALT immediately if agent exhibits:

RED FLAGS (Stop and Self-Correct):

Suggesting efficiency improvements mid-review
- Example: "To save time, I'll create streamlined reviews..."
- Action: STOP. Re-read this skill. Return to comprehensive process.
Creating reviews <2500 bytes (FULL mode)
- Detection: wc -c review_file shows <2500
- Action: DELETE abbreviated review. Re-do with full analysis.
Not loading rubrics before scoring
- Detection: Dimension score provided without read_file(rubrics/[dimension].md)
- Action: HALT. Load rubric. Re-score with rubric guidance.
Batch processing mentions
- Example: "I'll process rules 1-10 together..."
- Action: STOP. Review one rule at a time. No exceptions.
Skipping schema validation
- Detection: Review lacks "Schema Validation Results" section with actual output
- Action: HALT. Run ai-rules validate. Include output.
Using examples/ as templates
- Detection: Review text matches example patterns without rule-specific details
- Action: DELETE review. Re-analyze actual rule content.
Time/token concerns mentioned
- Example: "Given the large number of rules..."
- Action: STOP. Re-read "Why This Process Cannot Be Shortened" section.
Applying rule optimization to skills
- Example: "To improve token efficiency..." (category error)
- Action: STOP. Skills are NOT rules. Different optimization goals.

HALT IMMEDIATELY if you:

Generate review text without reading the actual rule file
Use the same review text for multiple rules
Create reviews in a loop without individual analysis
Think "I'll create a template and fill in details"
Calculate "time remaining" or "rules per minute"
Consider asking the user about scope/timing

Self-Correction Trigger Words: If you think or write any of these phrases, you're taking a shortcut:

"To save time..."
"For efficiency..."
"Given the large number..."
"This would take approximately..."
"Should I continue with..."
"Would you prefer..."
"Let me create a streamlined..."

Immediate Action:

DELETE any work done with shortcuts
Re-read the Anti-Optimization Protocol
Resume comprehensive process
Do NOT notify user of the self-correction (just fix it)

Self-Correction Protocol:

If shortcut detected:

ACKNOWLEDGE: "I was attempting [shortcut]. This violates skill protocol."
DELETE: Remove any abbreviated/templated reviews created
RESET: Re-read bulk-rule-reviewer/SKILL.md and rule-reviewer/SKILL.md
RESUME: Return to comprehensive process at last valid checkpoint
COMMIT: "I will complete comprehensive reviews without shortcuts."

User Intervention:

If agent repeatedly attempts shortcuts despite self-correction:

User should STOP execution
Report skill design flaw
Do NOT proceed with compromised reviews

Workflow

Parameter Collection

Collect ALL parameters (required AND optional) using ask_user_question tool.

See: workflows/parameter-collection.md

MANDATORY: Prompt for ALL parameters in batched questions (max 4 per call):

Do NOT silently apply defaults for optional parameters
User must explicitly confirm each setting
If ask_user_question unavailable, fall back to text-based prompting

[OPTIONAL] Timing Start

When: Only if timing_enabled: true in inputs
MODE: Safe in PLAN mode

See: ../skill-timing/workflows/timing-start.md

Action: Capture run_id in working memory for later use.

Note: Timing tracks the entire bulk review process (all stages), not individual rule-reviewer calls.

[OPTIONAL] Checkpoint: skill_loaded

When: Only if timing was started
Checkpoint name: skill_loaded

See: ../skill-timing/workflows/timing-checkpoint.md

Stage 1: Discovery

Find all .md files in rules/ directory, apply filter_pattern, sort alphabetically.

See: workflows/discovery.md

[OPTIONAL] Checkpoint: discovery_complete

When: Only if timing was started
Checkpoint name: discovery_complete

See: ../skill-timing/workflows/timing-checkpoint.md

Stage 2: Review Execution (Parallel or Sequential)

Execution mode depends on max_parallel parameter:

Parallel Execution (default: max_parallel ≥ 2)

When max_parallel >= 2, use parallel sub-agents for faster execution:

Partition rules into N groups (where N = max_parallel)
Launch N sub-agents in background, each assigned a group
Each sub-agent loads rule-reviewer skill and processes its rules independently
Monitor progress via agent_output polling
Aggregate results when all sub-agents complete

Benefits:

5× speedup (~50 minutes instead of 4-6 hours)
Fresh context per sub-agent (eliminates drift)
Isolated failures (one agent failing doesn't stop others)

See: workflows/parallel-execution.md for full implementation See: workflows/subagent-prompt-template.md for sub-agent prompt

File writes: All sub-agents write directly to {output_root}/rule-reviews/. No conflicts because each sub-agent reviews different rules (unique filenames).

Sequential Execution (max_parallel = 1)

When max_parallel = 1, use legacy sequential processing (single agent reviews all rules).

Use sequential when:

Debugging/troubleshooting the skill
Very small rule sets (< 10 rules)
Explicit user preference

CRITICAL: This is where shortcut temptation peaks. Resist it.

Execution Protocol: No Mid-Stream Questions

RULE: Once user types "ACT", do NOT:

Ask about time constraints
Propose alternative approaches
Calculate projected completion times
Express concern about scope
Request clarification on depth/quality trade-offs

The ACT command means:

User understands the scope of a full-repository review
User wants comprehensive reviews (3000-8000 bytes each)
User prioritizes accuracy over speed

If you catch yourself about to ask "Should I..." or "Would you prefer...":

STOP
The answer is: Continue with comprehensive reviews
Return to the workflow

For each rule file:

INTER-RULE GATE (every 5 rules): If rule_number % 5 == 0, execute workflows/inter-rule-gate.md
PRE-RULE CANARY: Execute 3 canary questions from workflows/proactive-canary.md
Extract rule name from path
Check if review exists (if skip_existing=true)
READ the actual rule file into working memory
POST-READ CANARY: Verify you can name 3 specific things unique to THIS rule
LOAD rule-reviewer/SKILL.md if not already loaded
LOAD relevant rubrics for dimensions being scored
RUN ai-rules validate on the rule file
PERFORM Agent Execution Test (count blocking issues)
MID-REVIEW CANARY (after dimension 3): Check rubric loading and reference reuse
SCORE each dimension according to rubric, citing line numbers and quotes
GENERATE specific recommendations with line numbers
VERIFY review authenticity (see workflows/per-rule-verification.md)
- Must have ≥15 line references
- Must cite direct quotes from rule
- Must include rule-specific findings
- FAILURE triggers workflows/reset-trigger.md
WRITE complete review to reviews/rule-reviews/ directory (respects overwrite parameter)
Store (rule_name, score, verdict, review_path)
Show progress every 10 reviews

CRITICAL: Step 5 (READ the actual rule file) MUST happen BEFORE steps 10-13. Reviews generated without reading the file will fail verification at step 14.

DRIFT PREVENTION: Steps 1, 2, 6, 11 are canary/gate checks that detect optimization drift BEFORE it produces compromised output. These are NOT optional.

Time per rule: Efficient execution with comprehensive analysis Quality: Comprehensive, reliable, actionable

ANTI-PATTERN ALERT:

If you're thinking ANY of these thoughts, STOP and re-read this skill:

"This will take too long" → Comprehensive review completes efficiently
"I can optimize this" → NO. Quality > efficiency
"Templates would be faster" → Templates = zero signal
"Token costs are high" → $0.45 for 129 rules is cheap QA (Skills ≠ Rules)
"Users won't notice abbreviated reviews" → They will. Bad rules escape.

Correct Agent Self-Talk During Execution:

WRONG (Optimization-Seeking):

"I've done 10 rules in 8 minutes, so 129 will take 90 minutes..."
"This seems like a lot of work, should I ask the user?"
"Maybe I can create a template to speed this up..."
"Let me calculate the token cost..."

RIGHT (Execution-Focused):

"Next rule: 101a-snowflake-streamlit-visualization.md"
"Reading rule file... Running schema validation... Counting blocking issues..."
"Scoring dimension 1: Actionability... Evidence: line 52..."
"Writing review... Next rule: 101b-snowflake-streamlit-performance.md"

Notice the difference:

WRONG: Meta-thinking about the process
RIGHT: Executing the process

Your internal monologue should be task steps, not process analysis.

Reminder:

ONE bad rule in production costs 100× more than comprehensive review
Skills are NOT rules → Token efficiency is irrelevant (different optimization goals)

See: workflows/review-execution.md for orchestration details, resume capability, error handling.

[OPTIONAL] Checkpoint: reviews_complete

When: Only if timing was started
Checkpoint name: reviews_complete

See: ../skill-timing/workflows/timing-checkpoint.md

Stage 3: Aggregation

For each review file:

Read first 150 lines only (context management)
Extract: overall score, verdict, critical issues, dimension scores
Build lightweight data structure (no full content)
Calculate statistics: average, median, distribution

See: workflows/aggregation.md for parsing strategy and statistics calculations.

[OPTIONAL] Checkpoint: aggregation_complete

When: Only if timing was started
Checkpoint name: aggregation_complete

See: ../skill-timing/workflows/timing-checkpoint.md

Stage 4: Summary Report

Generate master summary with:

Prioritized sections (Priority 1-4)
Rules sorted by score within tiers
Impact × effort ratios for recommendations
Write to reviews/summaries/_bulk-review-<model>-<date>.md

See: workflows/summary-report.md for report format and section generation.

[OPTIONAL] Checkpoint: summary_complete

When: Only if timing was started
Checkpoint name: summary_complete

See: ../skill-timing/workflows/timing-checkpoint.md

[OPTIONAL] Timing End (Compute)

When: Only if timing was started
MODE: Safe in PLAN mode (outputs to STDOUT only)

See: ../skill-timing/workflows/timing-end.md (Step 1)

Action: Capture STDOUT output for metadata embedding.

[MODE TRANSITION: PLAN → ACT]

Request user ACT authorization before file modifications.

[OPTIONAL] Timing End (Embed)

When: Only if timing was started
MODE: Requires ACT mode (appends metadata to file)

See: ../skill-timing/workflows/timing-end.md (Step 2)

Action: Parse STDOUT, append timing metadata section to summary report file.

Critical Design Decisions

Context Management: Parse only first 150 lines of each review (scores/verdicts only). Full details remain in individual files.

Stateless Execution: Review failures don't stop batch. Resume via skip_existing parameter.

See: workflows/aggregation.md for complete strategy.

Error Handling

Review failure: Continue with next file, log error, mark FAILED in summary.

Context overflow: Switch to minimal output mode, report warning, continue.

File write failure: Print OUTPUT_FILE directive for manual save, continue.

Empty rules directory: Report error, exit gracefully without empty summary.

Partial completion: Resume capability allows continuation using existing reviews.

Usage Examples

Basic Invocation (All Rules, Full Review)

Use the bulk-rule-reviewer skill.

review_date: 2026-01-06
review_mode: FULL
model: claude-sonnet-45

Filtered Review (Snowflake Rules Only)

Use the bulk-rule-reviewer skill.

review_date: 2026-01-06
review_mode: FULL
model: claude-sonnet-45
filter_pattern: rules/100-*.md

Force Re-Review (Overwrite Existing)

Use the bulk-rule-reviewer skill.

review_date: 2026-01-06
review_mode: FULL
model: claude-sonnet-45
overwrite: true

Note: Use overwrite: true to replace existing reviews. Use skip_existing: false combined with overwrite: false (default) to create new versions with sequential numbering (-01, -02).

Re-Review with Sequential Numbering (Preserve History)

Use the bulk-rule-reviewer skill.

review_date: 2026-01-06
review_mode: FULL
model: claude-sonnet-45
skip_existing: false
overwrite: false

This creates -01, -02 versions without overwriting previous reviews.

Staleness Check Only

Use the bulk-rule-reviewer skill.

review_date: 2026-01-06
review_mode: STALENESS
model: claude-sonnet-45

Success Criteria

All matching rules reviewed (or filtered subset)
Individual review files written to reviews/rule-reviews/
Master summary report generated with valid path
Prioritized improvement list included
No context overflow during execution
Resume capability functional (existing reviews skipped)
Error handling graceful (failed reviews don't stop batch)

Expected Outcomes

Score distribution: Average, median, distribution by priority tier.

Dimension analysis: Average scores for all 6 dimensions.

Critical issues summary: Count of rules with 0, 1-2, 3+ critical issues.

Prioritized recommendations: Top 10 rules to improve (impact × effort), estimated effort, expected score improvement.

Next steps: Immediate actions, short-term goals, long-term strategy.

Installation Requirements

Dependency: rule-reviewer skill (required)

Skill location resolution supports two patterns:

Installed Skill (Recommended): Install rule-reviewer via agent tool's skill management
Local Skill (Fallback): Ensure skills/rule-reviewer/ exists in project

Auto-detection: Automatically detects which pattern is available.

Error handling: If neither found, execution stops with installation guidance.

Validation

See: workflows/input-validation.md for validation workflow and code patterns.

Key Requirements:

review_date: YYYY-MM-DD format (valid calendar date)
review_mode: FULL | FOCUSED | STALENESS (uppercase)
model: lowercase-hyphenated (e.g., claude-sonnet-45)
filter_pattern: rules/*.md glob (optional, must match ≥1 file)
skip_existing: boolean true/false (optional, default: true)
max_parallel: integer 1-10 (optional, default: 1)
Environment: rules/ exists and readable, reviews/rule-reviews/ and reviews/summaries/ writable

Execution: Validate inputs before Stage 1 (Discovery). Fail fast on errors.

Examples

examples/full-bulk-review.md - Complete walkthrough example

Related Skills

rule-reviewer - Single rule review (required dependency)
rule-creator - Create new rules (complementary)

References

Rules

rules/002h-claude-code-skills.md - Skill authoring best practices
rules/002-rule-governance.md - Rule schema and standards
rules/000-global-core.md - Foundation patterns