---
name: reviewing-agent-effectiveness
description: Diagnose which agent enforcement mechanisms (rules, hooks, skills) fired during a task and which did not. Produces a FIRED / NOT FIRED / N/A report per mechanism. Use when the user asks to review what worked, check enforcement, audit the agent system, or evaluate whether rules gripped. For fixing gaps found, hand off to reviewing-agent-infrastructure.
---
# Reviewing Agent Effectiveness
## Context

The FitnessApp project uses a layered defense-in-depth architecture:

- L2 — Always-Apply Rules (`.claude/rules/*.mdc` with `alwaysApply: true`): Loaded into every chat. ~80% compliance.
- L3 — Skills (`.claude/skills/*/SKILL.md`): Triggered by keyword match from the user prompt. ~85-90% compliance.
- L3 — Commands (`.claude/commands/*.md`): Explicit user-triggered workflows. ~85-90% compliance.
- L4 — Pre-Commit Hook (`.git/hooks/pre-commit`): Blocks bad commits. 100% execution.
- L5 — Stop Hook (`.claude/hooks/post-task-check.sh`): Deterministic checks after the agent finishes. 100% execution.
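To make the hook layers concrete, here is a minimal sketch of one deterministic stop-hook check, assuming the stamp file name and 10-minute freshness window described under Hooks below; it is not the project's actual post-task-check.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a single deterministic stop-hook check, not the
# real post-task-check.sh: fail when the validation stamp is missing or stale.
STAMP="code-changes.stamp.md"
MAX_AGE_SECONDS=$((10 * 60))  # stamps older than 10 minutes count as stale

if [[ ! -f "$STAMP" ]]; then
  echo "BLOCK: $STAMP missing; run post-change validation first." >&2
  exit 1
fi

# stat -f %m is macOS, stat -c %Y is GNU; try both for portability.
mtime=$(stat -f %m "$STAMP" 2>/dev/null || stat -c %Y "$STAMP")
age=$(( $(date +%s) - mtime ))
if (( age > MAX_AGE_SECONDS )); then
  echo "BLOCK: $STAMP is ${age}s old (limit ${MAX_AGE_SECONDS}s); re-validate." >&2
  exit 1
fi
```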
## When to Use

After any non-trivial implementation task, especially:

- New features (2+ Swift files)
- Refactoring across multiple files
- Bug fixes touching coordinators or shared state
- When the user asks "was hat gegriffen?" (German for "what took effect?") / "what rules fired?" / "enforcement audit"
- As the handoff target from debugging-ui-tests/SKILL.md, when a UI-test failure exposes a gap in rules/skills/hooks that should have caught the issue earlier
## Audit Process

### Step 1: Identify the Task Scope

Determine what was done in the task being audited:

- What files were created or modified? (see the git sketch below)
- What type of task was it? (new feature, refactoring, bug fix, test writing)
- Did it happen in the current chat or a different one? (if different, read the transcript)
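When the task is in git (committed or staged), the changed-file list can be pulled mechanically; the HEAD~1..HEAD range below is a placeholder for whatever range covers the audited task:

```bash
# Files touched by the audited task; widen the range as needed,
# or use `git diff --staged` / `git status` for uncommitted work.
git diff --name-status HEAD~1..HEAD

# Count changed Swift files (feeds the "2+ Swift files" threshold above).
git diff --name-only HEAD~1..HEAD | grep -c '\.swift$'
```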
### Step 2: Check Each Enforcement Mechanism

Walk through every mechanism and determine: FIRED / NOT FIRED / NOT APPLICABLE.

#### A: Always-Apply Rules

| ID | Rule | File | What to Check |
|---|---|---|---|
| R1 | Triggered Rule Notice | code-changes-enforcement.mdc | Did the agent mention "post-change validation will be required" early? |
| R2 | DEVELOPER_DIR | build-and-test.mdc | Were all xcodebuild calls prefixed with DEVELOPER_DIR? Was swift test/swift build avoided? |
| R3 | Architecture Sync | architecture-documentation-sync.mdc | Was architecture-documentation.md updated when structural changes occurred? See reviewing-code-changes skill for trigger map. |
| R4 | Co-Author Trailer | AGENTS.md rule | If a commit was made, does it include a Co-authored-by: Claude trailer? |
| R5 | Agent Infra Enforcement | agent-infrastructure-enforcement.mdc | If .claude/ files changed, was agent-infrastructure validation mentioned early? Did the agent suggest learnings after mistakes? |
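To illustrate what the R2 check looks for, a compliant invocation pins the toolchain via DEVELOPER_DIR; the Xcode path, scheme, and destination below are assumptions, not the project's actual values:

```bash
# R2-compliant: DEVELOPER_DIR pins the Xcode toolchain for this invocation.
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer \
  xcodebuild test \
  -scheme FitnessApp \
  -destination 'platform=iOS Simulator,name=iPhone 16'

# What R2 forbids: bare SwiftPM commands that bypass the pinned toolchain.
#   swift build
#   swift test
```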
#### C: Skills

| ID | Skill | File | Trigger Condition | What to Check |
|---|---|---|---|---|
| S1 | Create Feature | create-feature/SKILL.md | User asks to create new feature/screen/view | Was the skill read and followed? Were tests written? |
| S2 | Reviewing Code Changes | reviewing-code-changes/SKILL.md | Swift files changed OR user asks to review code | Was the full checklist followed? Was stamp written (if post-change)? |
| S3 | Reviewing Test Quality | reviewing-test-quality/SKILL.md | User asks to review tests | Was the checklist followed? |
| S5 | Writing UI Tests | writing-ui-tests/SKILL.md | User asks to write UI tests | Was the skill read? |
| S6 | Updating UI Tests | updating-ui-tests/SKILL.md | User asks to fix/update UI tests | Was the skill read? |
#### D: Hooks

| ID | Hook | File | What to Check |
|---|---|---|---|
| H1 | Check 1: Code Validation (Grind Loop) | code-validation.sh | Was code-changes.stamp.md present and fresh (< 10 min)? Did the grind loop fire? |
| H2 | Check 2: Docs-Sync (Stateless) | architecture-sync.sh | Were structural changes detected? Was architecture-documentation.md updated? |
| H3 | Check 3: Test Execution (Grind Loop) | test-execution.sh | Were test files changed? Was test-execution.stamp.md present? Did the grind loop fire? |
| H4 | Check 4: Test Coverage (Hint) | test-coverage.sh | Were new ViewModel/Service files created? Do corresponding test files exist? |
| H5 | Check 5: Enforcement Audit (Hint) | enforcement-audit.sh | 5+ Swift files changed — was enforcement audit suggested? |
| H6 | Check 6: Agent Infrastructure (Grind Loop) | agent-infrastructure.sh | .claude/ files changed — was agent-infrastructure.stamp.md present? Did the grind loop fire? |
| H7 | Pre-Commit: Validation | .git/hooks/pre-commit | Would a commit be blocked for missing validation? |
| H8 | Pre-Commit: No print() | .git/hooks/pre-commit | Are there print() statements in production code? |
| H9 | Pre-Commit: architecture-documentation.md | .git/hooks/pre-commit | Would a commit be blocked for stale architecture-documentation.md? |
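As one concrete example from this table, the H8 check could be implemented along these lines; treating any path containing "Tests" as non-production code is an assumption about the project layout:

```bash
# Hypothetical pre-commit fragment for H8: reject staged production Swift
# files containing print() calls (paths with "Tests" are skipped here).
staged=$(git diff --cached --name-only --diff-filter=ACM | grep '\.swift$' | grep -v 'Tests')
for f in $staged; do
  if grep -n 'print(' "$f"; then
    echo "COMMIT BLOCKED: print() found in $f; use a logger instead." >&2
    exit 1
  fi
done
```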
### Step 3: Build the Report

For each mechanism, assign a status and record evidence:
| ID | Mechanism | Status | Evidence | Triggered By |
|---|---|---|---|---|
| R1 | Triggered Notice | FIRED | "post-change validation will be required" in first response | alwaysApply: true |
| R3 | Architecture Sync | FIRED | architecture-documentation.md updated alongside the structural change | alwaysApply: true |
| H4 | Tests Exist | NOT FIRED | New ProfileViewModel.swift without ProfileViewModelTests.swift | Design gap |
| S3 | Test Quality Review | N/A | User did not request test review | Correct |
### Step 4: Visualize the Trigger Chain

Create a mermaid graph showing which mechanisms triggered which actions, and where gaps occurred. Use solid arrows for successful triggers and dashed arrows for failures, as in the sketch below.
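An illustrative shape for that graph; the nodes are hypothetical findings reusing the IDs from Step 2, not a fixed vocabulary:

```mermaid
graph LR
  %% Solid arrows: the mechanism fired and produced its expected action.
  R1[R1 Triggered Notice] --> V[Post-change validation announced]
  H1[H1 Code Validation] --> S[Fresh code-changes.stamp.md]
  %% Dashed arrow: the mechanism should have fired but did not (gap).
  H4[H4 Test Coverage] -.-> T[ProfileViewModelTests.swift missing]
```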
### Step 5: Gap Analysis

For each NOT FIRED finding:

- Is it a gap or correct behavior? (N/A = correct, NOT FIRED = potential gap)
- Severity: should this be fixed? (critical = blocks quality, minor = nice-to-have, skip = correct behavior)
### Step 5.1: Hand Off to reviewing-agent-infrastructure

For each NOT FIRED finding that represents a real deficiency (not an N/A), run the reviewing-agent-infrastructure skill to persist the fix as a rule or skill update. Do not implement fixes inline in the audit report: the audit is a diagnosis, not a treatment.
### Step 6: Summary

```markdown
## Enforcement Audit Summary
- Task: <description>
- Files changed: <count>
- Mechanisms checked: <total>
- FIRED: <count> (<percentage>)
- NOT FIRED: <count> (gaps)
- N/A: <count> (correct)
- Gaps to fix: <list with IDs>
```
## Output

Present the complete report to the user. Do NOT write it to a file unless asked: it is a conversation artifact, not a persistent stamp.