Use when sessions feel sluggish or context is running low. Audits token consumption across agents, skills, MCP servers, and rules, then produces prioritized token-savings recommendations.
Use when comparing coding agents (Claude Code, Aider, Codex, Copilot) head-to-head on custom tasks. Measures pass rate, cost, time, and consistency with reproducible benchmarks.
Use when designing or improving an AI agent's action space, tool definitions, error recovery, or observation formatting to raise completion rates.
Use when an agent is looping, burning tokens without progress, or failing repeatedly. Provides structured self-debugging via capture, diagnosis, contained recovery, and introspection reports.
Use when trimming MCC to what a repo actually needs. Sorts skills, rules, hooks, and commands into DAILY vs LIBRARY buckets using evidence from the codebase.
Use when running AI-agent-driven engineering workflows. Covers eval-first execution, task decomposition, cost-aware model routing, and human-in-the-loop quality gates.
Use when designing process, reviews, and architecture for AI-first teams where agents generate most implementation output. Addresses eval coverage, review focus, and planning quality.
Use when AI agents have modified backend logic and you need regression tests. Covers sandbox-mode API testing, automated bug-check workflows, and patterns to catch AI blind spots.
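
As an illustration of the four metrics named in the agent-comparison entry above (pass rate, cost, time, consistency), the following is a minimal sketch of how repeated benchmark runs might be summarized per agent. It is not taken from any of the skills listed here: the `Run` record, its field names, and the `summarize` helper are hypothetical, and the consistency score (1 minus twice the population standard deviation of pass/fail outcomes, so that 1.0 means every run agreed and 0.0 means an even split) is an assumed definition chosen only for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Run:
    """One benchmark attempt. All field names are hypothetical."""
    agent: str       # label for the agent under test
    passed: bool     # did the task's checks pass?
    cost_usd: float  # spend attributed to this run
    seconds: float   # wall-clock duration


def summarize(runs):
    """Group runs by agent and compute the four headline metrics."""
    by_agent = {}
    for run in runs:
        by_agent.setdefault(run.agent, []).append(run)

    summary = {}
    for agent, agent_runs in by_agent.items():
        outcomes = [1.0 if r.passed else 0.0 for r in agent_runs]
        summary[agent] = {
            "pass_rate": mean(outcomes),
            "mean_cost_usd": mean(r.cost_usd for r in agent_runs),
            "mean_seconds": mean(r.seconds for r in agent_runs),
            # Assumed definition: pstdev of 0/1 outcomes is at most 0.5,
            # so this score stays within [0.0, 1.0].
            "consistency": 1.0 - 2.0 * pstdev(outcomes),
        }
    return summary


if __name__ == "__main__":
    demo = [
        Run("agent-a", True, 0.10, 30.0),
        Run("agent-a", True, 0.20, 50.0),
        Run("agent-b", True, 0.05, 20.0),
        Run("agent-b", False, 0.05, 40.0),
    ]
    for name, metrics in summarize(demo).items():
        print(name, metrics)
```

Running each task several times per agent is what makes the consistency figure meaningful; with a single run per agent it is always 1.0 under this definition.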