ai-agents

review

Review before merge. Stage-1 spec-compliance gate, then 11 Stage-2 canonical axes (analyst, architect, qa, security, devops, roadmap, reliability, observability, agent-safety, decision-rigor, code-quality) plus 3 chained skills (code-qualities-assessment, golden-principles, taste-lints). Run after /test. Run for a full pre-merge review. Do NOT invoke code-qualities-assessment, golden-principles, or taste-lints directly for a full review; review chains them.

build

Build incrementally. Implement changes in thin vertical slices with TDD and atomic commits. Run after /plan.

plan

Plan how to build it. Decompose specs into milestones with dependencies and risk mitigations. Run after /spec.

ship

Ship it. Pre-flight validation, CI check, and PR creation. Run after /review.

spec

Define what to build. Transform a problem into testable requirements with acceptance criteria.

sync

Detect spec-to-code drift. Scan REQ/DESIGN/TASK specs for references to code that no longer exists, then report the drift for review. Run after a hand-edit that moved or deleted code.
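The drift check described for sync can be pictured as "collect the code references in a spec, then test each one against the working tree". The sketch below is only an illustration of that idea: it assumes specs cite code as backticked relative paths, and `PATH_REF` and `find_drift` are hypothetical names, not the command's actual implementation.

```python
import re
from pathlib import Path

# Assumption: specs cite code as backticked relative paths such as `src/app.py`.
PATH_REF = re.compile(r"`([\w./-]+\.\w+)`")


def find_drift(spec_text, repo_root):
    """Return the referenced paths that no longer exist under repo_root."""
    root = Path(repo_root)
    refs = PATH_REF.findall(spec_text)
    # De-duplicate and sort so the report is stable between runs.
    return sorted({ref for ref in refs if not (root / ref).exists()})
```

The function only reports missing paths; deciding whether to update the spec or restore the code is left to the reviewer, which matches the "report drift for review" wording.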

test

Prove it works. Multi-dimensional quality validation across functional, non-functional, security, DevOps, DX, and observability. Run after /build.
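The "Run after" notes in the entries above imply one ordering of the workflow commands. A minimal sketch of that ordering, assuming the only dependencies are the ones those notes state. `RUN_AFTER` and `workflow_order` are illustrative names, not part of the plugin, and sync is left out because it follows a hand-edit, not another command.

```python
from graphlib import TopologicalSorter

# Each command maps to the command its description says it runs after.
RUN_AFTER = {
    "plan": {"spec"},
    "build": {"plan"},
    "test": {"build"},
    "review": {"test"},
    "ship": {"review"},
}


def workflow_order(run_after=RUN_AFTER):
    """Return the commands in an order that respects every 'Run after' note."""
    return list(TopologicalSorter(run_after).static_order())


print(" -> ".join(workflow_order()))
```

Because the notes form a single chain, the order is unique: spec, plan, build, test, review, ship.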

benchmark-models

Cross-model benchmark. Runs one prompt or skill through Claude, GPT (Codex CLI), and Gemini side by side and compares latency, tokens, cost, tool calls, and optionally output quality via an Anthropic-API judge. Answers "which model is actually best for this skill?" with data. Use when you say "benchmark models", "compare models", "which model is best for X", "cross-model comparison", or "model shootout". Do NOT use to measure web page performance.

decision-critic

Structured decision critic that systematically stress-tests reasoning before commitment: it surfaces hidden assumptions, verifies claims, and generates adversarial perspectives to improve decision quality. Do NOT use to surface failure risks pre-launch (use pre-mortem) or to probe why a constraint exists (use chestertons-fence).

reflect

CRITICAL learning capture. Extracts HIGH/MED/LOW confidence patterns from conversations to prevent repeating mistakes and preserve what works. Use PROACTIVELY after user corrections ("no", "wrong"), after praise ("perfect", "exactly"), when discovering edge cases, or when skills are heavily used. Without reflection, valuable learnings are LOST forever. Acts as a continuous improvement engine for all skills. Invoke EARLY and OFTEN: every correction is a learning opportunity.

pre-mortem

Guide prospective hindsight analysis to identify project risks before failure occurs. Teams imagine the project has failed spectacularly, then work backward to identify causes. Increases risk identification by 30% compared to traditional planning. Use when you say "run a pre-mortem on", "what could cause this to fail", "identify project risks", or "what could go wrong with". Do NOT use to stress-test a single decision's reasoning (use decision-critic).

programming-advisor

Evaluate existing solutions (libraries, SaaS, open source) AND internal prior-art before custom development to avoid reinventing the wheel. Use when considering building new features, asking "should I build or use existing", "do we already have this", "is there existing code for X in this repo", "is there a library for this", or need build vs buy cost analysis with token estimates. Checks internal reuse (leverage/extend) before external. Do NOT use for strategic multi-option TCO (use buy-vs-build-framework).

merge-resolver

Resolve merge conflicts by analyzing git history and commit intent. Handles PR conflicts, branch conflicts, and session file conflicts with automated resolution for known patterns. Use when you say "resolve merge conflicts", "fix conflicts on this branch", "PR has conflicts with main", "can't merge due to conflicts", or "resolve PR conflicts". Do NOT use for rebasing, cherry-picking, or complex history rewrites (use git-advanced-workflows).

adr-generator

Create comprehensive Architectural Decision Records (ADRs). Researches the destination directory to detect existing template conventions, gathers context, determines next ADR number, generates the ADR, validates completeness, and saves. Supports multiple ADR formats (MADR, Nygard, Alexandrian, project canonical). Use when documenting technical decisions or creating new ADR files. Use when you say "write an ADR", "document this decision". Do NOT use to debate or review an existing ADR (use adr-review).
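The "determines next ADR number" step can be sketched as a scan of the destination directory for the highest existing number. This assumes file names that start with `ADR-` followed by digits; the helper name and the file names in the usage note are hypothetical, and the skill's real detection logic may differ.

```python
import re

# Assumption: ADR files are named "ADR-<number>..." (for example "ADR-012-title.md").
ADR_NAME = re.compile(r"^ADR-(\d+)")


def next_adr_number(filenames, width=3):
    """Return the next free ADR identifier, zero-padded to `width` digits."""
    numbers = []
    for name in filenames:
        match = ADR_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    following = max(numbers, default=0) + 1
    return f"ADR-{following:0{width}d}"
```

For a directory listing such as `["ADR-007-a.md", "ADR-063-b.md", "README.md"]` the helper returns `ADR-064`; for an empty directory it starts at `ADR-001`.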

adr-review

Multi-agent debate orchestration for Architecture Decision Records. Automatically triggers on ADR create/edit/delete. Coordinates architect, critic, independent-thinker, security, analyst, and high-level-advisor agents in structured debate rounds until consensus. Use when you say "review this ADR", or on ADR create/edit. Do NOT use to author a new ADR (use adr-generator).

github

Execute GitHub operations (PRs, issues, milestones, labels, comments, merges) using Python scripts with structured output and error handling. Use when working with pull requests, issues, review comments, CI checks, or milestones instead of raw gh.

orphan-ref-validator

Detect references to skills, scripts, and counts in structured artifacts (specs, ADRs, eval fixtures, plugin manifests, skill descriptions) that do not match working-tree state. Run as a /build Mandatory Exit Gate to block orphan refs pre-commit instead of paying iteration rounds in /pr-quality:all post-PR.

pr-comment-responder

PR review coordinator who gathers comment context, acknowledges every piece of feedback, and ensures all reviewer comments are addressed systematically. Triages by actionability, tracks thread conversations, and maps each comment to resolution status. Use when you say "respond to PR comments", "address review feedback on PR 123", "handle PR review comments", "fix PR review issues", or "reply to reviewer". Do NOT use for a single-comment reply with a known response (use post_pr_comment_reply.py directly) or for a full pre-merge code review (use review).

security-review

Security review knowledge delivered as parent-inline context (the form-factor counterpart to the security agent). Threat-models a code change, scores risk with CWE/CVE evidence, and returns a verdict. Use to review a diff or snippet for vulnerabilities when you want the security knowledge inline rather than dispatched to a subagent. Do NOT use for STRIDE attack-surface analysis of a system or architecture; use threat-modeling instead.

2026-06-19

threat-modeling

Structured security analysis using OWASP Four-Question Framework and STRIDE methodology. Generates threat matrices with risk ratings, mitigations, and prioritization. Use for attack surface analysis, security architecture review, or when asking what can go wrong. Do NOT use for per-change diff or snippet risk review; use security-review instead.

2026-06-19

validation-authority

Treat upstream validators as authoritative. Align local config to them. Use when validation fails unexpectedly, before modifying validator behavior, or when tempted to change upstream tool code.

2026-06-19

memory-search

Tier 1 semantic memory search across Serena and Forgetful with progressive disclosure and token-budget warnings. The focused search operation split out of the memory router per ADR-063. Use when you say `search memory`, `what do we know about X`, or `recall prior context`. Do NOT use to extract session episodes, update the causal graph, or add citations (use memory or memory-enhancement).

2026-06-18

memory

Unified four-tier memory system for AI agents. Tier 1 Semantic (Serena+Forgetful search), Tier 2 Episodic (session replay), Tier 3 Causal (decision patterns). Enables memory-first architecture per ADR-007. Use when you ask "what do we know about X", "recall prior context", "search memory". Do NOT use for adding citations to existing memories (use memory-enhancement) or for narrative cross-system reports (use memory-documentary).

2026-06-18

session-end

Validate and complete session logs before commit. Auto-populates session end evidence (commit SHA, lint results, memory updates) and runs validation. Use when finishing a session, before committing, or when session validation fails.

2026-06-17

session-init

Create protocol-compliant JSON session logs with verification-based enforcement. Autonomous operation with auto-incremented session numbers and objective derivation from git state. Use when starting any new session. Use when you say "start a session", "create the session log". Do NOT use for mid-session protocol checks (use session).

2026-06-17

chestertons-fence

Investigate historical context of existing code, patterns, or constraints before proposing changes. Automates git archaeology, PR/ADR search, and dependency analysis to prevent removing structures without understanding their purpose. Use when you ask "why does this code/constraint exist", "is it safe to remove this". Do NOT use for forward-risk analysis (use pre-mortem).

2026-06-13

slashcommandcreator

Autonomous meta-skill for creating high-quality custom slash commands using a 5-phase workflow with multi-agent validation and quality gates. Use when the user requests a new slash command or reusable prompt automation, or wants to convert repetitive workflows into documented commands.

2026-06-13

context-optimizer

Analyze skill content for optimal placement (Skill vs Passive Context vs Hybrid), compress markdown to pipe-delimited format (60-80% token reduction), and validate compliance against the decision framework. Based on Vercel research showing passive context achieves 100% pass rates vs 53-79% for skills. Use when you ask "compress this skill", "Skill vs Passive Context placement", "reduce tokens". Do NOT use for gathering knowledge before a task (use context-gather).

2026-06-10

business-strategy

Management analysts

Route a founder problem to the right business framework, then load that framework on demand. Distills 14 business books (customer discovery, positioning, pricing, sales, growth, persuasion) into scored, decision-tree skills. Use when you say "diagnose my business problem", "what framework applies", "how do I validate demand", "price this", "position this", "generate leads", or "close deals". Do NOT use for software design (use the engineering rules) or for a single named decision (use decision-critic).

2026-06-09

retrospective

Extract learnings from a session or task through structured retrospective frameworks. Gathers evidence, runs Five Whys and fishbone diagnosis, scores atomicity, and writes a canonical retrospective artifact. Use to turn execution experience into institutional knowledge. Do NOT use for in-conversation correction capture (use the reflect skill).

2026-06-07

pr-autofix

Autonomous PR monitor and fixer per docs/autonomous-pr-monitor.md. Triages open PRs by tier, addresses thread feedback, fixes CI failures, and enables auto-merge when the 4-condition Ready-to-Merge gate passes.

2026-06-07

requirements-interview

Adversarial requirements interview that walks the design tree to elicit testable requirements before any code is written. Implements the grill-me pattern - ask relentlessly, recommend an answer for every question, and resolve dependencies between decisions one branch at a time. Skip any question the codebase can already answer.

2026-06-05

requirements-interview

Execute CodeQL security scans with language detection, database caching, and SARIF output. Use when performing static security analysis on Python or GitHub Actions code.

2026-06-05

security-scan

Information security analysts

Detect CWE-78 (command injection) regex patterns in Python, PowerShell, Bash, and C# files before PR submission. CWE-22 is delegated to CodeQL; see Scope. Use when you ask "scan for command injection", "CWE-78 check before PR". Do NOT use to decide whether security review is warranted (use security-detection).
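A regex-based CWE-78 check of the kind described here can be sketched for Python sources as follows. The two patterns are illustrative only (direct `os.system` calls and `subprocess` calls with `shell=True`); the skill's actual rule set is not shown in this listing, and `CWE78_PATTERNS` and `scan_python_source` are hypothetical names.

```python
import re

# Illustrative patterns only; not the skill's real rules.
CWE78_PATTERNS = [
    re.compile(r"\bos\.system\s*\("),
    re.compile(r"\bsubprocess\.\w+\s*\([^)]*shell\s*=\s*True"),
]


def scan_python_source(source):
    """Return (line_number, stripped_line) pairs that match a pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CWE78_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Line-by-line regex matching misses calls split across lines and cannot tell whether the argument is attacker-controlled, so a hit is a prompt for review, not a confirmed vulnerability.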

2026-06-05

context-gather

Gather comprehensive context from Forgetful Memory, Context7 docs, DeepWiki, and web sources before planning or implementation. Follows the exploring-knowledge-graph skill to search across all knowledge tiers and returns a focused summary with a parseable CONTEXT_LOADED marker for downstream skip detection. Use when you say "gather context before planning", "what do we know before I start". Do NOT use for compressing or placing skill text (use context-optimizer).

checkpoint

Write a timestamped mid-session checkpoint snapshot of decisions, progress, and next actions to .agents/checkpoints/, then link it from the active session log.
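A minimal sketch of what writing such a checkpoint could look like. The `.agents/checkpoints/` directory comes from the description; the file name pattern, the section headings, and the `write_checkpoint` helper are assumptions, and linking the file from the active session log is not shown.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_checkpoint(decisions, progress, next_actions, root="."):
    """Write a timestamped snapshot under <root>/.agents/checkpoints/ and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    directory = Path(root) / ".agents" / "checkpoints"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"checkpoint-{stamp}.md"  # hypothetical file name
    sections = [
        ("Decisions", decisions),
        ("Progress", progress),
        ("Next actions", next_actions),
    ]
    lines = []
    for title, items in sections:
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Returning the path lets the caller record it wherever the session log expects the link.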

spec-generator

Transform feature descriptions into 3-tier specifications (Requirements, Design, Tasks) using EARS syntax, with schema-validated frontmatter on every emitted file. Reads the canonical spec schema before writing and rejects any out-of-range enum value.
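EARS (Easy Approach to Requirements Syntax) constrains each requirement to one of a few sentence patterns. The sketch below encodes the five common patterns as templates; the `ears` helper and the example requirement are illustrative and do not reflect the skill's schema or output format.

```python
# The five common EARS sentence patterns, keyed by pattern name.
EARS_TEMPLATES = {
    "ubiquitous": "The {system} shall {response}.",
    "event": "When {trigger}, the {system} shall {response}.",
    "state": "While {state}, the {system} shall {response}.",
    "unwanted": "If {trigger}, then the {system} shall {response}.",
    "optional": "Where {feature}, the {system} shall {response}.",
}


def ears(pattern, **parts):
    """Render one requirement sentence from the named EARS pattern."""
    return EARS_TEMPLATES[pattern].format(**parts)


print(ears("event",
           trigger="a spec file is saved",
           system="generator",
           response="validate its frontmatter"))
```

Keeping the patterns in a table makes an unknown pattern name fail fast with a `KeyError`, in the same spirit as rejecting out-of-range enum values.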

Business operations specialists, all other

chaos-experiment

Design and document chaos engineering experiments. Guide steady state baseline, hypothesis formation, failure injection plans, and results analysis. Use when you say "design a chaos experiment", "plan a game day", "failure injection", "test resilience", or "chaos engineering". Do NOT use for security threat analysis (use threat-modeling) or pre-launch project risk identification (use pre-mortem).

retro

Fill an unfilled auto-retro skeleton for a date by running the retrospective skill