harness-setup
// [Quality] Use when setting up an agent quality harness with feedforward guides and feedback sensors.
| name | harness-setup |
| description | [Quality] Use when setting up an agent quality harness with feedforward guides and feedback sensors. |
Codex compatibility note:
- Invoke repository skills with `$skill-name` in Codex; this mirrored copy rewrites legacy Claude `/skill-name` references.
- Prefer the `plan-hard` skill for planning guidance in this Codex mirror.
- Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.
- User-question prompts mean to ask the user directly in Codex.
- Ignore Claude-specific mode-switch instructions when they appear.
- Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.
- Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required `spawn_agent` subagent(s) for that task.
- Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.
- For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.
- If a required step/tool cannot run in this environment, stop and ask the user before adapting.
Codex does not receive Claude hook-based doc injection. When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.
Always read:
- `docs/project-config.json` (project-specific paths, commands, modules, and workflow/test settings)
- `docs/project-reference/docs-index-reference.md` (routes to the full `docs/project-reference/*` catalog)
- `docs/project-reference/lessons.md` (always-on guardrails and anti-patterns)

Situation-based docs:
- Backend work: `backend-patterns-reference.md`, `domain-entities-reference.md`, `project-structure-reference.md`
- Frontend work: `frontend-patterns-reference.md`, `scss-styling-guide.md`, `design-system/README.md`
- Feature work: `feature-docs-reference.md`
- Integration tests: `integration-test-reference.md`
- E2E tests: `e2e-test-reference.md`
- Code review: `code-review-rules.md` plus the domain docs above, based on changed files

Do not read all docs blindly. Start from docs-index-reference.md, then open only the files relevant to the task.
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval.
[BLOCKING] Before each step or sub-skill call, update task tracking: set `in_progress` when a step starts and `completed` when it ends.
[BLOCKING] Every completed/skipped step MUST include brief evidence or an explicit skip reason.
[BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Goal: Set up the complete outer agent harness for a greenfield project so all subsequent AI coding agents operate with maximum guidance and earliest-possible quality feedback.
What this produces:
- Feedforward guides and inferential feedback sensors layered on top of `$linter-setup` (linters, formatters, pre-commit hooks, CI gates).
- `.ai/workspace/harness/harness-inventory.md` recording the full harness.

When invoked: after `$scaffold` and `$linter-setup` in the greenfield workflow. Assumes scaffolding is complete.
What it does NOT do: Install linters or configure formatters — that is $linter-setup's responsibility.
Check 1 — Linter-setup prerequisite (BLOCK if missing):
Before running any phases, verify $linter-setup has completed by checking for:
- Linter/formatter config files (e.g., `.eslintrc`, `pyproject.toml`, `.editorconfig`)
- Pre-commit hooks (e.g., `.husky/`, `.pre-commit-config.yaml`)

If any of these are missing → ask the user directly: "$linter-setup appears incomplete. Computational feedback sensors must be in place before harness setup. Run $linter-setup first, then return here?" BLOCK Phases A/B/C/D/E until linter-setup verification passes.
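A minimal sketch of this prerequisite check (the candidate file names are the examples above, not an exhaustive list; adjust per detected stack):

```ts
import { existsSync } from "node:fs";

const configCandidates = [".eslintrc", ".eslintrc.json", "pyproject.toml", ".editorconfig"];
const hookCandidates = [".husky", ".pre-commit-config.yaml"];

const hasConfig = configCandidates.some((p) => existsSync(p));
const hasHooks = hookCandidates.some((p) => existsSync(p));

if (!hasConfig || !hasHooks) {
  // Signal the caller to raise the direct user question instead of proceeding.
  console.error(`linter-setup incomplete (config: ${hasConfig}, hooks: ${hasHooks})`);
  process.exit(1);
}
console.log("Computational feedback sensors present; Phases A-E unblocked.");
```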
Check 2 — Existing harness inventory:
- Check for `.ai/workspace/harness/harness-inventory.md`.
- CLAUDE.md/AGENTS.md presence — those are feedforward guides this skill may enhance, not signals to skip.

Phase A — Stack detection. Read from: plan.md frontmatter → architecture-design report → tech-stack-comparison report.
Extract the stack-profile fields and write the detection result to `.ai/workspace/harness/stack-profile.md`.
If any field is undetectable → ask the user directly to confirm before proceeding.
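A hedged fallback sketch for when the plan frontmatter and design reports leave a field blank — the manifest file names are common ecosystem conventions, not a guaranteed project layout:

```ts
import { existsSync, mkdirSync, writeFileSync } from "node:fs";

// Infer the primary stack from well-known manifest files.
function detectStack(): string {
  if (existsSync("package.json")) return "node";
  if (existsSync("pyproject.toml")) return "python";
  if (existsSync("go.mod")) return "go";
  return "unknown"; // unknown -> raise the direct user question above
}

mkdirSync(".ai/workspace/harness", { recursive: true });
writeFileSync(
  ".ai/workspace/harness/stack-profile.md",
  `# Stack Profile\n\nDetected stack: ${detectStack()}\n`,
);
```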
Phase B — Feedforward guides. For each guide type, check whether it exists; if not, create or enhance it:
1. CLAUDE.md / AGENTS.md — Architecture conventions
   - Document the architecture patterns chosen in `$architecture-design` (e.g., Clean Architecture, CQRS, Repository).
2. Skill activation rules
   - e.g., "when domain entity files change, run `$review-domain-entities`"; "before committing significant changes, run `$code-review`".
3. Architecture notes
   - Create `docs/architecture/` with:
     - `bounded-contexts.md` — domain boundaries and ownership
     - `dependency-rules.md` — allowed import directions between layers
     - `naming-conventions.md` — project-specific naming for files, classes, functions
4. Pattern catalog
   - Create `docs/architecture/pattern-catalog.md` documenting each pattern from `$architecture-design` with DO/DON'T examples.

Present the list of guides created/updated via a direct user question: "Feedforward guides above will be created/enhanced. Confirm or adjust?"
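To show how dependency-rules.md can be backed by a computational sensor, a minimal arch-fitness sketch in TypeScript. The `src/` tree and the api/application/domain layer names are illustrative assumptions, not the skill's mandated structure:

```ts
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Allowed import direction: api -> application -> domain, never the reverse.
const layerRank: Record<string, number> = { domain: 0, application: 1, api: 2 };
const layerOf = (p: string) =>
  Object.keys(layerRank).find((l) => p.split(/[\\/]/).includes(l));

const violations: string[] = [];
for (const entry of readdirSync("src", { recursive: true, withFileTypes: true })) {
  if (!entry.isFile() || !entry.name.endsWith(".ts")) continue;
  const path = join(entry.parentPath, entry.name); // parentPath: Node 20.12+
  const from = layerOf(path);
  if (!from) continue;
  // Flag any import whose target layer ranks above the importing layer.
  for (const m of readFileSync(path, "utf8").matchAll(/from\s+["']([^"']+)["']/g)) {
    const to = layerOf(m[1]);
    if (to && layerRank[to] > layerRank[from]) violations.push(`${path} -> ${m[1]}`);
  }
}
if (violations.length > 0) {
  console.error("Dependency-direction violations:\n" + violations.join("\n"));
  process.exit(1);
}
```

Run as a CI gate: it keeps the feedforward guide (the written rule) and the feedback sensor (the failing check) from drifting apart.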
Phase C — Computational sensors. Confirm `$linter-setup` has completed:
- Linter/formatter config files (e.g., `.eslintrc`, `pyproject.toml`, `.editorconfig`)
- Pre-commit hooks (e.g., `.husky/`, `.pre-commit-config.yaml`)

If any are missing → invoke `$linter-setup` before continuing.
Output: confirmation that computational sensors are in place, with file paths listed.
Phase D — Inferential sensors. Configure which AI review skills fire at each lifecycle stage. Present the plan to the user via a direct user question: "Which inferential sensors should be mandatory vs optional for this project?"
Pre-implementation (planning gate):
- `$why-review` — validate design rationale before committing to an implementation approach

Pre-commit (lightweight review):
- `$code-review` before committing significant changes

Post-implementation (domain model changes):
- `$review-domain-entities` — when domain entity files are in the changeset

Pre-release (mandatory gates):
- `$sre-review` — reliability and operational readiness
- `$security` — security review before production release

Recurring drift detection:
- `$scan-codebase-health` — schedule quarterly (or on a CI schedule) to detect drift

Add the agreed sensor configuration to CLAUDE.md under "## Review Gates".
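One way the lifecycle routing above could be wired up — a hedged sketch where the staged-file source and the `domain/entities/` glob are assumptions; the skill names come from the list:

```ts
import { execSync } from "node:child_process";

// Staged files stand in for "the changeset"; adjust the source as needed.
const changed = execSync("git diff --cached --name-only", { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

const sensors = new Set<string>(["$code-review"]); // lightweight pre-commit default
if (changed.some((f) => /(^|\/)domain\/entities\//.test(f))) {
  sensors.add("$review-domain-entities"); // domain model files in the changeset
}
console.log("Inferential sensors to run:", [...sensors].join(", "));
```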
Phase E — Behaviour harness. Define the project's behaviour harness plan by agreeing the following with the user:
- Functional spec format — `docs/business-features/` or an equivalent spec home
- Test strategy pyramid
- Approved fixtures pattern
- Coverage threshold
Document the agreed test strategy in `docs/architecture/test-strategy.md`.
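A minimal sketch of the approved-fixtures pattern, using the Node test runner; `renderInvoice` and the fixture path are hypothetical stand-ins for real behaviour under test:

```ts
import test from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";

// Hypothetical unit under test; real projects swap in actual behaviour.
function renderInvoice(id: string): string {
  return `invoice:${id}\n`;
}

test("invoice rendering matches the approved fixture", () => {
  // The committed "approved" file is the behavioural contract; any diff fails.
  const approved = readFileSync("tests/fixtures/invoice-42.approved.txt", "utf8");
  assert.equal(renderInvoice("42"), approved);
});
```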
Write .ai/workspace/harness/harness-inventory.md:
# Harness Inventory
Generated: {date}
Stack: {detected stack from Phase A}
## Feedforward Guides
| Type | File/Skill | Purpose |
| ------------- | ------------------------------------ | ------------------------------- |
| Inferential | CLAUDE.md §Architecture Patterns | Shapes AI architectural choices |
| Inferential | CLAUDE.md §Anti-Patterns | Prevents known bad patterns |
| Inferential | docs/architecture/pattern-catalog.md | DO/DON'T examples per pattern |
| Computational | .editorconfig | Cross-IDE consistency |
## Feedback Sensors — Computational
| Stage | Tool/Hook | What it catches |
| ---------- | ----------------- | ------------------------------- |
| Pre-commit | {linter} | Style violations, common errors |
| Pre-commit | {formatter} | Code formatting drift |
| CI | {type-checker} | Type errors |
| CI | {static-analyzer} | Security, complexity, dead code |
## Feedback Sensors — Inferential
| Stage | Skill/Agent | What it catches |
| ------------------- | ----------------------- | ------------------------------ |
| Pre-implementation | $why-review | Design rationale gaps |
| Pre-commit | $code-review | Convention drift, logic errors |
| Post-implementation | $review-domain-entities | Domain model quality |
| Pre-release | $sre-review | Operational readiness |
| Pre-release | $security | Security vulnerabilities |
## Open Gaps
| Area | Reason | Risk |
| ------------------------ | -------- | -------------- |
| {area not yet harnessed} | {reason} | {LOW/MED/HIGH} |
Present the inventory to the user for review via a direct user question.
[IMPORTANT] Use task tracking to break ALL work into small tasks BEFORE starting — including tasks for each file read. This prevents context loss from long files. For simple tasks, MUST ATTENTION: ask the user whether to skip.
Critical Thinking Mindset — Apply critical thinking and sequential thinking. Every claim needs traced proof and >80% confidence to act. Anti-hallucination: never present a guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence — certainty without evidence is the root of all hallucination.
AI Mistake Prevention — Failure modes to avoid on every task:
- Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
- Trace the full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
- Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips — not just the happy path.
- When debugging, ask "whose responsibility?" before fixing. Trace whether the bug is in the caller (wrong data) or the callee (wrong handling). Fix at the responsible layer — never patch the symptom site.
- Assume existing values are intentional — ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
- Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
- Holistic-first debugging — resist the nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
- Surgical changes — apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
- Surface ambiguity before coding — don't pick silently. If a request has multiple interpretations, present each with an effort estimate and ask. Never assume the all-records, file-based, or more complex path.
Harness Engineering — An outer agent harness has two jobs: raise first-attempt quality + provide self-correction feedback loops before human review.
Controls split:

| Axis | Type | Examples | Frequency |
| ----------- | ------------- | --------------------------------------------------------------------------- | ---------------- |
| Feedforward | Computational | `.editorconfig`, strict compiler flags, enforced module boundaries | Always-on |
| Feedforward | Inferential | CLAUDE.md conventions, skill prompts, architecture notes, pattern catalogs | Always-on |
| Feedback | Computational | Linters, type checks, pre-commit hooks, ArchUnit/arch-fitness tests, CI gates | Pre-commit → CI |
| Feedback | Inferential | `$code-review` skill, `$sre-review`, `$security`, LLM-as-judge passes | Post-commit → CI |

Three harness types:
- Maintainability — Complexity, duplication, coverage, style. Easiest: rich deterministic tooling.
- Architecture fitness — Module boundaries, dependency direction, performance budgets, observability conventions.
- Behaviour — Functional correctness. Hardest: requires approved fixtures or strong spec-first discipline.
Keep quality left: pre-commit sensors fire first (cheap), CI sensors fire second, post-review last (expensive).
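One way to realize that ordering — a minimal pre-commit sketch where the `npx` commands are placeholders for whatever linter/formatter/type-checker the detected stack actually uses:

```ts
import { execSync } from "node:child_process";

// Cheapest sensors first; stop on the first failure so expensive review
// (CI gates, human review) never sees mechanical noise.
const sensors = [
  { name: "formatter", cmd: "npx prettier --check ." },
  { name: "linter", cmd: "npx eslint ." },
  { name: "type-check", cmd: "npx tsc --noEmit" },
];

for (const s of sensors) {
  try {
    execSync(s.cmd, { stdio: "inherit" });
  } catch {
    console.error(`Pre-commit sensor '${s.name}' failed — fix before CI runs.`);
    process.exit(1);
  }
}
```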
Research-driven: Never hardcode tool choices. Detect tech stack → research ecosystem → present top 2-3 options → user decides. Enforce strictest defaults; loosen only with explicit approval.
Harnessability signals: Strong typing, explicit module boundaries, opinionated frameworks = easier to harness. Treat these as greenfield architectural choices, not just style preferences.
IMPORTANT MUST ATTENTION follow declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses
MUST ATTENTION never auto-decide feedforward guide content — present draft and confirm with a direct user question
MUST ATTENTION verify $linter-setup completed before Phase C passes
MUST ATTENTION write harness-inventory.md incrementally (append after each phase) — never hold in memory
MUST ATTENTION harness is a living document — update inventory when new sensors are added later
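To make the incremental-write rule above concrete, a small sketch (the per-phase section layout is an assumption; any append-as-you-go shape satisfies the rule):

```ts
import { appendFileSync, mkdirSync } from "node:fs";

// Append each phase's findings as soon as the phase completes,
// instead of holding results in memory until the end.
function recordPhase(phase: string, findings: string): void {
  mkdirSync(".ai/workspace/harness", { recursive: true });
  appendFileSync(
    ".ai/workspace/harness/harness-inventory.md",
    `\n## ${phase}\n\n${findings}\n`,
  );
}

recordPhase("Phase A — Stack detection", "Detected stack: node (from package.json)");
```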
[TASK-PLANNING] Before acting, analyze task scope and break it into small todo tasks using task tracking.
Source: .claude/hooks/lib/prompt-injections.cjs + .claude/.ck.json
Use `$workflow-start <workflowId>` for standard workflows; sequence custom steps manually.
[CRITICAL] Hard-won project debugging/architecture rules. MUST ATTENTION: apply BEFORE forming a hypothesis or writing code.
Goal: Prevent recurrence of known failure patterns — debugging, architecture, naming, AI orchestration, environment.
Top Rules (apply always):
- Use `ExecuteInjectScopedAsync` for parallel async + repo/UoW — NEVER `ExecuteUowTask`.
- Verify the Python alias first (`where python`/`where py`) — NEVER assume `python`/`python3` resolves.

Details:
- Parallel async with repo/UoW: use `ExecuteInjectScopedAsync`, NEVER `ExecuteUowTask`. `ExecuteUowTask` creates a new UoW but reuses the outer DI scope (same DbContext) — parallel iterations sharing a non-thread-safe DbContext silently corrupt data. `ExecuteInjectScopedAsync` creates a new UoW + a new DI scope (fresh repo per iteration).
- Bus message ownership: the owning service names the event (e.g., `AccountUserEntityEventBusMessage` = Accounts owns). Core services (Accounts, Communication) are leaders. Feature services (Growth, Talents) sending to core MUST use `{CoreServiceName}...RequestBusMessage` — never define their own event for core to consume.
- Naming: `HrManagerOrHrOrPayroll` → `HrOperations`. A policy named after its set members says what it contains, not what it guards. Add a role → rename = broken abstraction. Rule: names express DOES/GUARDS, not CONTAINS. Test: does adding/removing a member force a rename? YES = content-driven = bad → rename to purpose (e.g., `HrOperationsAccessPolicy`). Nuance: "Or" is fine in behavioral idioms (`FirstOrDefault`, `SuccessOrThrow`) — it expresses HAPPENS, not membership.
- Never assume `python`/`python3` resolves — verify the alias first; Python may not be in bash PATH under those names. Check: `where python` / `where py`. Prefer `py` (Windows Python Launcher) for one-liners, `node` if a JS alternative exists.
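The real rule is about the project's C#/.NET primitives; as a language-neutral analogy, this TypeScript sketch shows the same scope-per-iteration principle (`makeUnitOfWork` is a hypothetical stand-in, not a project API):

```ts
// Sharing one non-thread-safe unit of work across parallel iterations is the
// ExecuteUowTask hazard; creating a fresh one per iteration is the safe pattern.
type UnitOfWork = { pending: string[]; commit(): Promise<void> };

function makeUnitOfWork(): UnitOfWork {
  const pending: string[] = [];
  return { pending, commit: async () => { /* persist `pending` here */ } };
}

async function processAll(ids: string[]): Promise<void> {
  await Promise.all(
    ids.map(async (id) => {
      const uow = makeUnitOfWork(); // fresh scope per iteration
      uow.pending.push(id);
      await uow.commit();
    }),
  );
}

void processAll(["a", "b"]);
```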
Where lessons live:
- Test-specific lessons → `docs/project-reference/integration-test-reference.md`, Lessons Learned section.
- Production-code anti-patterns → `docs/project-reference/backend-patterns-reference.md`, Anti-Patterns section.
- Generic debugging/refactoring reminders → System Lessons in `.claude/hooks/lib/prompt-injections.cjs`.

Quick recap:
- `ExecuteInjectScopedAsync`, NEVER `ExecuteUowTask` (shared DbContext = silent data corruption)
- Core-bound messages use `{CoreServiceName}...RequestBusMessage`
- Never assume `python`/`python3` resolves — run `where python`/`where py` first; use the `py` launcher or `node`

Break work into small tasks (task tracking) before starting. Add a final task: "Analyze AI mistakes & lessons learned".
Extract lessons — ROOT CAUSE ONLY, not symptom fixes:
- Durable, project-specific lessons → record via `$learn`.
- Ask: "Could `$code-review`/`$code-simplifier`/`$security`/`$lint` catch this?" — Yes → improve that review skill instead of recording via `$learn`.
[TASK-PLANNING] [MANDATORY] BEFORE executing any workflow or skill step, create/update task tracking for all planned steps, then keep it synchronized as each step starts/completes.