auto-harness

// Diagnose and strengthen a repository's harness layer: AGENTS.md rules, structured knowledge layout, architecture boundaries, lint and type gates, API and generated-client contracts, test scaffolding, language-specific verification completeness, doc/code bindings, file-size budgets, structured logging, and technical-debt tracking. Use when Codex needs to audit drift-prone semantics that are repeated across code, docs, schemas, generated artifacts, frontend/backend boundaries, or CI, then design or implement guardrails with progressive disclosure, non-guess verification, bidirectional doc/code alignment, and zero-tolerance contract checks.

Auto Harness

Overview

Design the project harness so repository rules become executable. Start by mapping where semantics can drift, then convert the risky ones into generated artifacts, contract tests, type or layer guards, CI checks, runtime preflights, structured knowledge maps, and file-size budgets that keep the repository explorable.

Workflow

Build a harness map before proposing fixes.
- Read entrypoint materials first: AGENTS.md, repo-root docs, Makefile, package scripts, CI workflows, API specs, codegen config, lint config, type-check config, and test runners.
- Avoid bulk-loading every doc. Follow progressive disclosure: read only the references needed for the current risk.
- Record the harness surfaces that already exist: documentation rules, generated files, test suites, CI gates, architecture constraints, observability standards, debt tracking, runtime preconditions, per-language verification commands, knowledge-base directories, doc/code bindings, and file-size budgets.
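A harness map can start as a plain existence check over the entrypoint materials listed above. The following is a minimal sketch, not a definitive implementation: the surface names and candidate paths in `ENTRYPOINT_CANDIDATES` are assumptions chosen for illustration and should be replaced with whatever the audited repository actually uses.

```python
from pathlib import Path

# Hypothetical candidate paths per harness surface; adjust per repository.
ENTRYPOINT_CANDIDATES = {
    "agent_rules": ["AGENTS.md"],
    "build": ["Makefile", "package.json"],
    "ci": [".github/workflows"],
    "api_spec": ["openapi.yaml", "openapi.json"],
}


def build_harness_map(repo_root: str) -> dict[str, list[str]]:
    """Return, per harness surface, the candidate paths that exist on disk."""
    root = Path(repo_root)
    return {
        surface: [p for p in paths if (root / p).exists()]
        for surface, paths in ENTRYPOINT_CANDIDATES.items()
    }


def missing_surfaces(harness_map: dict[str, list[str]]) -> list[str]:
    """List surfaces with no entrypoint found, sorted by name."""
    return sorted(s for s, found in harness_map.items() if not found)
```

Surfaces returned by `missing_surfaces` are candidates for the checklist, not proof of a gap: the repository may keep the same material under a different path.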
Load the matching reference files.
- Read references/checklist.md to score the current harness and find missing guardrails.
- Read references/guide.md when you need the design principles, the default drift-audit passes, and recommended repository shapes.
- Read references/guard-patterns.md when you need concrete enforcement strategies.
- Read references/fix-plan.md when you need a staged remediation plan or a deliverable template.
- Read references/upstream.md when you need the built-in bundle provenance or manual sync guidance.
Diagnose drift as a systems problem, not a single bug.
- Search for the same semantic repeated across layers: permissions, routes, CLI commands, OpenAPI, generated clients, frontend assumptions, docs, logging fields, config, and tests.
- Flag any semantic that is declared in multiple places without a hard synchronization mechanism.
- Treat "scope list says yes, runtime path says no" and similar mismatches as harness failures, not just implementation bugs.
- Write the diagnosis in terms of duplicated authority, missing invariants, missing generated artifacts, or missing tests.
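The search for a semantic repeated across layers can be sketched as a whole-word scan grouped by layer. This is an illustrative sketch under stated assumptions: the function name `locate_semantic`, the layer names, and the glob patterns are hypothetical, and a real audit would likely use the repository's own search tooling.

```python
import re
from pathlib import Path


def locate_semantic(
    repo_root: str, token: str, layer_globs: dict[str, str]
) -> dict[str, list[str]]:
    """Map each layer to the files (relative paths) that mention `token`.

    `layer_globs` maps a hypothetical layer name to a glob relative to the root.
    Layers with no hits are omitted from the result.
    """
    pattern = re.compile(rf"\b{re.escape(token)}\b")
    root = Path(repo_root)
    hits: dict[str, list[str]] = {}
    for layer, glob in layer_globs.items():
        files = sorted(
            p.relative_to(root).as_posix()
            for p in root.glob(glob)
            if p.is_file()
            and pattern.search(p.read_text(encoding="utf-8", errors="ignore"))
        )
        if files:
            hits[layer] = files
    return hits
```

A token that appears in more than one layer is only a lead. It becomes a finding when no generator, test, or check ties those copies together.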
Run the default drift-audit passes on every high-risk semantic.
- Authority Duplication Pass: identify the source of truth, every downstream copy, and whether any copy can drift silently.
- Projection Completeness Pass: for canonical registries, enums, or event catalogs, check every projection such as UI labels, streams, metadata builders, docs, and generated artifacts.
- Structured/Freeform Consistency Pass: compare structured metadata with the prose that tells humans or agents how to use it, including prompts, templates, examples, and docs.
- Cross-Surface Contract Pass: verify backend, frontend, CLI, schemas, and generated clients cannot disagree on the same contract.
- Runtime/Docs/Tooling Pass: verify helper scripts, runtime contracts, examples, and validation commands still describe the real system.
- Language Verification Matrix Pass: inventory each language and runtime in the repo, then verify executable test runners, coverage collection, CI integration, and environment preconditions instead of inferring testability from file extensions or package manifests.
- Knowledge Topology Pass: verify the knowledge base is layered by semantic boundary and ownership, with stable directories, discoverable indexes, and bounded document size instead of one flat dump.
- Bidirectional Doc-Code Alignment Pass: verify every governed code file is bound from structured docs, every governed doc binds back to real code, and every push requires an explicit human review confirming that code diffs are reflected in docs.
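As one illustration of these passes, the Projection Completeness Pass reduces to comparing a canonical registry against each projection of it. The sketch below assumes a hypothetical `EventType` enum as the registry and a hypothetical `UI_LABELS` mapping as one projection; neither name comes from a real codebase.

```python
from enum import Enum


class EventType(Enum):
    """Hypothetical canonical registry of event kinds."""

    CREATED = "created"
    UPDATED = "updated"
    DELETED = "deleted"


# Hypothetical projection that has silently fallen behind the registry.
UI_LABELS = {"created": "Created", "updated": "Updated"}


def projection_gaps(registry: type[Enum], projection: dict) -> tuple[set, set]:
    """Return (values missing from the projection, values unknown to the registry)."""
    canonical = {member.value for member in registry}
    projected = set(projection)
    return canonical - projected, projected - canonical
```

Here `projection_gaps(EventType, UI_LABELS)` reports `"deleted"` as missing from the projection. Running the same function against every projection of one registry gives the pass a repeatable shape.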
Prefer hard guardrails over policy prose.
- Establish a single source of truth for drift-sensitive semantics.
- Generate downstream artifacts instead of hand-maintaining parallel definitions.
- Add failing checks in CI for zero-drift contracts: OpenAPI parity, generated client freshness, import-layer guards, JSON schema checks, snapshot parity, or table-driven route or scope coverage.
- Add structured doc/code binding checks so undocumented code files and dangling docs fail fast.
- Add formatter and file-budget checks for docs and high-risk code surfaces such as frontend feature files.
- Add type-level or package-boundary checks when architecture depends on layering.
- Add structured logs when the real system is expensive to exercise and deterministic tests are limited.
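An import-layer guard is one of the cheaper hard guardrails to land. The following sketch parses a module with the standard `ast` module and reports absolute imports that cross a forbidden boundary. The layer names and the `FORBIDDEN` table are assumptions for illustration; established import-linting tools may be a better fit for a real repository.

```python
import ast

# Hypothetical layering rule: modules in the key layer must not import
# from any of the listed top-level packages.
FORBIDDEN = {"domain": {"infra", "web"}}


def layer_violations(layer: str, source: str) -> list[str]:
    """Return the forbidden modules imported by `source`, sorted by name."""
    forbidden = FORBIDDEN.get(layer, set())
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        found.extend(n for n in names if n.split(".")[0] in forbidden)
    return sorted(found)
```

A CI job could call `layer_violations` for each file in a layer and fail on any non-empty result. Relative imports are skipped in this sketch, which is a limitation worth closing before relying on it.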
Calibrate verification instead of guessing.
- Build a language verification matrix for Go, TypeScript or JavaScript, Python, Java, shell, or any other runtime that appears in the repo.
- For each runtime, record the real test entrypoints, coverage mechanism, CI enforcement status, and required environment such as browsers, services, credentials, containers, or fake providers.
- Distinguish what is CI-enforced, locally runnable, documented but not enforced, and completely missing.
- Treat missing runnable environments or undocumented preconditions as harness defects, not merely setup friction.
- Do not claim that a code path is covered just because a framework dependency, config file, or test directory exists; verify executable commands and what they actually run.
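The verification matrix can be kept as plain data so that gaps are computed rather than asserted. This is a minimal sketch: the `RuntimeCheck` fields mirror the items listed above, while the class names, the example rows, and the defect wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Enforcement(Enum):
    CI_ENFORCED = "CI-enforced"
    LOCAL_ONLY = "locally runnable"
    DOCUMENTED_ONLY = "documented but not enforced"
    MISSING = "missing"


@dataclass(frozen=True)
class RuntimeCheck:
    runtime: str
    test_command: Optional[str]   # real entrypoint, or None if none exists
    coverage: Optional[str]       # coverage mechanism, or None
    environment: tuple[str, ...]  # browsers, services, credentials, ...
    enforcement: Enforcement


def harness_defects(matrix: list[RuntimeCheck]) -> list[str]:
    """Describe every row that is not runnable, not enforced, or not measured."""
    defects = []
    for row in matrix:
        if row.test_command is None:
            defects.append(f"{row.runtime}: no executable test entrypoint")
        elif row.enforcement is not Enforcement.CI_ENFORCED:
            defects.append(f"{row.runtime}: tests are {row.enforcement.value}")
        if row.test_command is not None and row.coverage is None:
            defects.append(f"{row.runtime}: no coverage mechanism")
    return defects


# Hypothetical rows; fill these in from commands that were actually run.
MATRIX = [
    RuntimeCheck("go", "go test -cover ./...", "go test -cover", (), Enforcement.CI_ENFORCED),
    RuntimeCheck("typescript", "npm test", None, ("headless browser",), Enforcement.LOCAL_ONLY),
    RuntimeCheck("shell", None, None, (), Enforcement.MISSING),
]
```

Each row should be filled in only after the command has been executed and its CI wiring located, which keeps the matrix a record of observation rather than inference.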
Use tests to constrain maintainability.
- Add unit tests for pure rules and table-driven contracts.
- Add integration tests for boundary behavior, storage wiring, and auth or permission enforcement.
- Add e2e tests for user-visible workflows and generated frontend/backend integration points.
- Add regression tests for every bug whose root cause was semantic drift across layers.
- Favor tests that prove two layers cannot disagree, rather than tests that only validate each layer in isolation.
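A test that proves two layers cannot disagree usually iterates over one layer's declarations and exercises the other layer for each entry. The sketch below uses the standard `unittest` module. `DECLARED_SCOPES` and `runtime_allows` are hypothetical stand-ins for a real scope list and a real authorization path.

```python
import unittest

# Hypothetical declared scope list (for example, from a spec or config file).
DECLARED_SCOPES = ["reports:read", "reports:export"]


def runtime_allows(scope: str) -> bool:
    """Hypothetical stand-in for the real runtime authorization path."""
    handlers = {"reports:read": True, "reports:export": True}
    return handlers.get(scope, False)


class ScopeContractTest(unittest.TestCase):
    def test_every_declared_scope_is_honored_at_runtime(self):
        # Table-driven: one sub-test per declared scope, so a failure
        # names the exact scope where the two layers disagree.
        for scope in DECLARED_SCOPES:
            with self.subTest(scope=scope):
                self.assertTrue(runtime_allows(scope))
```

Saved as a test module, this runs under `python -m unittest`. Adding a scope to the declared list without wiring the runtime path then fails the suite instead of surfacing later as a "scope list says yes, runtime path says no" bug.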
Keep docs and code aligned before changes ship.
- Maintain a structured doc/code binding map, ideally in doc frontmatter or another machine-readable manifest.
- Reject states where a governed code file has no bound doc, or a governed doc points to no real code.
- Before push, require a human diff review of changed code files against their bound docs so semantic changes are not shipped with stale prose.
- Treat missing doc updates, oversized docs, and oversized frontend files as maintainability defects, not style nits.
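The binding and budget rules above can be checked mechanically once the binding map is machine-readable. The following sketch assumes the map has already been parsed into a dict of doc path to bound code paths. The function names, the example paths, and the line budget are hypothetical.

```python
def binding_gaps(
    governed_code: set[str], doc_bindings: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Find governed code with no bound doc and docs bound to no real code."""
    bound = {path for paths in doc_bindings.values() for path in paths}
    return {
        "undocumented_code": sorted(governed_code - bound),
        "dangling_docs": sorted(
            doc
            for doc, paths in doc_bindings.items()
            if not any(path in governed_code for path in paths)
        ),
    }


def over_budget(line_counts: dict[str, int], max_lines: int) -> list[str]:
    """Return files whose line count exceeds the budget, sorted by path."""
    return sorted(path for path, count in line_counts.items() if count > max_lines)
```

For example, with governed code `{"src/auth.py", "src/billing.py"}` and bindings `{"docs/auth.md": ["src/auth.py"], "docs/legacy.md": ["src/removed.py"]}`, the check reports `src/billing.py` as undocumented and `docs/legacy.md` as dangling. The human diff review before push stays a manual gate; this check only guarantees that there is a bound doc to review against.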
Produce actionable deliverables.
- Deliver a short harness map.
- Deliver the top drift risks and why they exist.
- Deliver a prioritized fix plan with fast wins, foundational changes, and CI gates.
- If asked to implement, land the smallest durable guardrail first, then expand coverage.

Operating Rules

Prefer executable constraints over narrative guidance.
Prefer generated artifacts over duplicate handwritten definitions.
Prefer structured knowledge directories over flat document dumps.
Prefer progressive disclosure over dumping every document into context.
Prefer contract tests that cover real commands, real routes, and real generated clients.
Prefer non-guess verification: inventory real runners, real environments, and real CI wiring before concluding a subsystem is protected.
Prefer bidirectional doc/code bindings for governed areas so prose and code cannot drift silently.
Prefer file-size budgets for docs and frontend-heavy modules before they become god files.
Treat frontend/backend contract drift as zero-tolerance when codegen or schema generation is feasible.
Treat missing structured logs as a harness defect when external integrations are costly to replay.
Use the same drift-audit passes repeatedly instead of inventing a new review frame for each subsystem.
Keep fixes incremental, but design around the final source-of-truth model.
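For the structured-logging rule, a minimal sketch using only the standard `logging` and `json` modules is shown below. The `JsonFormatter` class, the `fields` attribute convention, and the example field names are assumptions for illustration, not an established standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with stable key order."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
            **getattr(record, "fields", {}),
        }
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `logger.info("provider_call", extra={"fields": {"provider": "billing", "status": 502}})` then emits a single parseable line, which lets a costly external integration be diagnosed from logs when it cannot be replayed in a test.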

Default Deliverable Shape

Return these sections when doing a harness audit:

Harness Map - current rules, generators, tests, CI gates, runtime prerequisites, documentation entrypoints, and knowledge-base topology.
Drift Risks - duplicated semantics, weak invariants, or missing sync checks.
Audit Pass Findings - which default passes exposed the highest-risk gaps and why.
Verification Matrix - per-language runners, coverage hooks, environment preconditions, and whether each check is CI-enforced, manual, documented-only, or missing.
Doc-Code Binding Gaps - undocumented code, dangling docs, missing human review gates, and file-size budget violations.
Guardrail Plan - concrete code or test mechanisms that would prevent recurrence.
Fix Order - what to land first, what to generate later, and what to move into CI.
Residual Risks - what still depends on convention instead of enforcement.
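If audits are produced by a script, the section list above can be enforced at render time so that no section is dropped by accident. This is a sketch under stated assumptions: the function name `render_audit` and the plain-text output layout are hypothetical, while the section names are taken from the list above.

```python
REQUIRED_SECTIONS = [
    "Harness Map",
    "Drift Risks",
    "Audit Pass Findings",
    "Verification Matrix",
    "Doc-Code Binding Gaps",
    "Guardrail Plan",
    "Fix Order",
    "Residual Risks",
]


def render_audit(findings: dict[str, list[str]]) -> str:
    """Render findings in the required order; refuse to omit a section."""
    missing = [s for s in REQUIRED_SECTIONS if s not in findings]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    lines: list[str] = []
    for section in REQUIRED_SECTIONS:
        lines.append(section)
        # An empty section is stated explicitly rather than left blank.
        items = findings[section] or ["(none found)"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Raising on a missing section, rather than silently skipping it, applies the same principle as the rest of the harness: the report shape is enforced rather than remembered.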
