| description | Create, verify, repair, and close Codex goals as durable, evidence-based objectives for long-running work; covers agent-native goal tool usage, measurable outcome gates, docs/plans goal plans, lifecycle handling, pass-gated lanes, blockers, budgets, completion audits, and template repair. |
| argument-hint | [objective | repair <expectation> | draft request | status/closeout question] |
| disable-model-invocation | true |
| name | autogoal |
| metadata | {"skiller":{"source":".agents/rules/autogoal.mdc"}} |
Autogoal
Use this when the user asks for a durable objective, long-running autonomous
work, goal setup, or when a governing repo skill requires goal setup before
work starts.
This skill turns a vague "keep going" instruction into a thread-scoped
completion contract: what should be true, how it is verified, what must not
change, and when Codex should stop.
Core Take
A normal prompt says: do the next thing.
A goal says: keep working until this outcome is true, or until the evidence
shows a real blocker.
Goals are for work where the next move depends on what Codex learns along the
way: debugging, migrations, flaky tests, benchmark tuning, deep research,
large refactors, prototypes, browser-proof loops, and pass-gated plans.
Goals are not a permission slip to wander. They are a scoped, evidence-checked
contract.
No measurable outcome, no goal. A goal must have a verification surface and a
completion threshold before create_goal is called. Prefer numbers: score,
count, latency, coverage, pass count, failing-to-passing repro count, issue
rows, or explicit command success. When a numeric target does not fit, use a
binary artifact checklist that can be audited from files, commands, screenshots,
browser proof, or source-backed citations.
Universal Boundary
autogoal is the goal lifecycle kernel. It owns:
- objective shape
- measurable completion thresholds
- evidence standards
- active goal conflict handling
- durable plan state
- blocker and completion rules
- repair routing when a goal-backed workflow misses expectations
It does not own project policy. Keep repo commands, package managers, browser
tools, release rules, PR policy, scorecards, issue ledgers, and lane-specific
pass schedules in derived skills or docs/plans/templates/<template>.md.
Derived skills may be stricter than autogoal; they should not duplicate the
goal lifecycle. autogoal says how work remains honest. The derived skill says
what the lane actually requires.
Template Composition
Goal plans are composable, but only through static materialization.
The model is:
- one active goal
- one concrete docs/plans plan file
- one primary template
- optional materialized packs
The primary template is chosen by dominant risk: task for normal execution,
docs for docs-dominant work, major-task for heavyweight architecture or
proposal work, slate-plan for Slate plan lanes, and so on.
Packs are chosen by touched surface. They add recurring gates without becoming
parents:
- docs: docs are touched but not the dominant deliverable
- agent-native: .agents/**, .claude/**, .codex/**, skills, hooks,
commands, prompts, or user-action tooling changed
- browser: real browser, route, UI, console, network, or interaction proof
is required
- package-api: package exports, public API, release artifacts, package
boundaries, or package-level checks changed
Core execution and review gates belong in the primary template. Packs are only
for optional touched surfaces that would otherwise be absent from that
template.
Do not create runtime inheritance between templates. The helper copies pack rows
into the generated plan's Start Gates, Work Checklist, and
Completion Gates. After creation, the generated plan is the truth; the checker
validates that materialized plan only.
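The static materialization described above can be sketched as follows. This is a minimal illustration, not the helper's actual implementation: the `materialize` function, the dictionary layout, and the sample gate rows are all hypothetical.

```python
# Hypothetical sketch: copy pack rows into a plan's sections at creation time.
# The data layout and row text are illustrative assumptions, not the helper's real format.
SECTIONS = ("Start Gates", "Work Checklist", "Completion Gates")


def materialize(primary, packs):
    """Return a standalone plan: primary rows first, then each pack's rows."""
    plan = {section: list(primary.get(section, [])) for section in SECTIONS}
    plan["Primary template"] = primary["name"]
    plan["Applied packs"] = [pack["name"] for pack in packs]
    for pack in packs:
        for section in SECTIONS:
            # Rows are copied, so the plan holds no reference back to the pack.
            plan[section].extend(pack.get(section, []))
    return plan


task = {"name": "task", "Completion Gates": ["tests pass"]}
docs = {"name": "docs", "Completion Gates": ["docs build clean"]}
plan = materialize(task, [docs])

# A later edit to the pack does not reach the generated plan: no runtime link exists.
docs["Completion Gates"].append("added after generation")
print(plan["Completion Gates"])
```

Because rows are copied rather than referenced, the generated plan stays the only thing a checker needs to read.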
The generated plan is the dedicated plan shell. Fill that exact file
immediately after generation: replace placeholders, resolve every gate row, and
mark non-applicable generated rows as N/A: <reason> with evidence. Do not
delete, wholesale replace, or hand-narrow the generated plan into an ad hoc
smaller plan after durable work has started. If the selected template is plainly
wrong and no substantive work has started, regenerate once with the right
template and record why. If work has already started, keep the generated plan
and close it honestly.
Use packs like this:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--template task \
--with docs \
--with agent-native \
--title "<short task title>"
Examples:
- docs-only work: --template docs
- normal code task that also changes docs: --template task --with docs
- agent workflow task: --template task --with agent-native
- browser behavior task: --template task --with browser
- published package API task: --template task --with package-api
- major architecture task: --template major-task
- major architecture task that also changes docs and package API:
--template major-task --with docs --with package-api
If two packs add related gates, keep both when they protect different failure
modes. If they duplicate exactly the same proof, keep the more specific pack and
record the other as N/A in the plan.
Proportionality Dial
Classify goal-backed work before creating or updating a plan:
- micro: one narrow, auditable outcome; no cross-file state; no meaningful
continuation loop. Use a tiny plan only when a repo rule requires it, or
record the audit surface directly in the final response.
- normal: multi-step work with concrete evidence and likely continuation.
Use the appropriate docs/plans template and close all relevant gates.
- major: architecture, migrations, benchmarks, framework comparisons,
broad refactors, pass-gated lanes, or public API/runtime risk. Use a derived
skill or project template with phases, risk rows, review gates, and explicit
closure criteria.
Do not inflate a micro work item into a ceremony pile. Do not shrink a major
work item into a checklist that cannot catch real risk.
Goal Flow Modes
Every goal-backed workflow chooses exactly one flow mode before durable work
starts. The mode controls the human review boundary; it does not weaken the
evidence or completion rules.
1. One-Shot Execution
Use this for issue-like or work-item-like work where the agent is expected to
read the source, derive the local plan, implement, verify, and hand off the
result without stopping for plan approval.
Rules:
- Create or continue a goal when the work is non-trivial and auditable.
- Create a plan when durable state is useful or required by the caller.
- The plan is an execution ledger, not a proposal waiting for acceptance.
- Human review happens at the final handoff or explicit user interruption.
- Do not pause merely because the plan has not been reviewed. Pause only for a
real blocker, unsafe ambiguity, or a user decision that changes scope.
2. Agent-Led Plan Hardening
Use this when the requested output is a plan and the user wants the agent to
drive toward the best plan with minimal human interruption.
Rules:
- The agent owns the review loop: research, compare options, pressure-test,
revise, and improve the plan until the confidence threshold is met.
- Ask the user only for decisions that materially change intent, boundaries,
risk tolerance, or acceptance criteria.
- Record each self-review pass and plan delta as evidence.
- Stop for one major user review when the plan reaches the stated readiness
threshold.
- Do not execute implementation under the planning goal unless the caller's
governing workflow explicitly says planning and execution are the same goal.
3. Collaborative Planning
Use this when the user and agent are intentionally shaping the plan together
before execution.
Rules:
- The goal outcome is an accepted plan, not implementation.
- Ask focused questions when user judgment changes the plan.
- Keep options, tradeoffs, rejected alternatives, and open decisions visible in
the plan.
- Continue revising until the user accepts the plan or a blocker remains.
- Execution starts only after explicit acceptance or a new instruction that
changes the flow mode.
Flow-mode selection belongs in the derived skill or the instantiated plan when
the caller knows it. If no caller specifies a mode, default to one-shot
execution for implementation tasks, agent-led plan hardening for autonomous
planning/review requests, and collaborative planning when the user is actively
brainstorming or asking for plan acceptance before work.
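The default selection above amounts to a small lookup with a caller override. A minimal sketch, assuming hypothetical request-kind labels (`implementation`, `autonomous-planning`, `brainstorming`) that do not appear in this rule:

```python
# Hypothetical sketch of the flow-mode fallback; the request-kind labels are assumptions.
from typing import Optional

DEFAULT_FLOW_MODES = {
    "implementation": "one-shot execution",
    "autonomous-planning": "agent-led plan hardening",
    "brainstorming": "collaborative planning",
}


def select_flow_mode(request_kind: str, caller_mode: Optional[str] = None) -> str:
    """A caller-specified mode wins; otherwise use the default for the request kind."""
    if caller_mode is not None:
        return caller_mode
    return DEFAULT_FLOW_MODES[request_kind]
```

In practice the classification of a request is a judgment call; the sketch only shows that exactly one mode comes out and that the caller's choice takes precedence.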
Use When
- The user asks to set a goal or asks Codex to keep working until a verifiable
end state.
- A repo skill says to use create_goal or goal setup.
- Work is long-running, iterative, and has an auditable success condition.
- The path is uncertain but the finish line is auditable.
- The user would otherwise keep saying: "continue", "try the next fix", "rerun
the benchmark", "keep going until it works".
- A pass-gated lane needs one durable objective with the pass schedule and
closure gates inside it.
- The user says autogoal repair <expectation> after any goal-backed workflow
missed their expectation, and they want the owning rule/template repaired for
future runs.
Do Not Use When
- The user asks a one-off question or wants one short answer.
- The edit is tiny and no continuation loop is useful.
- The finish line is vague: "make it better", "improve performance", "clean
this up" without a verification surface.
- The user explicitly declined goal setup or asked not to use goal tools.
- The only possible next move requires user input.
- Creating a goal would hide uncertainty instead of naming it.
- The user only wants the current artifact fixed once. Repair mode is for
recurring workflow expectation misses, not every ordinary bug in a plan file.
Tool Contract
This is agent-native. Use the goal tools directly when available:
- get_goal to inspect the current thread goal.
- create_goal to start a new active goal.
- update_goal(status: complete) only when the objective is genuinely met.
- update_goal(status: blocked) only when no autonomous progress remains and
the same blocker has recurred enough to satisfy the tool contract.
There can be only one active goal per thread. Repeated create_goal calls fail
while a goal exists. Always call get_goal first; call create_goal only when
it returns no goal; use update_goal to complete or block the active goal.
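The inspect-first rule can be sketched with an in-memory stand-in for the goal tools. The `GoalThread` class and `ensure_goal` helper are hypothetical; the real tools are agent-native calls whose exact signatures are not shown in this rule.

```python
# Hypothetical in-memory stand-in for the thread goal tools. It only models
# the one-active-goal rule; it is not the real tool implementation.
class GoalThread:
    def __init__(self):
        self.goal = None

    def get_goal(self):
        return self.goal

    def create_goal(self, objective):
        if self.goal is not None:
            # Mirrors the rule that repeated create_goal calls fail.
            raise RuntimeError("a goal already exists on this thread")
        self.goal = {"objective": objective, "status": "active"}
        return self.goal

    def update_goal(self, status):
        if self.goal is None:
            raise RuntimeError("no goal to update")
        self.goal["status"] = status
        return self.goal


def ensure_goal(thread, objective):
    """Inspect first; create only when no goal is returned."""
    existing = thread.get_goal()
    if existing is not None:
        return existing  # the caller still has to classify it before continuing
    return thread.create_goal(objective)
```

Returning the existing goal instead of raising keeps the conflict decision with the caller, which matches the requirement to classify an active goal before touching durable state.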
Active Goal Conflict Protocol
When get_goal returns a goal, classify it before touching durable state:
- same: the existing goal already describes the current requested end state.
Continue under it and keep its plan current.
- same but stale plan: the goal is right but the plan is stale. Repair the
plan first, then continue.
- newer user correction: the latest user message narrows, reverses, or
corrects the goal. Record the correction in the plan, follow the newest
instruction, and do not call the old objective complete unless it is actually
true.
- different objective: the active goal is unrelated. Do not hijack it. If no
lifecycle tool can pause, resume, cancel, or replace it, say so briefly and
proceed only with degraded plan state when the user explicitly says to go.
- paused or externally controlled: do not fake completion or blocked status
to escape the tool. Continue only if the latest user instruction clearly
authorizes the new work, and record the mismatch in the plan.
Never mark a goal complete because the user changed their mind. Completion
means the objective is true. A correction changes the work path; it does not
retroactively prove the old objective.
Do not invent a goal state file when a goal tool is available. If goal tools are
not available, record degraded control state in the active plan only when the
repo workflow requires that fallback; otherwise state that goal tools are not
available and continue with the nearest safe workflow.
Goal Anatomy
A strong goal defines eight things:
- Flow mode: one-shot execution, agent-led plan hardening, or collaborative
planning.
- Outcome: what must be true when done.
- Completion threshold: the number, pass/fail command, artifact checklist, or
explicit acceptance rows that prove done.
- Verification surface: tests, benchmarks, logs, browser proof, generated
artifact, report, issue comment, or source-backed audit.
- Constraints: what must not regress.
- Boundaries: files, packages, repos, tools, data, routes, issue scope, or
product surfaces Codex may or may not touch.
- Iteration policy: how to choose the next move after each attempt.
- Blocked stop condition: when to stop and report the blocker, evidence, and
next input needed.
Use this objective shape:
<desired end state>, complete only when <quantitative or auditable threshold>,
verified by <specific evidence>, and when the active goal plan passes
`node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path>`, while
preserving <constraints>. Use flow mode <one-shot execution | agent-led plan
hardening | collaborative planning> and <allowed inputs/tools/boundaries>.
Maintain goal plan <docs/plans/path>. Between iterations, <progress log and
next-move policy>. If blocked or no valid path remains, report <attempts,
evidence, blocker, and needed input>.
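One way to keep objectives in this shape is to assemble them from named parts. A minimal sketch: the `build_objective` function and its parameter names are hypothetical, and the sample values are adapted from the checkout latency example in this document.

```python
# Hypothetical sketch: assemble an objective string from named parts.
# Parameter names are illustrative; the wording follows the objective shape above.
def build_objective(end_state, threshold, evidence, plan_path, constraints,
                    flow_mode, boundaries, iteration_policy, blocker_report):
    return (
        f"{end_state}, complete only when {threshold}, verified by {evidence}, "
        f"and when the active goal plan passes "
        f"`node .agents/rules/autogoal/scripts/check-complete.mjs {plan_path}`, "
        f"while preserving {constraints}. Use flow mode {flow_mode} and {boundaries}. "
        f"Maintain goal plan {plan_path}. Between iterations, {iteration_policy}. "
        f"If blocked or no valid path remains, report {blocker_report}."
    )


objective = build_objective(
    end_state="Reduce p95 checkout latency below 120 ms",
    threshold="the checkout benchmark reports p95 < 120 ms and the correctness suite passes",
    evidence="the benchmark output and the correctness suite result",
    plan_path="docs/plans/YYYY-MM-DD-checkout-latency.md",
    constraints="public API behavior",
    flow_mode="one-shot execution",
    boundaries="only checkout service code, benchmark fixtures, and related tests",
    iteration_policy="record the change, benchmark result, and next experiment",
    blocker_report="attempted paths, evidence, blocker, and needed input",
)
print(objective)
```

Making every part a required argument means a missing threshold or blocked condition fails at call time instead of producing a vague goal.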
Measurable Outcome Gate
Before calling create_goal, rewrite vague objectives into measurable ones.
Required:
- a specific done state
- a flow mode
- a verification surface
- a completion threshold
- a constraint list or explicit no extra constraints
- a blocked condition
Quantitative examples:
- p95 < 120 ms
- score >= 0.92 and no dimension below 0.85
- 0 accepted review findings
- all 12 pass rows complete or skipped with evidence
- focused repro fails before fix and passes 5 consecutive runs after
- no stale symbol matches from rg
Auditable non-numeric examples:
- named file exists with required sections
- named issue rows moved to fixed/improved/related/not-claimed
- named browser route has screenshot proof and no console errors
- named API examples compile and match the accepted public shape
Reject or rewrite:
- "make better"
- "clean up"
- "finish"
- "absolute best" without score rows, pass gates, or evidence
- "review and decide" without an artifact and acceptance criteria
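The gate above can be sketched as a pre-flight check run before create_goal. The field names and the `outcome_gate_problems` function are hypothetical, and a phrase match like this is only a rough proxy for the judgment the rule asks for.

```python
# Hypothetical pre-flight check run before create_goal; field names are assumptions.
REQUIRED_FIELDS = (
    "done_state", "flow_mode", "verification_surface",
    "completion_threshold", "constraints", "blocked_condition",
)
VAGUE_PHRASES = ("make better", "clean up", "finish")


def outcome_gate_problems(draft):
    """Return the reasons a goal draft is not yet measurable (empty list means ready)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not draft.get(name)]
    done_state = (draft.get("done_state") or "").strip().lower()
    if done_state in VAGUE_PHRASES:
        problems.append(f"vague done state: {done_state!r}")
    return problems
```

A draft that passes still needs a human-readable threshold; the check only catches missing parts and the most obvious vague wording.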
Completion Gate Policy
Do not make check-complete.mjs the whole goal. That only proves the plan looks
closed, not that the work is true.
Use the hybrid rule for every goal:
- The goal objective names the real outcome, threshold, verification surface,
constraints, boundaries, and blocked condition.
- The docs/plans goal plan records the fresh evidence for that threshold.
- node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path> is
the final mechanical gate before update_goal(status: complete).
The checker validates that the goal plan has no unchecked required checklist
items, no unresolved gate rows, no open phase/pass rows, concrete verification
evidence, current reboot status, and recorded risks. It does not replace tests,
browser proof, source audits, benchmark output, or other named verification
evidence.
Evidence Type Contract
Every completion proof should fit at least one evidence type:
- command: exact command, cwd, and pass/fail result.
- source-audit: exact files or search query proving a static property.
- browser: route, interaction, screenshot or console/network caveat.
- artifact: generated file, report, table, PR body, issue comment, or
exported asset.
- review: reviewer/tool used, accepted findings, fixes, and remaining
rejected findings with reasons.
- external-source: cited URL, issue, paper, docs page, or connected app
result used as authority.
- N/A: <reason>: why a recurring gate does not apply.
Evidence must name the owning workspace, package, app, route, or tool when
that ownership matters. A root-level check cannot prove a sibling repo, app
route, browser surface, or external tracker unless the plan explains why it is
the owning surface.
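A plan could tag each evidence entry with its type so the contract is easy to audit. The sketch below assumes a hypothetical `<type>: <detail>` entry format; this rule does not prescribe one.

```python
# Hypothetical sketch: check that an evidence entry names a known type and some detail.
# The "<type>: <detail>" entry format is an assumption for illustration.
EVIDENCE_TYPES = {
    "command", "source-audit", "browser", "artifact", "review", "external-source",
}


def is_valid_evidence(entry):
    """Accept '<type>: <detail>' or 'N/A: <reason>' with a non-empty detail or reason."""
    kind, sep, detail = entry.partition(":")
    if not sep or not detail.strip():
        return False
    kind = kind.strip()
    return kind == "N/A" or kind in EVIDENCE_TYPES
```

This only checks the shape of an entry. Whether the named command, file, or route really proves the claim remains a review question.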
Repair Mode
Trigger this mode when the arguments start with:
repair <expectation>
Repair mode is self-improvement with a leash. It converts a concrete expectation
miss from a goal-backed run into the smallest durable change to the owning
rule, template, helper, or active plan.
Use it for misses like:
- the generated goal plan lacked a gate the user expected
- a derived skill used the wrong template or completion rule
- the skill completed too early or kept running past the intended boundary
- the final handoff omitted evidence the user expects every time
- the workflow forced too much ceremony or skipped a required review/proof step
Do not use it for:
- one-off wording preferences in a single plan
- a product/runtime bug that belongs in implementation code
- broad "make all skills better" edits
- rewriting generated .agents/skills/*/SKILL.md by hand
Target selection order:
- If the prompt names a plan path, read that plan first. Use its Template:,
skill name, phase table, and completion gates to identify the owner.
- If the prompt names a skill, read .agents/skills/<skill>/SKILL.md first, then
docs/plans/templates/<skill>.md when it exists.
- If there is an active goal, read its plan path from the objective or current
plan before editing anything.
- If the miss belongs to every goal, target .agents/rules/autogoal.mdc and
docs/plans/templates/goal.md.
- If ownership is still unclear after source reads, ask one short targeting
question instead of patching multiple templates.
Repair scope matrix:
| Miss | Primary repair owner |
|---|---|
| Current plan has wrong status, row, evidence, or handoff fields | active docs/plans/* plan |
| Future generated plans need a recurring section, gate, row, or placeholder | docs/plans/templates/<owner>.md |
| Agent chose the wrong workflow, target, proof standard, or completion rule | .agents/rules/<owner>.mdc |
| Prose keeps failing and the miss is mechanically checkable | .agents/rules/autogoal/scripts/* plus focused script proof |
| Derived skill adds lane-specific ceremony or policy | derived skill rule/template, not autogoal |
| Universal lifecycle rule is missing across goal-backed work | .agents/rules/autogoal.mdc |
Repair workflow:
- Restate the expectation in one sentence.
- Identify the miss with source evidence: plan row, final response shape,
missing gate, bad status, wrong template, or stale generated skill.
- Pick exactly one primary owner. Patch secondary owners only when sync is
required, such as source rule plus project template.
- Create a repair plan with:
  node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
  --template goal-repair \
  --title "<short repair title>"
  If a repair is truly trivial, record why no separate repair plan is needed.
- Patch source-of-truth files only. Never hand-edit generated
.agents/skills/*/SKILL.md; after changing .agents/rules/**, run pnpm install.
- Prove the repair:
  - source audit with rg for the new rule/gate/wording
  - generated skill sync when .agents/rules/** changed
  - instantiate the repaired template or inspect it directly when a smoke plan
  would create noise
  - verify unfinished generated plans still fail check-complete.mjs
  - verify a completed plan can record the new expectation without editing the
  template again
- Final response says: expectation, repaired owner, verification, and any
deliberate non-repair.
Safety rules:
- One expectation should produce one narrow repair. Do not turn repair mode into
a skill rewrite.
- Do not weaken completion gates just because a past run was annoying. If the
expectation conflicts with evidence safety, record the conflict and ask.
- Prefer adding a missing row or decision rule over adding a new script. Add
mechanical enforcement only when prose gates keep failing.
- A derived skill may have stricter rules than autogoal. Repair the derived
skill when the expectation is lane-specific; repair autogoal only when the
expectation should apply across goal-backed work.
- If an active goal is unrelated to the repair, do not hijack it. Ask whether to
finish/block it first or run the repair after it is closed.
Derived Skill Contract
Any skill that requires or wraps autogoal should declare:
- when it creates or continues a goal
- which flow mode it uses by default, and how the user changes it
- which docs/plans/templates/<template>.md it uses
- which packs it applies by default, and which touched surfaces add more packs
- extra start gates and completion gates it owns
- evidence types it requires
- final handoff shape
- review or pressure lenses it adds
- what remains delegated to autogoal
- what it intentionally does not inherit from broader templates
Derived skills should route to autogoal for lifecycle mechanics instead of
re-implementing plan creation, completion, blocked semantics, repair mode, or
evidence closure.
Resume Protocol
After compaction, interruption, or a long pause:
- Read the latest user message first.
- Call get_goal when available.
- Re-read the active docs/plans path named by the goal, current workflow, or
latest handoff.
- Find the latest verification evidence, open risk, and next owner.
- Continue from the newest user instruction, not from an older stale objective.
- Before final response, sanity-check that the answer matches the newest
request and the current plan state.
If the active goal and newest request disagree, use the Active Goal Conflict
Protocol before editing.
Start And Completion Gates
Project templates may define Start Gates: and Completion Gates: tables.
These are template-owned audit surfaces for recurring project checks.
Keep this rule generic. Do not put project-specific commands, package-manager
details, release rules, browser tooling, or repo policy in this file. Those rows
belong in project-owned templates under docs/plans/templates/.
When present, gate tables must use markdown tables with these columns:
| Gate | Applies | Evidence |
They may include extra columns such as Required action. The checker treats any
cell in a gate row as unresolved when it is blank, pending, TODO, or TBD.
Gate closure rules:
- Applies must be resolved before completion.
- yes means the evidence cell names the command, artifact, proof, source
audit, or concrete result.
- no or N/A: <reason> means the evidence cell explains why the gate does
not apply.
- A completion gate row should stay unresolved until the action or reason is
recorded.
check-complete.mjs enforces gate-row closure mechanically, but it does not
know what project-specific commands mean.
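The gate-row rule can be illustrated with a small table scan. This is a hypothetical sketch, not the logic of check-complete.mjs: the function name, the exact-match treatment of markers, and the sample rows are assumptions.

```python
# Hypothetical sketch of the gate-row rule; not the real check-complete.mjs logic.
# Assumes a cell is unresolved only when its whole content is one of the markers.
UNRESOLVED_MARKERS = {"", "pending", "todo", "tbd"}


def unresolved_gate_rows(table):
    """Return the gate names of rows that still have a blank, pending, TODO, or TBD cell."""
    unresolved = []
    lines = [line.strip() for line in table.strip().splitlines()]
    for line in lines[2:]:  # skip the header row and the |---| separator
        # Drop exactly one outer pipe per side so a trailing empty cell survives.
        inner = line[1:-1] if line.startswith("|") and line.endswith("|") else line
        cells = [cell.strip() for cell in inner.split("|")]
        if any(cell.lower() in UNRESOLVED_MARKERS for cell in cells):
            unresolved.append(cells[0])
    return unresolved


sample = """
| Gate | Applies | Evidence |
|---|---|---|
| Tests | yes | command: suite passed |
| Browser proof | N/A: no UI touched | no route changed |
| Docs | TBD | |
"""
print(unresolved_gate_rows(sample))
```

A scan like this can say a row is closed, but not that the recorded evidence is true, which is why the named verification surface still has to run.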
Start Workflow
- Read the user's request and any named plan, issue, logs, route, test, or
source-of-truth file.
- Inspect the current goal with get_goal when available.
- Select the flow mode: one-shot execution, agent-led plan hardening, or
collaborative planning.
- Rewrite the desired objective until it has a measurable or auditable
completion threshold.
- If no active goal exists and the user or governing skill asked for a goal,
create one with create_goal.
- If an active goal already matches the desired end state, continue under it.
- If an active goal exists but points at a different objective, do not overwrite
it. Resolve the current goal honestly before starting another one. If the
tool does not allow that transition, report the mismatch and ask for the
smallest decision needed. A governing lane goal may proceed only when it can
honestly complete or fit within the current active goal.
- Create the docs/plans goal plan from the checklist template before
substantive work.
- Fill the generated plan itself before substantive work: write the objective,
threshold, verification surface, constraints, boundaries, blocked condition,
flow mode, and goal plan path; resolve generated gates as yes/no/N/A instead
of deleting or replacing the template output.
- Record the output-budget strategy before exploratory commands: which
searches or reads are allowed, which high-volume paths are excluded, and
how large results will be capped, counted, or saved as artifacts instead of
streamed into the goal context.
- Use that exact path for check-complete.mjs.
- Do not start durable work until the goal is set, verified as already matching,
or the user explicitly resolves the missing-goal path.
Set the goal before mutable lane state when the workflow depends on a goal. For
pass-gated planning or accepted-plan execution lanes, the goal is the first
durable action after the minimum read needed to derive the objective.
Goal Plan
Every active goal gets one durable goal plan. It is a single markdown file that
absorbs the useful file-planning parts: phases, findings, progress,
decisions, failed attempts, verification, and reboot status.
Path:
docs/plans/YYYY-MM-DD-<short-goal-slug>.md
docs/plans/<ticket>-<short-goal-slug>.md
Use the ticket-prefixed form for issue-backed work. Do not create
task_plan.md, findings.md, progress.md, .planning/**,
docs/goals/**, .tmp/goals/**, or hook state for goal work. Hooks are
overkill. The active goal plus the docs/plans file are the durable state.
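The two path forms can be sketched as a small function. The slug rule (lowercase, hyphens for everything outside a-z and 0-9) and the sample ticket id are assumptions; the helper script may slugify differently.

```python
# Hypothetical sketch of the two plan path forms; the slug rules are an assumption.
import re
from datetime import date
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of other characters into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def goal_plan_path(title: str, ticket: Optional[str] = None,
                   today: Optional[date] = None) -> str:
    """Use the ticket-prefixed form for issue-backed work, else the date-prefixed form."""
    prefix = ticket if ticket else (today or date.today()).isoformat()
    return f"docs/plans/{prefix}-{slugify(title)}.md"


print(goal_plan_path("Checkout latency", today=date(2025, 1, 31)))
print(goal_plan_path("Flaky checkout test", ticket="ABC-123"))  # ticket id is made up
```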
Create the goal plan with the source-owned helper whenever available:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--title "<short title>" \
--template "<primary template name or path>" \
--with "<optional pack name>"
The helper writes docs/plans/YYYY-MM-DD-<slug>.md or
docs/plans/<ticket>-<slug>.md from a project-owned template. The helper lives
under .agents/rules/autogoal/ because it is generic rule tooling; generated
SKILL.md files are not edited by hand.
Do not pass objective, threshold, verification, constraints, boundaries, or
blocked condition through CLI flags. The CLI only creates the static plan shell.
After creation, edit the generated docs/plans file and write the active goal
objective, completion threshold, verification surface, constraints, boundaries,
blocked condition, and remaining goal-specific rows into the file.
Editing the generated file means filling and resolving that materialized shell,
not replacing it with a hand-made mini-plan. Keep generated sections and rows
unless the row is truly irrelevant, then mark it complete with N/A: <reason>.
If a template choice is wrong before work starts, regenerate with the correct
template and record the replacement. If any durable work has already started,
do not swap the plan out from under the work; close the generated plan with
honest evidence, N/A rows, or a blocker.
The default project template is generic:
docs/plans/templates/goal.md
Project or skill-specific templates live beside it:
docs/plans/templates/<template>.md
Reusable packs live under:
docs/plans/templates/packs/<pack>.md
Use templates by passing the primary template name. Add packs for touched
surfaces:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--template "<template-name>" \
--with "<pack-name>" \
--title "<short title>" \
...
Repeat --with for multiple packs, or pass a comma-separated list. The helper
records Primary template: and Applied packs: in the generated plan and
copies pack rows into the plan's existing gate/checklist sections.
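Accepting both repeated --with flags and comma-separated lists comes down to normalizing them into one ordered pack list. A minimal sketch using Python's argparse, purely for illustration: the helper is a Node script and its real parser is not shown here, and the `goal` default and the duplicate handling are assumptions.

```python
# Hypothetical sketch of normalizing --with values; not the helper's real parser.
import argparse


def parse_packs(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--template", default="goal")
    parser.add_argument("--title", default="")
    # action="append" collects every occurrence of --with into a list.
    parser.add_argument("--with", dest="packs", action="append", default=None)
    args = parser.parse_args(argv)
    packs = []
    for value in args.packs or []:
        for name in value.split(","):
            name = name.strip()
            if name and name not in packs:  # keep first-seen order, drop repeats
                packs.append(name)
    return packs


print(parse_packs(["--template", "task", "--with", "docs", "--with", "agent-native,browser"]))
```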
docs/plans/templates holds reusable project templates. Direct files under
docs/plans are instantiated runtime goal plans. Do not store goal templates or
active goal state under docs/goals.
Create a new project-owned template by copying the generic template:
node .agents/rules/autogoal/scripts/create-goal-template.mjs \
--skill "<skill-name>"
Then edit the new docs/plans/templates/<skill-name>.md to add that skill or
project lane's mandatory sections, checklist rows, phase schedule, evidence
rows, and closure gates. Keep the generic goal template project-agnostic.
Template creation is not skill creation. Do not generate .agents/rules/*,
.agents/skills/*, aliases, execution handoffs, hook state, or compatibility
bridges from this workflow. A project template is just a reusable static shell
for a future docs/plans/* goal plan. The agent fills the real objective,
threshold, verification surface, constraints, boundaries, and blocked condition
inside the instantiated plan.
Before creating or updating a project template, define these inputs:
- template name and owning skill or project lane
- primary-template role and which packs should usually compose with it
- display name and purpose
- recurring failure mode the template prevents
- use cases and non-use cases
- allowed edit boundaries for plans created from it
- required read-first sources and optional read-when-relevant sources
- evidence sources and final verification surface
- measurable score, count, pass/fail command, or artifact checklist threshold
- required plan sections
- required checklist rows, including skill analysis and final goal-plan check
- phase or pass table, or an explicit reason the template needs no phases
- completion gates and score caps when score is used
- review or pressure lenses that must run before closeout
- handoff, final response, and risk rows
- blocked condition and what input would unblock it
If an input cannot be inferred from current project context, add a placeholder
inside the template and label it as a generation gap. Ask the user only when
the missing answer changes the template's purpose, safety model, or boundaries.
Template quality bar:
- The template must be self-contained enough to create a useful goal plan from
scratch. Do not require a sibling template to understand it.
- Sibling templates may be used for sync review, not as hidden dependencies.
- Packs may provide recurring touched-surface rows, but only after the helper
materializes them into the generated plan. Do not rely on hidden pack state.
- Domain facts must be placeholders or instructions unless live source proves
them. Do not invent current-state, before/after, API, product, or workflow
facts.
- No template may let a goal finish from polished prose, score alone, or a
completed phase table without fresh evidence.
- Every required checklist item must map to evidence, an explicit N/A reason,
or a blocker.
- Every required section is either present in the template or omitted with a
recorded reason.
- Project templates that cover implementation work should include compact gates
for review target selection, workspace-authority verification, specialized
agent/tooling review when those surfaces change, and a high-risk note for
public API, runtime, package-boundary, browser, agent-action, or command
contract changes. Do not copy a major planning lane's scorecard, issue
ledger, or full pass schedule into generic execution templates.
- The template should prefer concrete commands, file paths, issue rows,
browser routes, screenshots, benchmark names, or source-audit rows over vague
"review" wording.
- The generated plan remains the runtime truth. Do not put active goal state in
docs/plans/templates.
Template sync review:
- Instantiate the template once with create-goal-scratchpad.mjs or inspect the
copied file directly when a smoke plan would create noise.
- Verify the expected headings, checklist rows, phase/pass rows, completion
gates, and blocker rows are present.
- Verify a blank or unfinished instantiated plan fails check-complete.mjs.
- Verify a completed plan can record the named evidence without editing the
template itself.
- After editing .agents/rules/autogoal.mdc, run pnpm install to regenerate
the generated skill files.
Create the plan before substantive edits. Update it after every meaningful
decision, finding, tradeoff, failed attempt, review fix, verification run, or
scope change. Re-read it before major decisions and after compaction or
interruption.
Check the goal plan before completion:
node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/<goal-plan>.md
This is the final mechanical gate, not a substitute for the named verification
surface.
The goal-plan checklist is mandatory. Its first required item is skill analysis.
Do not call update_goal(status: complete) while any required checklist item remains
unchecked. If an item does not apply, check it and add N/A: <reason>.
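The unchecked-item rule is easy to spot-check before running the real checker. A minimal sketch, assuming checklist rows use the `- [ ]` and `- [x]` markdown form shown in the plan sections; the `unchecked_items` function is hypothetical and the real checker may apply more rules.

```python
# Hypothetical sketch of a checklist scan; the real checker may apply different rules.
import re

UNCHECKED = re.compile(r"^\s*- \[ \] (.+)$")


def unchecked_items(plan_text):
    """Return the text of every checklist item that is still unchecked."""
    items = []
    for line in plan_text.splitlines():
        match = UNCHECKED.match(line)
        if match:
            items.append(match.group(1))
    return items


plan_text = """Work Checklist:
- [x] Skill analysis recorded with evidence.
- [ ] Run focused repro after the fix.
- [x] Browser proof. N/A: no UI touched
"""
print(unchecked_items(plan_text))
```

An item marked N/A still counts as checked here, which matches the instruction to check non-applicable items and record the reason.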
Required goal-plan sections:
# <Goal title>
Objective:
<exact active goal objective>
Flow mode:
<one-shot execution | agent-led plan hardening | collaborative planning>
Goal plan:
<docs/plans/path>
Primary template:
<docs/plans/templates/name.md>
Applied packs:
- <pack or none>
Completion threshold:
- <quantitative or auditable done row>
Verification surface:
- <tests/artifacts/browser proof/source audit>
Constraints:
- <must preserve / must not touch>
Boundaries:
- <allowed files/packages/tools>
Output budget strategy:
- <how command/search output will be scoped, capped, counted, or artifacted>
Blocked condition:
- <condition that stops autonomous work>
Start Gates:
| Gate | Applies | Evidence |
Work Checklist:
- [ ] Actual work item or pass-specific requirement with evidence.
- [ ] ...
Completion Gates:
| Gate | Applies | Required action | Evidence |
Phase / pass table:
| Phase | Status | Evidence | Next |
Findings:
- <research, source reads, browser/visual findings as data>
Timeline:
- <timestamp> <action/evidence>
Decisions and tradeoffs:
- <decision> -> <reason> -> <risk>
Review fixes:
- <finding> -> <accepted/rejected> -> <change or reason>
Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |
Verification evidence:
- <command/artifact> -> <result>
Reboot status:
| Where am I? | Where am I going? | What is the goal? | What learned? | What done? |
Open risks:
- <risk or none>
Before update_goal(status: complete), the goal plan must include the final
verification evidence, checked checklist, current reboot status, and any
remaining risks.
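A quick structural check for the required sections might look like the sketch below. It assumes each section label appears on its own line, optionally prefixed with markdown heading markers and followed by a colon; `missing_sections` is a hypothetical helper, not part of the autogoal scripts, and it only proves the headings exist, not that they are filled in.

```python
# Section labels copied from the required goal-plan sections listed above.
REQUIRED_SECTIONS = [
    "Objective", "Flow mode", "Goal plan", "Primary template",
    "Applied packs", "Completion threshold", "Verification surface",
    "Constraints", "Boundaries", "Output budget strategy",
    "Blocked condition", "Start Gates", "Work Checklist",
    "Completion Gates", "Phase / pass table", "Findings", "Timeline",
    "Decisions and tradeoffs", "Review fixes", "Error attempts",
    "Verification evidence", "Reboot status", "Open risks",
]


def normalize(line):
    """Drop heading markers, a trailing colon, and case differences."""
    return line.strip().lstrip("#").strip().rstrip(":").strip().lower()


def missing_sections(plan_text):
    """Return required section labels that never appear as their own line."""
    present = {normalize(line) for line in plan_text.splitlines()}
    return [label for label in REQUIRED_SECTIONS if label.lower() not in present]
```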
Good Goal Examples
Performance:
Reduce p95 checkout latency below 120 ms, complete only when the checkout
benchmark reports p95 < 120 ms and the correctness suite passes, while keeping
public API behavior unchanged. Use only checkout service code, benchmark
fixtures, and related tests. Maintain goal plan
`docs/plans/YYYY-MM-DD-checkout-latency.md`. After each iteration, record the
change, benchmark result, and next experiment. If the benchmark cannot run or no
valid path remains, stop with attempted paths, evidence, blocker, and needed
input.
Bug hunt:
Fix the flaky checkout test on the current branch, complete only when a focused
repro fails before the fix and passes 5 consecutive runs after, while preserving
public API behavior. If the failure cannot be reproduced after the agreed
attempts, produce an evidence-backed blocker report.
Research:
Produce the strongest evidence-backed reproduction of the target paper
using available materials and local resources, complete only when every headline
claim has a status row: confirmed, approximate, proxy-supported, blocked, or
uncertain. Attempt every headline result where feasible and end with a report
separating confirmed mechanics, approximate reconstructions, blocked exact
replay, and remaining uncertainty.
Pass-gated planning:
Close the layout plan for user review by running the scheduled passes
one activation at a time, complete only when score >= 0.92, no dimension is
below 0.85, every scheduled pass row is complete or skipped with evidence,
issue/reference sync rows are closed, closure gates pass, and final handoff is
emitted. Do not edit implementation code.
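Each of the numeric thresholds in these examples reduces to a small predicate. The sketch below is illustrative only: the function names are hypothetical, and the nearest-rank percentile is one common p95 definition that a given benchmark tool may not share.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile; other definitions interpolate."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def latency_goal_met(samples_ms, correctness_suite_passed, limit_ms=120):
    """Performance example: p95 under the limit AND correctness suite green."""
    return correctness_suite_passed and p95(samples_ms) < limit_ms


def trailing_passes(run_results):
    """Count consecutive passing runs at the end of the run history."""
    streak = 0
    for passed in reversed(run_results):
        if not passed:
            break
        streak += 1
    return streak


def flaky_fix_goal_met(failed_before_fix, runs_after_fix, required=5):
    """Bug-hunt example: repro failed before the fix, then N passes in a row."""
    return failed_before_fix and trailing_passes(runs_after_fix) >= required


def score_gate_met(overall, dimensions, min_overall=0.92, min_dimension=0.85):
    """Pass-gated example: overall score and every dimension clear their floors."""
    return overall >= min_overall and all(
        value >= min_dimension for value in dimensions.values()
    )
```

Writing the threshold as a predicate makes it obvious when a goal has two conditions: a fast benchmark with a failing correctness suite does not satisfy `latency_goal_met`.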
Weak Goal Examples
Improve performance
Make this better
Refactor the editor
Run all passes
Finish the project
These are weak because they lack a measurable outcome, verification surface, or
scope boundary.
Pass-Gated Goals
For pass-gated lanes, prefer one lane goal when the goal tool can persist across
turns. Put the pass schedule in the goal objective, run one pass per activation,
and complete the goal only when closure gates prove no pass remains runnable.
Use this when a workflow has scheduled passes such as current-state read,
issue discovery, intent boundary, research refresh, steelman, revision,
verification sweep, or closure.
Rules:
- The goal objective should describe the lane outcome, full pass schedule,
one-pass-per-activation policy, proof gates, and closure condition.
- Complete the current pass in the plan or progress ledger, not by closing the
goal.
- Complete the goal only when every required pass is complete or intentionally
skipped with evidence.
- Do not use separate per-pass goals; keep scheduled passes as rows in the
active plan.
- Keep pass status in the plan or progress ledger; keep goal status tied to the
whole lane.
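The rules above can be modeled as a small pass ledger. This is a sketch under stated assumptions: `PassRow`, its field names, and the `pending`, `complete`, and `skipped` status strings are hypothetical, and goal status is deliberately left out because it belongs to the goal tool rather than the ledger.

```python
from dataclasses import dataclass


@dataclass
class PassRow:
    # Status strings other than "in_progress" are assumptions of this sketch.
    name: str
    status: str = "pending"
    evidence: str = ""


def row_is_closed(row):
    """A pass is closed when complete, or skipped with recorded evidence."""
    if row.status == "complete":
        return True
    return row.status == "skipped" and bool(row.evidence.strip())


def next_runnable_pass(schedule):
    """Return the first pass that is not closed, or None when none remains."""
    for row in schedule:
        if not row_is_closed(row):
            return row
    return None


def lane_can_close(schedule):
    """The lane goal may close only when no pass remains runnable."""
    return next_runnable_pass(schedule) is None


def progress_fields(schedule):
    """Build the per-pass progress fields from the open rows."""
    open_rows = [row for row in schedule if not row_is_closed(row)]
    current = open_rows[0] if open_rows else None
    upcoming = open_rows[1] if len(open_rows) > 1 else None
    return {
        "current_pass": current.name if current else None,
        "current_pass_status": current.status if current else None,
        "next_pass": upcoming.name if upcoming else None,
    }
```

Note that a pass marked skipped without evidence still counts as runnable here, which mirrors the "skipped with evidence" rule.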
Progress fields for pass-gated lanes:
current_pass: current-state-read
current_pass_status: in_progress
next_pass: related-issue-discovery
goal_status: active
Allowed goal_status values:
Completion Rules
Mark a goal complete only when:
- the outcome in the goal is actually achieved
- the completion threshold is met exactly
- the verification surface named by the goal was checked
- the `docs/plans` goal plan is updated with final verification
- every required goal-plan checklist item is checked or marked N/A with reason
- `node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path>`
passes after the final evidence is recorded
- constraints and boundaries were respected, or deviations were explicitly
accepted
- required artifacts were created or updated
- no required owner remains runnable
- the final response reports the evidence, not just confidence
Do not mark complete because:
- tests passed but the goal also required review, browser proof, docs, or a
report
- the budget is nearly exhausted
- the current slice is done but later slices remain
- a plan was written but execution or proof remains
- the user says "nice" without accepting open risks
When calling update_goal(status: complete), include the tool's final token/time
usage in the user-facing closeout when the tool returns it.
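One way to keep the completion decision honest is to treat it as a conjunction of named gates and report the ones that fail. The gate identifiers below are hypothetical paraphrases of the completion list in this section; the sketch is not part of the goal tool.

```python
# Hypothetical gate names paraphrasing the "Mark a goal complete only when" list.
REQUIRED_GATES = (
    "outcome_achieved",
    "threshold_met",
    "verification_surface_checked",
    "plan_has_final_verification",
    "checklist_closed",
    "check_complete_script_passed",
    "constraints_respected_or_deviation_accepted",
    "artifacts_updated",
    "no_runnable_owner_remains",
    "closeout_reports_evidence",
)


def failing_gates(gate_results):
    """Return every required gate that is missing or false; empty means closable."""
    return [name for name in REQUIRED_GATES if not gate_results.get(name, False)]
```

Signals such as a nearly exhausted budget or a passing test run are not gates in this model, so supplying them changes nothing: only the required gates can make the list empty.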
Blocked Rules
Blocked is terminal for the goal, not a normal checkpoint.
Use blocked only when:
- no autonomous next move remains
- missing evidence, access, tooling, data, or a user decision prevents progress
- repeated attempts show the same blocker, and the tool's blocked threshold is
satisfied
Do not mark blocked when:
- more investigation is possible
- a different test, smaller repro, or narrower source read is available
- the work is merely hard, slow, or broad
- a review pass found issues that can be fixed
- a gate failed and the failing owner is obvious
Blocked report shape:
Goal blocked.
Attempted:
- ...
Evidence:
- ...
Blocker:
- ...
Needed to continue:
- ...
Budget Handling
Budget exhaustion is not success.
Output Budget Discipline
Goal token budgets are real work budgets, not decorative counters. A goal run
that burns its budget on tool output has failed the workflow even when no app
code was touched.
Before running exploratory commands inside an active goal:
- Prefer narrow reads over broad scans: exact files, focused `rg -n` patterns,
targeted globs, and short `sed -n` ranges.
- Treat `tmp/**`, logs, binaries, generated output, build artifacts,
`node_modules`, `.next`, `.turbo`, and coverage folders as excluded by
default. Include them only when they are the named source of truth.
- Set explicit tool output caps for commands likely to return more than a
screenful. Keep ordinary source reads around a few thousand tokens, and
justify any larger cap in the plan.
- For broad audits, first ask for counts, filenames, or top matches
(`rg --count`, `rg --files-with-matches`, `--max-count`, `wc`, `head`) before
printing matching lines.
- If a result may be large but still matters, write it to a local artifact and
inspect slices from that artifact. Do not stream the full result into the
conversation.
- Never run unbounded `rg` across the whole repo plus `tmp/api`, logs, or binary
outputs during a budgeted goal. Split the search by owner or exclude the noisy
trees first.
- After any accidental large output, stop broad exploration immediately, record
the miss in the error-attempts row, and continue only with constrained
commands.
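Two of the habits above, default exclusions and artifact-backed output caps, can be sketched in a few lines. The helper names are hypothetical, the directory set covers only the named folders (not logs or binaries), and matching any path segment rather than only the repo root is an assumption of this sketch.

```python
from pathlib import Path, PurePosixPath

# Directory names taken from the exclusion list above. Matching any path
# segment (not only the repo root) is an assumption of this sketch.
EXCLUDED_PARTS = {"tmp", "node_modules", ".next", ".turbo", "coverage"}


def is_excluded(path):
    """True when a repo-relative path sits under a default-excluded tree."""
    return any(part in EXCLUDED_PARTS for part in PurePosixPath(path).parts)


def capped_view(text, max_lines, artifact_path):
    """Return at most max_lines; spill the full text to an artifact when larger."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    # Keep the full result on disk so later reads can inspect slices of it.
    Path(artifact_path).write_text(text, encoding="utf-8")
    head = "\n".join(lines[:max_lines])
    return f"{head}\n[{len(lines) - max_lines} more lines in {artifact_path}]"
```

The capped view keeps the conversation small while the artifact preserves the evidence, so a large result never has to be re-run just to read a different slice.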
If the system stops or warns because a goal budget is reached:
- stop substantive work
- summarize current evidence and remaining owners
- name the next useful action
- do not call the goal complete unless the original objective is already proven
Lifecycle Boundaries
Do not use update_goal for lifecycle transitions outside its contract.
The model may complete or block a goal only through update_goal when the tool
contract is satisfied. Other lifecycle transitions are user/system-owned. If
the user asks for a lifecycle transition and no direct tool is available, state
that the current runtime does not expose that control instead of faking it with
completion or blocked status.
Status Updates During Goals
Keep status short and evidence-based:
- current checkpoint
- what changed
- what was verified
- what remains
- whether blocked
- next concrete action
Avoid vague updates like "making progress" or "continuing investigation". If
status gets vague, tighten the goal or checkpoint.
Research Goals
Research goals need stricter epistemic accounting.
Final reports should separate:
- confirmed findings
- approximate reconstructions
- proxy/support-only evidence
- blocked exact claims
- remaining uncertainty
Do not flatten "approximate support" into "reproduced" or "fixed". A good
research goal lets Codex keep working through uncertainty while preventing
overclaiming.
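The separation can be enforced in the report builder itself. The sketch below uses the five status labels from the research goal example in this document; the function names and the dict-of-claims input shape are assumptions made for illustration.

```python
from collections import defaultdict

# Status labels from the research goal example in this document.
STATUSES = ("confirmed", "approximate", "proxy-supported", "blocked", "uncertain")


def claims_without_status(headline_claims, status_rows):
    """Headline claims with no status row, or a status outside STATUSES."""
    return [
        claim
        for claim in headline_claims
        if status_rows.get(claim) not in STATUSES
    ]


def report_sections(status_rows):
    """Group claims by status so the report never merges categories."""
    sections = defaultdict(list)
    for claim, status in status_rows.items():
        if status not in STATUSES:
            raise ValueError(f"unknown status for {claim!r}: {status!r}")
        sections[status].append(claim)
    return {status: sections[status] for status in STATUSES}


def reproduced_claims(status_rows):
    """Only confirmed claims may be reported as reproduced."""
    return report_sections(status_rows)["confirmed"]
```

Because `reproduced_claims` reads only the confirmed bucket, an approximate or proxy-supported result cannot drift into the reproduced list by accident.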
Closeout Template
Use this shape when closing a goal:
Goal complete.
Evidence:
- <command/artifact/source>
What changed:
- <short list>
Constraints preserved:
- <short list>
Residual risk:
- <only if real>
Usage:
- <tool-reported tokens/time, when available>
For blocked:
Goal blocked.
Evidence:
- <what was tried>
Blocker:
- <why no autonomous progress remains>
Needed next:
- <specific user/tool/input>