| description | Work a task end-to-end with lean context gathering, implementation, and verification |
| argument-hint | [task description | issue id/link] |
| disable-model-invocation | true |
| name | task |
| metadata | {"skiller":{"source":".agents/rules/task.mdc"}} |
Work Task
Handle $ARGUMENTS. Be thorough, not ceremonial. Start from the source of truth,
load extra skills only when they earn their keep, and verify before calling the
task done.
#$ARGUMENTS
Core Rules
- Read the task source first.
- Read local repo instructions and relevant files before editing.
- Search for existing patterns before inventing new ones.
- Prefer the best durable fix over the smallest local patch. If the root cause
lives in an API or abstraction seam, change that seam even when it means a
broader refactor or an intentional API change.
- Prefer targeted tests and checks during iteration.
- Keep the user updated at milestones.
- Verify the actual result before claiming done.
- Do not default to research swarms, review swarms, or browser proof.
- For verified code-changing work, default to creating or updating the PR unless
the user explicitly said not to.
- Do not default to compounding.
- Before calling a task blocked on a repo-wide gate, rule out local install
corruption once when the failure smells wrong for the diff.
Intake
- Classify the input:
- Plain task text: the user prompt is the source of truth.
- File path or spec path: read it first.
- GitHub issue URL: fetch it with
gh issue view first.
- GitHub PR URL: fetch it with
gh pr view first.
- Bare GitHub issue like
#555: resolve it against the current gh repo
first, then fetch it with gh issue view.
- Linear issue link/id: fetch it with the Linear integration first.
- Read the full source-of-truth context before doing anything else.
- If the task comes from a ticket or issue, also read the comments and
attachments when available.
- If the task comes from a tracker item and any attachment or linked media is a
video or screen recording, you must have transcript output with
video-transcripts before implementation.
- The canonical shared cache lives in the tracker next to the evidence it
describes.
- Cache comment body should be XML only:
```xml
<video-transcripts>
<video-transcript
source-key="https://tracker-hosted-video/<stable-path-without-query>"
>
[00:00] (...)
</video-transcript>
</video-transcripts>
```
source-key must be the normalized stable identifier for the media. For
signed tracker-hosted URLs like uploads.linear.app, strip the query string
so rotating signatures do not bust the cache.
- Do not add decorative metadata like
title to cached
<video-transcript> entries unless a later workflow truly needs it.
- Group videos by source container before transcribing:
- one cache comment for issue or PR body videos
- GitHub: one dedicated top-level cache comment for each issue or PR comment
that contains video(s)
- Linear: one cache reply for each comment that contains video(s)
- Keep the cache body pure XML. Do not prepend markers, prose, or YAML.
- Before running the helper, scan existing tracker comments and, when relevant,
Linear replies. Match by source container first, then check whether the cached
<video-transcript ... source-key="..."> entries cover every current video
URL for that container after the same normalization.
- If the matching cache fully covers the current normalized video keys for that
source container, reuse the cached XML block and do not re-transcribe.
- If the cache is missing any current video, or that source container has new
video evidence, transcribe only the missing videos and create or update the
matching cache comment or reply.
- Use $video-transcripts skill.
- Run it once per relevant video attachment or URL.
- For auth-gated tracker media like
uploads.linear.app or private GitHub asset
URLs, use the helper directly before declaring the video blocked. It can reuse
local tracker auth when available.
- Convert the video evidence into a normalized block in this exact shape before
continuing:
<video-transcripts>
<video-transcript source-key="...">
[00:00] (...)
</video-transcript>
</video-transcripts>
- Use the generated or cached transcript output as part of the tracker context.
- Do not hand-write, paraphrase, or skip video evidence when the skill can run.
- If the helper still cannot fetch or transcribe the media after a real attempt,
hard stop and report the exact blocker you hit.
- Do not continue implementation without the transcript when a real video is
present.
- If there are multiple videos, preserve each as its own
<video-transcript ...> block under one <video-transcripts> wrapper.
- Classify the task shape before choosing a workflow:
- Testing or coverage work: triggered by coverage, regressions, test-suite
phases, hotspots, package testing, or similar language.
- Program or batch work: triggered by multiple packages, phases, buckets, or
ordered slices.
- Ordinary one-shot work: bug, feature, refactor, docs, review, or
investigation that can be finished as a single slice.
- Classify whether this is heavyweight framework or library work:
- Heavyweight work: architecture or public API redesign, breaking changes,
major cross-package refactors, benchmarking, profiling strategy,
performance comparison, scalability work, framework comparison, migration
analysis, RFCs, proposals, or spec-first major changes.
- Non-heavyweight work: ordinary bugs, one-package features, docs-only edits,
routine test work, small refactors, or normal issue execution even when
non-trivial.
- If the task is heavyweight work:
- load
major-task immediately
- treat
major-task as the source of truth for workflow and helper
selection
- do not quietly inflate this task flow into a research swarm
- If the task is not heavyweight work, classify task complexity before
implementation:
- Non-trivial task: multi-step work, research-heavy work, phased execution,
or anything likely to take more than a handful of tool calls.
- Trivial task: quick question, small edit, or other work that does not need
persistent working memory.
- If the task is non-trivial:
- load
planning-with-files before implementation
- use persistent planning files or the repo's equivalent planning structure so
progress survives context loss
- follow local repo overrides for where planning files live
- if the task touches published package code under
packages/, record in the
plan whether the active unreleased changeset must be updated before completion
- if the task changes
kitcn init -t templates or scaffold sources, record
that bun run fixtures:sync and bun run fixtures:check will be required
before completion
- If the task is testing or coverage work:
- restate it as test work, not generic feature work
- load
testing first and use that testing policy as the source of truth
- choose the smallest honest seam before loading
tdd
- If the task is program or batch work:
- restate the ordered scope and hard constraints
- do not treat the whole batch as one implementation unit
- default to completing the first slice cleanly unless the user explicitly asks
for a broader sweep
- GitHub issue rules:
- Resolve bare issue numbers like
#555 against the current repo with
gh repo view --json nameWithOwner -q '.nameWithOwner'.
- Fetch GitHub issues with:
gh issue view <number-or-url> --comments --json number,title,body,comments,labels,state,url
- Fetch GitHub PRs with
gh pr view ... --json.
- If the input is ambiguous and not clearly a GitHub issue token or URL, do not
guess.
- For any tracker source, restate for yourself:
- source type
- source id
- exact title
- task type: bug, feature, refactor, docs, review, investigation, testing, or
batch work
- expected outcome or acceptance criteria
- caveats, blockers, or missing information from the tracker
- likely files, routes, or packages affected
- whether there is a real browser surface to verify
- likely root-cause layer: call site, helper, abstraction seam, or public API
- Read repo instructions and nearby implementation patterns before editing.
- If the task changes code:
- if already on a branch clearly dedicated to this exact task, continue there
- treat a branch as "clearly dedicated" only when the branch name, tracker id,
or existing PR obviously matches the current task
- do not reuse an unrelated non-
main branch just because you are already on
it
- if the current branch is not clearly dedicated to this task, check out
main,
pull the latest main, then create a fresh repo-convention branch before
editing
- if the task has a tracker id, prefer a branch name that includes it:
- GitHub issue:
codex/555-short-slug
- Linear issue:
codex/LIN-123-short-slug
- in this repo, otherwise prefer
codex/<slug>
- run install or setup only when the repo or task actually needs it
- If the task does not change code, skip branch and setup noise.
- If anything important is still ambiguous after the source-of-truth pass and
nearby code reading, ask the user the smallest useful clarifying question.
Tracked Task Rules
Apply this section only when the task source is a tracker item.
GitHub
- Treat the GitHub issue as the source of truth.
- Use
gh for fetch and sync-back.
- If useful, rename the thread to
<issue-number> <issue-title>.
- If the task is code-changing, prefer a branch name that includes the issue
number.
- If the task changed code and reached a verified meaningful outcome, create or
update the PR before any issue comment unless blocked or the user said not to.
- If the task reaches a meaningful outcome and came from the issue, post a
concise issue comment after the PR exists unless blocked or the user said not
to.
Linear
- Keep the same fetch-first behavior as the dedicated Linear workflow.
- Read the issue, comments, and attachments before implementation.
- Keep comment-back QA-focused.
- Do not force browser proof unless the task actually has a browser surface.
Tracked Task Non-Rules
- Do not require PR creation for tracker tasks that did not change code, ended
blocked, or were purely investigative.
- Do not require browser screenshots for every tracked task.
- Do not require tracker comments for investigations that ended blocked or
inconclusive unless sync-back is useful.
Load Skills Only When Justified
planning-with-files
Use for any non-trivial task that needs persistent working memory, phased
execution, or likely more than a handful of tool calls.
Follow repo-specific overrides for where planning artifacts should live.
major-task
Use for architecture or public API redesign, benchmarking or scalability
work, framework comparison or migration analysis, major cross-package
refactors, or RFC and proposal work.
When it triggers during intake, it becomes the source of truth instead of
this file.
testing
Use when the task is primarily about tests, coverage, regression gaps, or
phase-based suite work.
Load it before tdd for testing programs.
tdd
Use for bugs and for feature work where behavior-level automated coverage is
sane.
For testing programs, load it only after choosing a concrete slice that
should be done test-first.
Skip for trivial docs, mechanical refactors, or painful UI-only plumbing.
learnings-researcher
Use for non-trivial work in a domain with documented solutions or when the
task smells like a repeated issue.
Do not load it for tiny isolated edits.
debug
Use when the failure mode is still fuzzy after the first repro pass or first
failing test.
video-transcripts
Use when a tracker issue, PR, comment thread, or attachment set includes any
video or screen recording that should become structured transcript evidence
before implementation.
ce:brainstorm
Use when requirements are still ambiguous after reading the source of truth
and nearby code.
framework-docs-researcher
Use when touching unfamiliar, version-sensitive, or unstable third-party APIs
after checking local clones and docs per AGENTS.
browser-use
Use only when there is a real browser surface to verify.
Require real browser proof only for browser or UI tasks.
agent-browser-issue
Use when browser automation is blocked by a likely reusable tool-side issue
that should be split into its own GitHub follow-up.
changeset
Use when verified work changes published package code under packages/ and
the repo expects release notes before completion.
In this repo, prefer updating the active unreleased .changeset/*.md draft
instead of creating a parallel changeset when one already exists.
git-commit-push-pr
Use when verified work changed code and should ship as a PR.
Create or update the PR before any tracker comment.
The PR description should be the exact current final handoff, not a rewritten
summary.
If the task changes again in a later iteration, update the PR description to
match the latest handoff.
ce-compound
Use only after verified, non-trivial work that produced reusable knowledge.
Never load it at the start.
ce-review, correctness-reviewer, maintainability-reviewer,
project-standards-reviewer, code-simplicity-reviewer
Use only for risky, large, user-facing, or architecture-sensitive changes.
agent-native-reviewer
Use only when the change touches .agents/**, .claude/**, AI/tooling
surfaces, commands, or user actions that an agent should also be able to
perform.
Execution Path
Bug
- Reproduce first if possible.
- If behavior-level coverage is sane, turn the repro into the smallest useful
automated test and watch it fail for the right reason.
- Find the real ownership boundary of the bug. Prefer fixing the abstraction,
contract, or API that caused it instead of patching each caller around it.
- If the best fix requires an API change, make it unless the task source or
repo constraints explicitly rule that out. Do not preserve a bad API just to
keep the diff small.
- If you intentionally choose a narrower fix, state why the broader
architecture fix was not worth shipping now.
- Re-run targeted checks.
- Re-run the browser flow only if the bug lives on a browser surface.
Feature
- Reduce the task to the smallest slice that fully satisfies the acceptance
criteria.
- Add behavior coverage when sane.
- Prefer the cleanest long-term design that fits the slice, not the quickest
bolt-on.
- If existing patterns are the reason the design is weak, improve the pattern
or API instead of copying it blindly.
- Verify the user-facing outcome.
Testing Or Coverage Work
- Use the testing policy as the rulebook before choosing seams or commands.
- Identify the smallest honest seam for the first hotspot or ordered slice.
- Add or deepen the narrowest useful tests instead of spraying smoke coverage.
- Verify with targeted test commands first.
- Use broader suite checks only when repo rules or change scope justify them.
Program Or Batch Work
- Respect the explicit order from the task source.
- Do not fan out across every listed slice immediately.
- Define what "done for this slice" means before implementation.
- Complete one slice at a time unless the user explicitly asks for broader
execution.
Refactor Or Chore
- Preserve behavior.
- Do not do fake TDD theater.
- If the task exposes a bad API or abstraction, fix that seam instead of
polishing around it.
- Run the narrowest regression checks plus the relevant build, typecheck, or
lint path for the touched area.
Docs Or Content
- Skip engineering ceremony.
- Verify links, examples, formatting, and rendered output as appropriate.
Review Or Investigation
- Read the relevant diff, files, and surrounding context first.
- For review tasks, report findings first, ordered by severity, with concrete
file references.
- For investigation tasks, identify the failure mode, probable cause, and next
action before changing code.
- Only implement changes if the user actually asked for them.
Verification
Keep verification mandatory but proportional.
- Run targeted tests for changed behavior.
- Run package or app build and typecheck when relevant to the touched area.
- Run lint when code changed and the repo expects it.
- Run browser verification only for browser or UI tasks.
- Run broader repo-wide gates only when repo instructions require them or the
change scope justifies them.
- If a failing
bun test, bun check, or bun typecheck shows
local-corruption signals unrelated to the current diff, run bun install once
and rerun the exact failing command before declaring the task blocked. Treat
these as local-corruption signals:
Invalid hook call
resolveDispatcher() / null dispatcher crashes
- package-local
node_modules/react or node_modules/react-dom paths under
packages/*
- mixed
.bun and .pnpm React paths in the same failing stack
- missing-module or package-resolution garbage that does not match the diff
- If the repo has a standard final gate, run it last.
- If verified work changes published package code under
packages/, ensure the
active unreleased changeset is updated or a new one exists before PR or final
handoff.
- If verified work changes package code under
packages/kitcn, run
bun --cwd packages/kitcn build.
- If the work changes
kitcn init -t templates or scaffold sources, run
bun run fixtures:sync and bun run fixtures:check.
- If verified work changed code, create or update the PR before tracker
sync-back unless the user explicitly said not to.
- If the task came from a tracked issue and the task reached a meaningful
outcome, sync back unless the user said not to.
- If UI changed, capture proof from the real browser surface.
- Do not hardcode PR creation, screenshots, or tracker comments for every task.
Final Handoff
Every final response must include:
- follow repo writing style here too:
- be extremely concise
- sacrifice grammar for concision
- no filler, no narration, no polite padding
- leading flat metadata lines in this exact order:
🔀 PR ...
🐛 Fixes ... for bug issues, otherwise use a generic issue line
🟢 95-100% confidence
- then the flow table in this exact format:
| Phase | 🧪 Tests | 🌐 Browser |
| --- | --- | --- |
- use these flow rows, in this order:
- use markdown links for
PR and Issue when they exist
- for non-bug tracked work, do not fake a bug label; use a neutral issue line
instead
- keep the issue line stable; do not add issue comment links there
- use these exact status values in the flow table:
✅
❌
➖ N/A
- prefer specific browser caveat text over vague
⚠️ Partial, for example
⚠️ Could not automate dropdown
- in the
🧪 Tests column:
- use
🔴 in Reproduced when there was a real failing test first
- use
🟢 in Verified when that test passed after the fix
- use
✅ when test evidence exists but did not follow a real red-green loop
- use
➖ N/A for rows or cells that do not apply; do not invent a PR, issue,
or comment
- flow-table test cells mean test-based evidence, whether that came from TDD, a
regression test, or another targeted test path
- if manual non-browser reproduction or verification happened, explain it in
the prose below the metadata lines + flow table rather than adding extra rows
Confidence must stay 100% or lower and use this line format:
🟢 95-100% confidence
🟡 80-94% confidence
🔴 below 80% confidence
- after the metadata lines + flow table, use these short sections in this
order:
**🌐 Browser Check**, only when browser verification applies
**✅ Outcome**
**⚠️ Caveat**
**🏗️ Design**
**🧪 Verified**
- keep those sections flat, concise, and easy to scan
- keep prose brutally short; prefer bullets over paragraphs here
- if browser verification applies,
**🌐 Browser Check** must include the exact
human steps to verify the fix in the browser
**🏗️ Design** is mandatory for any non-trivial code-changing task and must
include:
Chosen seam: ...
Why not quick patch: ...
Why not broader change: ... only if a broader API or abstraction change
was a real option
UI Or Browser Tasks
- Include at least one real browser proof screenshot in the final response.
- The screenshot must come from
browser-use or the real browser workflow used
for verification.
- When
**🌐 Browser Check** is present, put the screenshot immediately after
that section.
- Otherwise, put the screenshot immediately after the metadata lines + flow
table, before the completion summary.
- If no real browser proof exists, the task is not done unless the user
explicitly waived it.
- If
browser-use is blocked on a likely reusable tool-side issue and the
product task is still otherwise fixable, load agent-browser-issue.
- If that follow-up issue is opened, mention it in the caveat or handoff.
**🌐 Browser Check** must be a flat bullet list:
- keep it short and concrete
- prefer the exact local URL as a markdown link when one exists, for example
http://localhost:3000/...
- use the real page, route, and interaction names
- focus on how to prove the fix, not on implementation detail
Non-UI Tasks
- No screenshot is required.
- Omit
**🌐 Browser Check** when there is no browser surface.
- Keep the final response concise and evidence-based.
Testing Or Batch Work
- State which slice, bucket, or hotspot was completed.
- State what verification ran for that completed slice.
- State what remains queued by design.
- State any intentional deferral in generic terms only.
Review Or Investigation Tasks
- Lead with findings or conclusion, not process notes.
Post Back To Tracker
Apply this section only when the task came from a tracker item and reached a
meaningful outcome.
Pull Request
- When a PR exists, the PR description must match the exact current final
handoff from chat:
- same flow table
- same metadata lines, except omit the leading
🔀 PR ... line because the
PR page already identifies itself
- same screenshot when applicable
- same
**✅ Outcome**, **🏗️ Design**, **🧪 Verified**, **⚠️ Caveat**,
and **🌐 Browser Check** sections when applicable
- same caveats
- same human browser verification steps when applicable
- PR description follows the same writing style:
- extremely concise
- grammar can be sacrificed for concision
- no fluffy framing
- Do not publish a PR description with a dead local proof path.
- If screenshot proof is local, upload it first and only then write or update
the final PR body with the hosted GitHub URL.
- For a brand-new PR, if the hosted proof URL is not ready yet:
- create the PR with no proof image
- upload the image immediately
- replace the placeholder with the real hosted proof before handoff
- If the PR description includes a local image path for proof, do not leave it
that way on GitHub.
- Use
browser-use to upload the image through
the PR comment file input as a staging area, then replace the local proof
path in the PR body with the hosted GitHub attachment URL.
- Use the PR comment textarea only as staging:
- upload image
- read generated markdown or URL from the textarea
- clear the textarea
- do not submit the staging comment
- Do not spend time reloading the PR page just to verify hosted image rendering
unless the user explicitly asks.
- Do not compress, adapt, or rewrite the handoff for the PR description.
- If a later iteration changes the fix, evidence, certainty, caveats, or
verification, update the PR description again so it stays in sync.
- Update the PR description before writing the issue comment.
GitHub Issue
Example:
✅ Fixed in #123.
- Open the affected page.
- Follow the real user flow that triggered the bug.
- Confirm the fixed behavior in the browser.
- Manual QA is still useful if browser verification was partial.
Linear
- Keep proof local. Do not upload screenshots or other files to Linear.
- Post a concise issue comment using the Linear integration.
- Match the ticket style and write for QA, not developers.
- Keep the same terse style:
- extremely concise
- grammar can be sacrificed for concision
- avoid file names, tests, typecheck, lint, branch names, PR mechanics, and
screenshot paths
Blocked Or Inconclusive Tracker Work
- If the task was blocked before meaningful progress, either skip the comment or
post a short blocker note only if that helps the tracker owner.
Success Criteria
- Source-of-truth context was read first.
- Relevant local instructions and code patterns were read before editing.
- Tracker items were fetched and summarized correctly when provided.
- Video attachments or screen recordings were turned into normalized
<video-transcripts> evidence before implementation when tracker evidence
required it.
- Bare GitHub issues like
#555 were resolved against the current gh repo
instead of guessed.
- The chosen implementation fixed the highest-leverage seam available, not just
the nearest symptom.
- Code-changing tasks that did not already start on a branch clearly dedicated
to the same task checked out
main and pulled latest before branching.
- Non-trivial work loaded
planning-with-files or the repo-equivalent planning
workflow before implementation.
- Non-trivial package or scaffold work recorded release-artifact requirements in
the plan early: active unreleased changeset for published package work,
fixture sync/check for
kitcn init -t template or scaffold work.
- Testing work loaded the testing policy before implementation.
- Only the necessary skills were loaded.
- The implementation matched the task type instead of following a one-size-fits-all
ritual.
- Batch work did not sprawl across multiple slices without explicit
instruction.
- Verification matched the change scope.
- Final handoff matched the task type.
- Testing or batch handoff reported the completed slice, verification, and
remaining queue when relevant.
- Any tracker, browser, review, or compound follow-up was done only if actually
relevant.