| argument-hint | <task> |
| disable-model-invocation | false |
| name | agents-introspection |
| user-invocable | true |
| description | Use to retrospect on a task against historical Codex and Claude Code chat transcripts in the current project, identify recurring agent mistakes, and recommend or apply durable fixes such as AGENTS.md updates or new skills. |
Agents Introspection
Analyze the user's task against prior Codex and Claude Code work in the current directory, then turn repeated agent failure modes into concrete prevention steps.
Arguments
<task> (required): the task, decision, incident, or proposed workflow to evaluate in light of prior agent transcripts. If omitted but the current conversation clearly states a task, use that task.
Workflow
1. Define scope
- Resolve the current project path with pwd -P.
- Restate the task in one sentence.
- Identify likely keywords: filenames, commands, tools, domains, errors, package names, issue IDs, and skill names.
- Read transcript sources before touching transcript directories.
2. Discover project transcripts
Look only at Codex and Claude Code transcripts for the current project unless the user explicitly names additional project paths.
Prefer the bundled miner for the first pass:
uv run scripts/transcript-miner.py --keyword "<keyword>" --format json
When the user names multiple projects, pass each one explicitly and mine only those paths:
uv run scripts/transcript-miner.py --project /path/to/one --project /path/to/two --keyword "<keyword>" --format json
Use --include-archived only when active Codex sessions are too thin. Use --max-sessions N to keep the evidence set small.
- Claude Code: inspect ~/.claude/projects/<encoded-absolute-path>/, where /Users/prb/projects/example becomes -Users-prb-projects-example.
- Codex: inspect ~/.codex/session_index.jsonl and transcript files under ~/.codex/sessions/; include ~/.codex/archived_sessions/ when recent active sessions are insufficient.
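The Claude Code path encoding above can be sketched as follows. This is a minimal sketch, not the tool's own implementation: the helper names are hypothetical, and it assumes that only "/" is replaced with "-", which is all the example shows. Confirm the result against a directory that already exists before relying on it.

```python
from pathlib import Path
from typing import Optional


def encode_claude_project_dir(project_path: str) -> str:
    """Encode an absolute project path as in the example above.

    Assumption: only "/" is replaced with "-". The real encoding may
    rewrite other characters as well.
    """
    if not project_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    return project_path.rstrip("/").replace("/", "-")


def claude_transcript_dir(project_path: str, home: Optional[Path] = None) -> Path:
    """Return the directory expected to hold Claude Code transcripts for a project."""
    base = home if home is not None else Path.home()
    return base / ".claude" / "projects" / encode_claude_project_dir(project_path)


print(encode_claude_project_dir("/Users/prb/projects/example"))
```

Passing `home` explicitly keeps the helper testable without touching the real home directory.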
Prefer metadata first: cwd, workspace roots, session title, timestamps, and git branch. Open transcript bodies only after a session plausibly matches the current project or task.
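A metadata-first pass over a JSONL index might look like the sketch below. The `cwd` field name and the function name are assumptions made for illustration; the real session index may name or nest this field differently, so inspect one record before using it.

```python
import json
from typing import Iterable, Iterator


def sessions_for_project(index_lines: Iterable[str], project_path: str) -> Iterator[dict]:
    """Yield index records whose cwd is the project path or a directory inside it.

    Assumption: each JSONL record is an object with a string "cwd" field.
    """
    root = project_path.rstrip("/")
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the scan
        if not isinstance(record, dict):
            continue
        cwd = record.get("cwd")
        if not isinstance(cwd, str):
            continue
        cwd = cwd.rstrip("/")
        # Require an exact match or a path-separator boundary so that
        # /p/example does not match /p/example-other.
        if cwd == root or cwd.startswith(root + "/"):
            yield record
```

Only the records this yields are candidates for opening transcript bodies.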
The miner emits counts, candidate paths, themes, correction/failure/verification signals, tool-call counts, and privacy-gap categories; it never emits raw transcript excerpts. Use it to choose evidence, then open only the minimum matching transcript content needed for interpretation.
3. Select evidence
Sample enough history to distinguish a pattern from a one-off:
- Prioritize sessions in the same cwd/workspace and recent sessions with task-keyword overlap.
- Include at least one successful comparable session when available, not only failures.
- Stop early when additional transcripts repeat the same evidence without changing the conclusion.
- Treat all transcript content as sensitive plaintext. Do not paste raw transcript excerpts unless a short quote is essential; redact secrets, private addresses, tokens, emails, and personal data.
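When a short quote is essential, redaction can be applied before the quote leaves the transcript. The patterns below are illustrative assumptions, not a complete secret scanner: they cover email addresses, a few common token prefixes, and IPv4 addresses, and they will miss other formats. Review the result by hand.

```python
import re

# Illustrative patterns only (assumptions); extend for the secrets a project actually uses.
_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}"), "[TOKEN]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
]


def redact(text: str, max_len: int = 200) -> str:
    """Replace matches with placeholders, then cap the quote length."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    if len(text) > max_len:
        text = text[:max_len].rstrip() + " [TRUNCATED]"
    return text
```

The length cap is a second guardrail: it keeps a quote short even when no pattern matches.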
4. Classify agent behavior
For each relevant session, extract concise evidence for:
- Misread instructions or ignored AGENTS.md / skill guidance.
- Wrong cwd, wrong project root, wrong transcript/source path, or bad path encoding.
- Over-broad edits, unrelated file churn, reverted user work, or destructive commands.
- Tooling mistakes: skipped the just command runner, wrong shell dialect, brittle parsing, missing narrow verification.
- Repeated loops: same failed command, stale assumption, no escalation after errors.
- Quality gaps: missing tests, unverified claims, vague final reports, invented facts.
- Positive patterns that avoided mistakes and should be preserved.
Name the failure mode, not the model. Target only Codex and Claude Code.
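Once evidence is classified, a tally by distinct session helps separate a recurring failure mode from a one-off. This is a minimal sketch: the function name, the `(session_id, failure_mode)` input shape, and the default threshold of two sessions are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def recurring_failure_modes(
    observations: Iterable[Tuple[str, str]], min_sessions: int = 2
) -> Dict[str, int]:
    """Map each failure mode to the number of distinct sessions that show it.

    observations: (session_id, failure_mode) pairs.
    min_sessions: assumed threshold below which a mode counts as a one-off.
    """
    sessions_by_mode: Dict[str, Set[str]] = defaultdict(set)
    for session_id, mode in observations:
        # A set ignores repeats within one session, so a single noisy
        # session cannot look like a pattern.
        sessions_by_mode[mode].add(session_id)
    return {
        mode: len(sessions)
        for mode, sessions in sessions_by_mode.items()
        if len(sessions) >= min_sessions
    }
```

Counting sessions rather than occurrences matters: the same failed command retried ten times in one session is still one data point.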
5. Connect history to the task
Answer these questions:
- What has gone wrong before on tasks like this?
- Which prior failures are likely to recur for the current task?
- Which constraints or checks would have prevented them?
- Which observed successes are worth making standard?
Separate evidence-backed findings from speculation. If transcript coverage is thin, say so and lower confidence.
6. Choose durable fixes
Recommend the smallest durable intervention:
- Update AGENTS.md when the lesson is project-wide, stable, and useful to every agent working here.
- Create a new skill when the pattern is procedural, repeated, and reusable across projects or repos.
- Update an existing skill when the failure belongs clearly inside that skill's current scope.
- Add a script only when deterministic transcript discovery, parsing, or validation would otherwise be reimplemented repeatedly.
- Do nothing durable for one-off mistakes; report the risk and the manual guardrail.
If the current invocation explicitly asks to apply fixes, make the edits and verify them. Otherwise, report recommendations first and wait for confirmation before changing AGENTS.md, creating skills, or editing existing skills.
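The selection rules above can be read as a small decision function. The sketch below is one possible ordering, not a prescribed one: the `Lesson` fields, the return labels, and the precedence between rules are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    recurring: bool                    # seen in more than one session
    project_wide: bool                 # stable and useful to every agent here
    procedural: bool                   # a repeatable sequence of steps
    cross_project: bool                # reusable in other projects or repos
    owned_by_existing_skill: bool      # clearly inside an existing skill's scope
    needs_deterministic_tooling: bool  # discovery/parsing/validation keeps being rewritten


def choose_fix(lesson: Lesson) -> str:
    """Return the smallest durable intervention for a lesson (assumed precedence)."""
    if not lesson.recurring:
        return "report-only"  # one-off: report the risk and a manual guardrail
    if lesson.owned_by_existing_skill:
        return "update-existing-skill"
    if lesson.needs_deterministic_tooling:
        return "add-script"
    if lesson.procedural and lesson.cross_project:
        return "create-skill"
    if lesson.project_wide:
        return "update-agents-md"
    return "report-only"
```

The function only recommends. Applying the change still depends on whether the invocation explicitly asked for fixes.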
7. Report
Use this structure:
## Historical Scope
- Project path:
- Sources checked:
- Sessions sampled:
## Findings
- [confidence] Failure mode:
Evidence:
Relevance to current task:
Prevention:
## Durable Fixes
- Apply now:
- Consider later:
- Not worth changing:
## Verification
- Commands run:
- Gaps:
Keep the report terse and evidence-led. Mention exact files changed and checks run when fixes are applied.
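Where reports are generated programmatically, the structure above can be rendered from structured findings. This is a sketch under assumptions: the `Finding` class, the function name, the two-space indent under each finding, and the "none" placeholder for empty lists are illustrative choices rather than part of the skill.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Finding:
    confidence: str  # e.g. "high", "medium", "low"
    failure_mode: str
    evidence: str
    relevance: str
    prevention: str


def _joined(items: Sequence[str]) -> str:
    return "; ".join(items) if items else "none"


def render_report(project_path, sources, sessions_sampled, findings,
                  apply_now=(), consider_later=(), not_worth_changing=(),
                  commands_run=(), gaps=()) -> str:
    """Render the report sections in the order given above."""
    lines = [
        "## Historical Scope",
        f"- Project path: {project_path}",
        f"- Sources checked: {_joined(sources)}",
        f"- Sessions sampled: {sessions_sampled}",
        "## Findings",
    ]
    for f in findings:
        lines += [
            f"- [{f.confidence}] Failure mode: {f.failure_mode}",
            f"  Evidence: {f.evidence}",
            f"  Relevance to current task: {f.relevance}",
            f"  Prevention: {f.prevention}",
        ]
    lines += [
        "## Durable Fixes",
        f"- Apply now: {_joined(apply_now)}",
        f"- Consider later: {_joined(consider_later)}",
        f"- Not worth changing: {_joined(not_worth_changing)}",
        "## Verification",
        f"- Commands run: {_joined(commands_run)}",
        f"- Gaps: {_joined(gaps)}",
    ]
    return "\n".join(lines)
```

Evidence strings passed in should already be summarized and redacted; the renderer does no filtering of its own.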
Guard Rails
- Never read transcripts outside the current project scope unless the user explicitly expands scope.
- Never write transcript excerpts or derived private data into repo files.
- Never create broad policy from a single ambiguous transcript.
- Never blame "the agent" generically when the fix can name a concrete instruction, command, check, or workflow step.
- Prefer rg, fd, jq, and structured parsing over ad hoc pipelines.