
agents-introspection

Stars: 63

Forks: 3

Updated: June 22, 2026 09:50

Use to retrospect on a task against historical Codex and Claude Code chat transcripts in the current project, identify recurring agent mistakes, and recommend or apply durable fixes such as AGENTS.md updates or new skills.

Source

PaulRBerg

PaulRBerg/agent-skills

SKILL.md

argument-hint	<task>
disable-model-invocation	false
name	agents-introspection
user-invocable	true
description	Use to retrospect on a task against historical Codex and Claude Code chat transcripts in the current project, identify recurring agent mistakes, and recommend or apply durable fixes such as AGENTS.md updates or new skills.

Agents Introspection

Analyze the user's task against prior Codex and Claude Code work in the current directory, then turn repeated agent failure modes into concrete prevention steps.

Arguments

<task> (required): the task, decision, incident, or proposed workflow to evaluate in light of prior agent transcripts. If omitted but the current conversation clearly states a task, use that task.

Workflow

1. Define scope

Resolve the current project path with pwd -P.
Restate the task in one sentence.
Identify likely keywords: filenames, commands, tools, domains, errors, package names, issue IDs, and skill names.
Read transcript sources before touching transcript directories.

2. Discover project transcripts

Look only at Codex and Claude Code transcripts for the current project unless the user explicitly names additional project paths.

Prefer the bundled miner for the first pass:

uv run scripts/transcript-miner.py --keyword "<keyword>" --format json

When the user names multiple projects, pass each one explicitly and mine only those paths:

uv run scripts/transcript-miner.py --project /path/to/one --project /path/to/two --keyword "<keyword>" --format json

Use --include-archived only when active Codex sessions are too thin. Use --max-sessions N to keep the evidence set small.
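The flag combinations above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_miner_command`; only the flags shown in this document are used, and their order is a choice of the sketch, not something the miner is known to require.

```python
from typing import List, Optional, Sequence


def build_miner_command(
    keyword: str,
    projects: Sequence[str] = (),
    include_archived: bool = False,
    max_sessions: Optional[int] = None,
) -> List[str]:
    """Assemble the argv for the bundled miner (hypothetical helper, not part of the skill)."""
    cmd = ["uv", "run", "scripts/transcript-miner.py"]
    # Pass each user-named project explicitly; with none given, no --project flag is added.
    for project in projects:
        cmd += ["--project", project]
    cmd += ["--keyword", keyword, "--format", "json"]
    # Only widen to archived sessions when the caller asks for it.
    if include_archived:
        cmd.append("--include-archived")
    # Cap the evidence set when a limit is supplied.
    if max_sessions is not None:
        cmd += ["--max-sessions", str(max_sessions)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_miner_command("lint", max_sessions=5)))
```

Returning a list rather than a shell string keeps keywords with spaces intact when the result is passed to `subprocess.run`.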

Claude Code: inspect ~/.claude/projects/<encoded-absolute-path>/, where /Users/prb/projects/example becomes -Users-prb-projects-example.
Codex: inspect ~/.codex/session_index.jsonl and transcript files under ~/.codex/sessions/; include ~/.codex/archived_sessions/ when recent active sessions are insufficient.
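The Claude Code path encoding can be sketched as below. This assumes the encoding only replaces path separators with hyphens, which is all the example above confirms; the real encoding may treat other characters differently. The function names are hypothetical.

```python
from pathlib import Path


def encode_project_path(project_path: str) -> str:
    """Replace each '/' with '-' (assumption: no other characters are rewritten)."""
    return project_path.replace("/", "-")


def claude_project_dir(project_path: str, home: str = "~") -> Path:
    """Build the expected transcript directory for a project under ~/.claude/projects/."""
    return Path(home).expanduser() / ".claude" / "projects" / encode_project_path(project_path)


if __name__ == "__main__":
    print(encode_project_path("/Users/prb/projects/example"))
```

Checking that the resulting directory exists before reading from it is a cheap guard against the "bad path encoding" failure mode.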

Prefer metadata first: cwd, workspace roots, session title, timestamps, and git branch. Open transcript bodies only after a session plausibly matches the current project or task. The miner emits counts, candidate paths, themes, correction/failure/verification signals, tool-call counts, and privacy-gap categories; it never emits raw transcript excerpts. Use it to choose evidence, then open only the minimum matching transcript content needed for interpretation.
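A metadata-first pass might look like the sketch below. It assumes each index line is a JSON object with a `cwd` field and a `path` field; both field names are hypothetical, since the actual schema of `session_index.jsonl` is not described here.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per valid JSON-object line, skipping blanks and malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def sessions_for_project(records: Iterable[Dict], project_path: str) -> List[Dict]:
    """Keep records whose cwd (assumed field) is the project root or a directory inside it."""
    root = project_path.rstrip("/")
    matches = []
    for record in records:
        cwd = str(record.get("cwd", "")).rstrip("/")
        # Compare on a path boundary so /work/app does not match /work/application.
        if cwd == root or cwd.startswith(root + "/"):
            matches.append(record)
    return matches


if __name__ == "__main__":
    sample = [
        '{"cwd": "/work/app", "path": "a.jsonl"}',
        '{"cwd": "/work/application", "path": "c.jsonl"}',
    ]
    print([r["path"] for r in sessions_for_project(parse_jsonl(sample), "/work/app")])
```

Only the sessions that survive this filter would have their transcript bodies opened.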

3. Select evidence

Sample enough history to distinguish a pattern from a one-off:

Prioritize sessions in the same cwd/workspace and recent sessions with task-keyword overlap.
Include at least one successful comparable session when available, not only failures.
Stop early when additional transcripts repeat the same evidence without changing the conclusion.
Treat all transcript content as sensitive plaintext. Do not paste raw transcript excerpts unless a short quote is essential; redact secrets, private addresses, tokens, emails, and personal data.
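Redaction before quoting can be sketched as follows. The patterns are illustrative assumptions, not an exhaustive list: they cover email addresses, 0x-prefixed 40-hex-character addresses (one reading of "private addresses"), and a few common token prefixes. Anything they miss still needs manual review.

```python
import re

# Each entry pairs a pattern with its placeholder. All patterns are assumptions for illustration.
_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b0x[a-fA-F0-9]{40}\b"), "[ADDRESS]"),
    (re.compile(r"\b(?:sk|ghp|gho|xoxb)[-_][A-Za-z0-9_-]{16,}\b"), "[TOKEN]"),
]


def redact(text: str) -> str:
    """Replace every match of the known patterns with its placeholder."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact("ping dev@example.com about the failing deploy"))
```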

4. Classify agent behavior

For each relevant session, extract concise evidence for:

Misread instructions or ignored AGENTS.md / skill guidance.
Wrong cwd, wrong project root, wrong transcript/source path, or bad path encoding.
Over-broad edits, unrelated file churn, reverted user work, or destructive commands.
Tooling mistakes: skipped just, wrong shell dialect, brittle parsing, missing narrow verification.
Repeated loops: same failed command, stale assumption, no escalation after errors.
Quality gaps: missing tests, unverified claims, vague final reports, invented facts.
Positive patterns that avoided mistakes and should be preserved.

Name the failure mode, not the model. Target only Codex and Claude Code.

5. Connect history to the task

Answer these questions:

What has gone wrong before on tasks like this?
Which prior failures are likely to recur for the current task?
Which constraints or checks would have prevented them?
Which observed successes are worth making standard?

Separate evidence-backed findings from speculation. If transcript coverage is thin, say so and lower confidence.

6. Choose durable fixes

Recommend the smallest durable intervention:

Update AGENTS.md when the lesson is project-wide, stable, and useful to every agent working here.
Create a new skill when the pattern is procedural, repeated, and reusable across projects or repos.
Update an existing skill when the failure belongs clearly inside that skill's current scope.
Add a script only when deterministic transcript discovery, parsing, or validation would otherwise be reimplemented repeatedly.
Do nothing durable for one-off mistakes; report the risk and the manual guardrail.

If the current invocation explicitly asks to apply fixes, make the edits and verify them. Otherwise, report recommendations first and wait for confirmation before changing AGENTS.md, creating skills, or editing existing skills.
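One way to make the selection rules above explicit is a small decision function. This is a sketch under stated assumptions: the type names, the field names, the precedence between rules, and the threshold of two occurrences for "repeated" are all choices of the sketch, not something the skill prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Fix(Enum):
    NONE = "report the risk and a manual guardrail"
    AGENTS_MD = "update AGENTS.md"
    NEW_SKILL = "create a new skill"
    EXISTING_SKILL = "update an existing skill"
    SCRIPT = "add a script"


@dataclass
class Pattern:
    occurrences: int
    project_wide: bool = False
    procedural: bool = False
    reusable_across_repos: bool = False
    fits_existing_skill: bool = False
    needs_deterministic_parsing: bool = False


def choose_fix(pattern: Pattern, repeat_threshold: int = 2) -> Fix:
    """Pick a durable fix; the rule order here is an assumed precedence."""
    if pattern.occurrences < repeat_threshold:
        return Fix.NONE  # one-off mistake: nothing durable
    if pattern.fits_existing_skill:
        return Fix.EXISTING_SKILL
    if pattern.needs_deterministic_parsing:
        return Fix.SCRIPT
    if pattern.procedural and pattern.reusable_across_repos:
        return Fix.NEW_SKILL
    if pattern.project_wide:
        return Fix.AGENTS_MD
    return Fix.NONE


if __name__ == "__main__":
    print(choose_fix(Pattern(occurrences=3, project_wide=True)).value)
```

The function only recommends; applying the chosen fix still depends on whether the invocation explicitly asked for edits.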

7. Report

Use this structure:

## Historical Scope

- Project path:
- Sources checked:
- Sessions sampled:

## Findings

- [confidence] Failure mode:
  Evidence:
  Relevance to current task:
  Prevention:

## Durable Fixes

- Apply now:
- Consider later:
- Not worth changing:

## Verification

- Commands run:
- Gaps:

Keep the report terse and evidence-led. Mention exact files changed and checks run when fixes are applied.
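The Findings section of the structure above can be rendered from structured records. The sketch below assumes a hypothetical `Finding` dataclass; the sample values in the usage line are invented placeholders, not results from any transcript.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    confidence: str
    failure_mode: str
    evidence: str
    relevance: str
    prevention: str


def render_findings(findings: List[Finding]) -> str:
    """Format findings using the bullet layout of the report template."""
    lines = ["## Findings", ""]
    for f in findings:
        lines += [
            f"- [{f.confidence}] Failure mode: {f.failure_mode}",
            f"  Evidence: {f.evidence}",
            f"  Relevance to current task: {f.relevance}",
            f"  Prevention: {f.prevention}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = Finding("high", "wrong cwd", "two sessions", "task edits a nested package", "run pwd -P first")
    print(render_findings([example]))
```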

Guard Rails

Never read transcripts outside the current project scope unless the user explicitly expands scope.
Never write transcript excerpts or derived private data into repo files.
Never create broad policy from a single ambiguous transcript.
Never blame "the agent" generically when the fix can name a concrete instruction, command, check, or workflow step.
Prefer rg, fd, jq, and structured parsing over ad hoc pipelines.

More Skills from the same repository

todo-archive

PaulRBerg/agent-skills

Use only when explicitly asked to archive/prune/compact/roll over checked tasks from TODO.md into `.ai/todos/TODO_UNTIL_YYYY_MM_DD.md`, leaving unchecked tasks.

Updated 2026-06-26, 63 stars

commit

PaulRBerg/agent-skills

Use only when explicitly invoked for Git commit workflows: stage intended changes, craft Conventional Prefix Format messages by default, Natural Language messages with --natural or configured repos, commit, and optionally --all, --staged, --deep, --close, or --push.

Updated 2026-06-26, 63 stars

bump-deps

PaulRBerg/agent-skills

Use for dependency updates: update/bump deps, npm/pnpm/yarn/bun package upgrades, outdated checks, package.json updates, or taze.

Updated 2026-06-24, 63 stars

bump-release

PaulRBerg/agent-skills

Use for release versioning: bump/cut/tag a release, bump version, create a release, changelog updates, or version tagging.

Updated 2026-06-24, 63 stars

yeet

PaulRBerg/agent-skills

Use for GitHub PR/issue/discussion workflows: create/update PRs or issues, post comments, start discussions; triggers include create/open PR, file/update issue, yeet.

Updated 2026-06-23, 63 stars

code-polish

PaulRBerg/agent-skills

Use to polish recently changed code: simplify for readability/maintainability and run a risk-profiled review that autonomously applies fixes. Default runs both passes; pass --simplify or --review for one. Covers code/PR review, audits, bug/security checks, reviewing diffs or changes, cleanup, refactoring, and reducing complexity.

Updated 2026-06-22, 63 stars
