name	self-evaluation
license	MIT
description	Use when completing any session.

Iron Law

YOU MUST END EVERY SESSION WITH SELF-EVALUATION.
No exceptions.

Violating the letter of this rule is violating the spirit of this rule.

Announce at start: "I am using the self-evaluation skill for this session."

BEFORE PROCEEDING

The session's primary task is complete -- no further implementation work is planned for this session.
All commits from this session have been pushed or are staged and ready.
You have NOT yet sent the final message to the user.

[+] All met -> proceed through all 8 steps in order
[-] Any unmet -> complete remaining session work first; once no further work is planned, return here and execute all 8 steps

Instructions for Agent

How This Skill is Invoked

This skill is mandatory -- the Session Lifecycle section of AGENTS.md requires it before every final message. It is also invoked:

When explicitly asked: "Run self-evaluation", "What did you learn?", "Improve skills"
After addressing code review feedback that reveals a recurring pattern

Core Principle: Learn From Every Session

Every session produces insights that can improve future agent effectiveness. Capture these systematically.

Objectivity block: Agents and humans are structurally poor at evaluating their own work. This is not a character flaw -- it is a known bias. The explicit steps below exist to override it. Skipping steps because "it went well" is the bias asserting itself.

Step 1: Review the Session

Examine what happened during this session:

Code review feedback received -- What patterns did reviewers catch?
Mistakes made and corrected -- What errors occurred during implementation?
Owner/user feedback -- What preferences or corrections did the user provide?
Patterns discovered -- What reusable patterns emerged from the work?

Step 2: Categorize Lessons

Classify each lesson into one of these categories. For the full routing table with examples, see references/LESSONS_LEARNED_PATTERNS.md.

Category	Update Target
Code quality	`code-quality` skill
Testing	`testing` skill
CI/CD	`workflow` skill
Documentation	`documentation` skill
Build	`build` skill
Versioning	`versioning` skill
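Here is a minimal sketch of the routing above as a lookup table. The function name `update_target` is hypothetical. Raising an error on an unlisted category is also an assumption: an unlisted category is treated as a prompt to consult the full routing table rather than guess.

```python
# Hypothetical sketch: maps a Step 2 lesson category to the skill to update.
ROUTING = {
    "Code quality": "code-quality",
    "Testing": "testing",
    "CI/CD": "workflow",
    "Documentation": "documentation",
    "Build": "build",
    "Versioning": "versioning",
}


def update_target(category: str) -> str:
    """Return the skill name for a category; unknown categories raise instead of guessing."""
    try:
        return ROUTING[category]
    except KeyError:
        raise ValueError(
            f"No routing for {category!r}; see references/LESSONS_LEARNED_PATTERNS.md"
        ) from None


print(update_target("CI/CD"))  # prints "workflow"
```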

Step 3: Check Against Existing Knowledge

Before proposing updates, verify the lesson is not already documented:

Check AGENTS.md -- Is this pattern already listed?
Check the relevant skill's SKILL.md -- Is this rule already stated?
Check skill references/ -- Is there already an example?

Only propose additions for genuinely new or underemphasized patterns.
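This is a rough sketch of a first-pass duplicate check. It assumes the repository keeps skills under a `skills/<name>/` layout, which is an assumption, so adjust the paths to the actual repository. `already_documented` is a hypothetical helper. A literal substring match only catches exact wording, so an empty result means "not found verbatim", not "not documented". Reading the files is still required.

```python
from pathlib import Path


def already_documented(phrase: str, roots: list[Path]) -> list[Path]:
    """Return Markdown files under `roots` whose text contains `phrase` (case-insensitive).

    `roots` may mix single files (e.g. AGENTS.md) and directories
    (e.g. a skill's references/ folder); directories are searched recursively.
    """
    needle = phrase.lower()
    hits: list[Path] = []
    for root in roots:
        if not root.exists():
            continue  # a missing path cannot already document the lesson
        files = [root] if root.is_file() else sorted(root.rglob("*.md"))
        for path in files:
            if needle in path.read_text(encoding="utf-8").lower():
                hits.append(path)
    return hits
```

Example call, using the assumed layout: `already_documented("pin action versions", [Path("AGENTS.md"), Path("skills/workflow/SKILL.md"), Path("skills/workflow/references")])`.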

Step 4: Propose Skill Updates

For each new lesson, propose a specific, minimal update:

Format for Proposed Updates

**Lesson:** [One-line description]
**Source:** [PR #, review comment, or error that revealed this]
**Category:** [From Step 2 table]
**Target File:** [Exact file path to update]
**Proposed Change:** [Specific text to add or modify]
**Priority:** [High = caused bugs/rework, Medium = improved quality, Low = nice to have]

Prioritization Rules

High priority: Patterns that caused bugs, security issues, or significant rework
Medium priority: Patterns that improve code quality or developer experience
Low priority: Style preferences or minor improvements

Only implement High and Medium priority changes. Document Low priority for future reference.
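A sketch of the proposal format and the prioritization rule as a small data structure follows. The class and function names are hypothetical. The field names mirror the six labels in the format above, and `split_by_priority` encodes only the rule stated here: apply High and Medium, document Low.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    HIGH = "High"      # caused bugs/rework
    MEDIUM = "Medium"  # improved quality
    LOW = "Low"        # nice to have


@dataclass
class ProposedUpdate:
    lesson: str
    source: str
    category: str
    target_file: str
    proposed_change: str
    priority: Priority

    def render(self) -> str:
        """Render the update in the Step 4 proposal format."""
        return "\n".join([
            f"**Lesson:** {self.lesson}",
            f"**Source:** {self.source}",
            f"**Category:** {self.category}",
            f"**Target File:** {self.target_file}",
            f"**Proposed Change:** {self.proposed_change}",
            f"**Priority:** {self.priority.value}",
        ])


def split_by_priority(
    updates: list[ProposedUpdate],
) -> tuple[list[ProposedUpdate], list[ProposedUpdate]]:
    """Return (to_implement, deferred): High/Medium are applied, Low is only documented."""
    to_implement = [u for u in updates if u.priority in (Priority.HIGH, Priority.MEDIUM)]
    deferred = [u for u in updates if u.priority is Priority.LOW]
    return to_implement, deferred
```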

Step 5: Trust Audit

Before generating the session summary, complete this audit:

Ask explicitly for each:

Did I make any false confidence claims?
- Used "should work", "done", "tests pass", or "I'm confident" without inline evidence?
- Each instance is a trust withdrawal. Name them.
Did I show evidence inline, or reference it?
- "I ran the tests and they passed" = referenced (does not count)
- "Ran tests: 247 passed, 0 failures. [exit 0]" = inline (counts)
Did I present any assumptions as facts?
- Claimed something about code behavior without reading the code or running the command?
What is the trust balance for this session?
- More deposits than withdrawals = trust maintained
- Any withdrawals = note them; they cost speed in future sessions
Decision accountability:
- Did any decisions made during this session produce poor outcomes?
- For each: name the decision, the evidence it was wrong, and the decision rule that MUST change.
- "I followed the established pattern" is not accountability if the pattern was wrong.
- Decisions are accountable regardless of whether they followed a rule. Rule adherence and outcome quality are separate.
Consistency check (integrity under low scrutiny):
- Did I apply the same gates in low-visibility moments (single-line changes, quick responses, perceived-trivial tasks) as in high-visibility ones?
- If I skipped any gate because the task "seemed small," name it. Consistency under low scrutiny is the definition of integrity.

Report honestly. If you made false confidence claims, name them. This is not a punishment -- it is the calibration mechanism. A model that accurately reports its own false confidence claims is more trustworthy than one that doesn't.
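As an optional first pass on the first audit question, a keyword scan can flag the confidence phrases named above when no inline evidence sits next to them. The evidence pattern (an `[exit N]` marker or an `N passed` count) is a hypothetical heuristic modeled on the inline example, not a rule from this skill. The scan finds phrasing only. It cannot detect assumptions presented as facts, so it does not replace asking each question.

```python
import re

# Confidence phrases named in the Trust Audit; word boundaries avoid matching "abandoned".
CLAIM_PATTERN = re.compile(r"\b(?:should work|done|tests pass|I'm confident)\b", re.IGNORECASE)

# Hypothetical heuristic for inline evidence: "[exit 0]" or "247 passed".
EVIDENCE_PATTERN = re.compile(r"\[exit \d+\]|\b\d+ passed\b", re.IGNORECASE)


def unbacked_claims(messages: list[str]) -> list[str]:
    """Return messages that make a confidence claim with no inline evidence in the same message."""
    return [
        msg for msg in messages
        if CLAIM_PATTERN.search(msg) and not EVIDENCE_PATTERN.search(msg)
    ]
```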

Step 6: Apply Updates (If Appropriate)

If changes are warranted and the session scope allows:

Update the relevant skill -- Add the lesson to the appropriate section
Increment the skill version -- Bump the version in YAML frontmatter
Keep changes minimal -- Add only what's necessary; don't restructure
Maintain skill boundaries -- Don't duplicate content across skills
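The version bump can be sketched as follows, assuming the frontmatter carries a semantic version on a `version:` line (possibly nested and indented, possibly quoted). Both the key name and the MAJOR.MINOR.PATCH scheme are assumptions, and so is the choice of which part to bump; follow the project's versioning convention. `bump_frontmatter_version` is a hypothetical helper. It only edits text between the opening and closing `---` delimiters.

```python
import re

# Matches e.g. "version: 1.2.3" or '  version: "1.2.3"' on a single line.
VERSION_LINE = re.compile(r"^([ \t]*version:[ \t]*['\"]?)(\d+)\.(\d+)\.(\d+)(['\"]?[ \t]*)$", re.MULTILINE)


def bump_frontmatter_version(text: str, part: str = "patch") -> str:
    """Bump the major, minor, or patch number of the version in YAML frontmatter."""
    if not text.startswith("---\n"):
        raise ValueError("no YAML frontmatter")
    end = text.index("\n---", 4)  # start of the closing delimiter line
    front, body = text[:end], text[end:]
    match = VERSION_LINE.search(front)
    if match is None:
        raise ValueError("no MAJOR.MINOR.PATCH version in frontmatter")
    major, minor, patch = (int(g) for g in match.group(2, 3, 4))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part {part!r}")
    new_line = f"{match.group(1)}{major}.{minor}.{patch}{match.group(5)}"
    return front[:match.start()] + new_line + front[match.end():] + body
```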

Step 7: Write Findings to Disk

Before generating the session summary block, write the full findings to self-assessment.md in the repo root.

A self-evaluation that exists only in the message stream is not a self-evaluation -- it is ephemeral. The external postmortem reviewer reads from disk, not from the message stream. If the file does not exist, the external reviewer cannot cross-check the self-assessment against what was claimed.

Write Gate:

Produce the ### Session Self-Evaluation block (using the template in Step 8)
Write it to self-assessment.md in the repo root:
- If file does not exist: write directly (Write tool).
- If file already exists from a prior session: do NOT read the file before appending. Construct the new section heading from memory, using today's date and the session ID (the first 8 characters of the session UUID), in the format ## Session Self-Evaluation (YYYY-MM-DD -- [8-char-session-id]). Then use shell append (>>) to add the heading and block without reading existing content. Reading prior-session content before writing lets prior-session framing contaminate this session's evaluation.
[+] File written -> proceed to Step 8
[-] File not written -> STOP. Write the file before sending any final message.

Lifecycle: self-assessment.md is listed in .gitignore. It is a local session artifact -- never committed.
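A minimal Python sketch of the write gate follows. It opens the file in append mode, the equivalent of shell `>>`, so existing content is never read. `write_self_assessment` is a hypothetical helper. How the session UUID is obtained depends on the agent harness and is an assumption here. Only the append path gets the dated heading, which mirrors the two cases above.

```python
from datetime import date
from pathlib import Path


def write_self_assessment(block: str, session_uuid: str, repo_root: Path = Path(".")) -> Path:
    """Write or append the Session Self-Evaluation block without reading prior content."""
    path = repo_root / "self-assessment.md"
    if not path.exists():
        # First write: no prior sessions, so write the block directly.
        path.write_text(block, encoding="utf-8")
    else:
        # Later sessions: append only (mode "a"); the existing text is never read.
        heading = f"## Session Self-Evaluation ({date.today().isoformat()} -- {session_uuid[:8]})"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(f"\n\n{heading}\n\n{block}")
    return path
```

The gate itself is then a plain existence check on the returned path before any final message is sent.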

Step 8: Generate Session Summary

Include the ### Session Self-Evaluation block in the final message to the user (this is the same content written to disk in Step 7):

### Session Self-Evaluation

**Lessons Captured:** [count]
**Skills Updated:** [list of skills modified, or "None"]
**Key Patterns Added:**
- [Pattern 1: brief description]
- [Pattern 2: brief description]

**Trust Audit:**
- False confidence claims: [count + what they were, or "None"]
- Evidence shown inline: [yes/mostly/no]
- Trust balance: [positive/neutral/negative]

**Deferred (Low Priority):**
- [Pattern that was noted but not implemented]
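Generating the block from one function helps keep the copy written to disk in Step 7 identical to the copy in the final message. Here is a sketch. `render_summary` is hypothetical. Rendering an empty list as "None" rather than omitting the section is an assumption, since the template only specifies "None" for skills and false claims.

```python
def render_summary(
    *,
    lessons_captured: int,
    skills_updated: list[str],
    key_patterns: list[str],
    false_claims: list[str],
    evidence_inline: str,
    trust_balance: str,
    deferred: list[str],
) -> str:
    """Render the Session Self-Evaluation block following the Step 8 template."""
    if evidence_inline not in {"yes", "mostly", "no"}:
        raise ValueError("evidence_inline must be yes, mostly, or no")
    if trust_balance not in {"positive", "neutral", "negative"}:
        raise ValueError("trust_balance must be positive, neutral, or negative")

    def bullets(items: list[str]) -> list[str]:
        # Empty sections render as "- None" (an assumption, not part of the template).
        return [f"- {item}" for item in items] or ["- None"]

    claims = f"{len(false_claims)} ({'; '.join(false_claims)})" if false_claims else "None"
    lines = [
        "### Session Self-Evaluation",
        "",
        f"**Lessons Captured:** {lessons_captured}",
        f"**Skills Updated:** {', '.join(skills_updated) or 'None'}",
        "**Key Patterns Added:**",
        *bullets(key_patterns),
        "",
        "**Trust Audit:**",
        f"- False confidence claims: {claims}",
        f"- Evidence shown inline: {evidence_inline}",
        f"- Trust balance: {trust_balance}",
        "",
        "**Deferred (Low Priority):**",
        *bullets(deferred),
    ]
    return "\n".join(lines) + "\n"
```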

Anti-Patterns to Avoid

Don't update skills for one-off situations -- Only add patterns that are likely to recur
Don't duplicate across skills -- Each lesson goes in exactly one place
Don't restructure existing skills -- Add to existing sections, don't reorganize
Don't add lessons that are standard programming practice -- Focus on project-specific patterns
Don't forget to check existing docs first -- Avoid adding what's already there

Rationalization Prevention

Excuse	Reality
"I didn't make any mistakes, no need to evaluate"	Every session has lessons. No lessons found = evaluation wasn't thorough enough.
"I'll do the self-evaluation next session"	Lessons evaporate overnight. Capture them now while the context is live.
"The user seemed satisfied, so the session went well"	User satisfaction != no lessons. Look for near-misses, slow spots, and subtle errors.
"Self-evaluation is for big failures only"	Small improvements compound. Consistent small lessons beat occasional big ones.
"I already updated one skill -- that's enough"	Evaluate all active domains. One skill update is rarely complete coverage.
"There's no time -- the session is over"	5 minutes of self-evaluation saves hours in future sessions. Make time.

Red Flags -- STOP

If you catch yourself thinking any of these, stop and follow the rule:

About to send a final message without having read the self-evaluation skill
"No lessons learned this session"
"The user is waiting, I'll skip self-eval"
Updated code but haven't checked if any skills are now stale
"I already know what I'd write -- no need to actually write it"
Closing a session without the Session Self-Evaluation block in the final message
Closing self-evaluation without writing findings to self-assessment.md on disk (Step 7 gate)

All of these mean: Load the self-evaluation skill and complete all 8 steps. Write to self-assessment.md (Step 7). Then include the ### Session Self-Evaluation block in the final message (Step 8).

References

Lesson examples and routing table: references/LESSONS_LEARNED_PATTERNS.md
Why structural mechanisms beat "try harder" (Objectivity Block rationale): references/LESSONS_LEARNED_PATTERNS.md