self-evaluation
Use when completing any session.
| Field | Value |
|---|---|
| name | self-evaluation |
| license | MIT |
| description | Use when completing any session. |
YOU MUST END EVERY SESSION WITH SELF-EVALUATION.
No exceptions.
Violating the letter of this rule is violating the spirit of this rule.
Announce at start: "I am using the self-evaluation skill for this session."
- [+] All met -> proceed through all 8 steps in order
- [-] Any unmet -> complete remaining session work first; once no further work is planned, return here and execute all 8 steps
This skill is mandatory -- AGENTS.md sec. Session Lifecycle requires it before every final message. You will also be invoked:
Every session produces insights that can improve future agent effectiveness. Capture these systematically.
Objectivity block: Agents and humans are structurally poor at evaluating their own work. This is not a character flaw -- it is a known bias. The explicit steps below exist to override it. Skipping steps because "it went well" is the bias asserting itself.
Examine what happened during this session:
Classify each lesson into one of these categories. For the full routing table with examples, see references/LESSONS_LEARNED_PATTERNS.md.
| Category | Update Target |
|---|---|
| Code quality | code-quality skill |
| Testing | testing skill |
| CI/CD | workflow skill |
| Documentation | documentation skill |
| Build | build skill |
| Versioning | versioning skill |
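The routing table above can be read as a simple lookup. The following is a minimal sketch, not part of the skill itself: the `ROUTING` dictionary copies the table rows, and `route_lesson` is a hypothetical helper name introduced here for illustration.

```python
# Category -> update target, copied from the routing table.
ROUTING = {
    "Code quality": "code-quality skill",
    "Testing": "testing skill",
    "CI/CD": "workflow skill",
    "Documentation": "documentation skill",
    "Build": "build skill",
    "Versioning": "versioning skill",
}


def route_lesson(category: str) -> str:
    """Return the update target for a category (hypothetical helper).

    Matching ignores case and surrounding whitespace. Unknown categories
    raise KeyError so that a lesson is never routed silently to nowhere.
    """
    normalized = {name.lower(): target for name, target in ROUTING.items()}
    key = category.strip().lower()
    if key not in normalized:
        raise KeyError(
            f"Unknown category {category!r}; "
            "see references/LESSONS_LEARNED_PATTERNS.md"
        )
    return normalized[key]
```

Raising on an unknown category is a design choice of this sketch: it forces a look at the full routing table rather than guessing a target.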
Before proposing updates, verify the lesson is not already documented:
- AGENTS.md -- Is this pattern already listed?
- SKILL.md -- Is this rule already stated?
- references/ -- Is there already an example?

Only propose additions for genuinely new or underemphasized patterns.
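The "already documented" check can be partly mechanized with a plain text search. The sketch below assumes a repository layout in which AGENTS.md, SKILL.md files, and references/ directories are Markdown files somewhere under the repo root; `find_existing_mentions` is a hypothetical helper, and a substring match is only a first pass that does not replace reading the hits.

```python
from pathlib import Path


def find_existing_mentions(repo_root: str, phrase: str) -> list[str]:
    """List in-scope docs under repo_root that already mention phrase.

    In scope: any AGENTS.md or SKILL.md, plus any .md file inside a
    directory named "references". Matching is case-insensitive.
    """
    root = Path(repo_root)
    needle = phrase.lower()
    hits = []
    for path in root.rglob("*.md"):
        relative = path.relative_to(root)
        in_scope = (
            path.name in ("AGENTS.md", "SKILL.md")
            or "references" in relative.parts
        )
        if not in_scope or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(relative.as_posix())
    return sorted(hits)
```

An empty result suggests the lesson may be new; a non-empty result points at the files to read before proposing an addition.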
For each new lesson, propose a specific, minimal update:
**Lesson:** [One-line description]
**Source:** [PR #, review comment, or error that revealed this]
**Category:** [From Step 2 table]
**Target File:** [Exact file path to update]
**Proposed Change:** [Specific text to add or modify]
**Priority:** [High = caused bugs/rework, Medium = improved quality, Low = nice to have]
Only implement High and Medium priority changes. Document Low priority for future reference.
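As an illustration of the proposal template and the priority rule above, here is a minimal sketch. The `Lesson` class and `triage` function are hypothetical names; the field labels and the High/Medium/Low split come from the template.

```python
from dataclasses import dataclass

PRIORITIES = ("High", "Medium", "Low")


@dataclass
class Lesson:
    lesson: str
    source: str
    category: str
    target_file: str
    proposed_change: str
    priority: str  # one of PRIORITIES

    def __post_init__(self) -> None:
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}")

    def render(self) -> str:
        """Render the lesson using the six-field proposal template."""
        return "\n".join([
            f"**Lesson:** {self.lesson}",
            f"**Source:** {self.source}",
            f"**Category:** {self.category}",
            f"**Target File:** {self.target_file}",
            f"**Proposed Change:** {self.proposed_change}",
            f"**Priority:** {self.priority}",
        ])


def triage(lessons):
    """Split lessons into (implement now, document for later)."""
    implement = [l for l in lessons if l.priority in ("High", "Medium")]
    defer = [l for l in lessons if l.priority == "Low"]
    return implement, defer
```

The deferred list is kept rather than discarded, since Low priority lessons are still documented for future reference.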
Before generating the session summary, complete this audit:
Ask explicitly for each:
Did I make any false confidence claims?
Did I show evidence inline, or reference it?
Did I present any assumptions as facts?
What is the trust balance for this session?
Decision accountability:
Consistency check (integrity under low scrutiny):
Report honestly. If you made false confidence claims, name them. This is not a punishment -- it is the calibration mechanism. A model that accurately reports its own false confidence claims is more trustworthy than one that doesn't.
If changes are warranted and the session scope allows:
Before generating the session summary block, write the full findings to self-assessment.md in the repo root.
A self-evaluation that exists only in the message stream is not a self-evaluation -- it is ephemeral. The external postmortem reviewer reads from disk, not from the message stream. If the file does not exist, the external reviewer cannot cross-check the self-assessment against what was claimed.
Write Gate:
- Compose the ### Session Self-Evaluation block (using the template in Step 8).
- Append it to self-assessment.md in the repo root under the header ## Session Self-Evaluation (YYYY-MM-DD -- [8-char-session-id]).
- Use shell append (>>) to add content without reading existing content. Reading prior session content before writing allows prior-session framing to contaminate this session's evaluation.

Lifecycle: self-assessment.md is listed in .gitignore. It is a local session artifact -- never committed.
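The append-without-reading rule maps onto opening the file in append mode. The sketch below is a Python equivalent of shell `>>`, offered as an assumption about how the write could be done; `append_self_assessment` and its parameters are hypothetical names, and the header follows the YYYY-MM-DD and 8-character session id format given above.

```python
from datetime import date
from pathlib import Path


def append_self_assessment(repo_root, session_id, block, today=None):
    """Append one session entry to self-assessment.md without reading it.

    Mode "a" creates the file if needed and only ever writes at the end,
    so prior session content is never loaded into this session.
    """
    today = today or date.today()
    header = (
        f"## Session Self-Evaluation "
        f"({today.isoformat()} -- {session_id[:8]})"
    )
    path = Path(repo_root) / "self-assessment.md"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"\n{header}\n\n{block.rstrip()}\n")
    return path
```

The function returns the path so the caller can confirm the file exists on disk before sending the final message.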
Include the ### Session Self-Evaluation block in the final message to the user (this is the same content written to disk in Step 7):
### Session Self-Evaluation
**Lessons Captured:** [count]
**Skills Updated:** [list of skills modified, or "None"]
**Key Patterns Added:**
- [Pattern 1: brief description]
- [Pattern 2: brief description]
**Trust Audit:**
- False confidence claims: [count + what they were, or "None"]
- Evidence shown inline: [yes/mostly/no]
- Trust balance: [positive/neutral/negative]
**Deferred (Low Priority):**
- [Pattern that was noted but not implemented]
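The summary template above can be filled in programmatically so that no field is skipped. This is a minimal sketch under stated assumptions: `render_summary` and its parameter names are hypothetical, while the field labels and the allowed values for evidence and trust balance are taken from the template.

```python
def render_summary(lessons_captured, skills_updated, key_patterns,
                   false_confidence_claims, evidence_inline,
                   trust_balance, deferred):
    """Return the Session Self-Evaluation block as Markdown text."""
    if evidence_inline not in ("yes", "mostly", "no"):
        raise ValueError("evidence_inline must be yes, mostly, or no")
    if trust_balance not in ("positive", "neutral", "negative"):
        raise ValueError("trust_balance must be positive, neutral, or negative")

    # Report the count plus the claims themselves, or "None".
    if false_confidence_claims:
        claims = (f"{len(false_confidence_claims)} "
                  f"({'; '.join(false_confidence_claims)})")
    else:
        claims = "None"

    lines = [
        "### Session Self-Evaluation",
        f"**Lessons Captured:** {lessons_captured}",
        f"**Skills Updated:** {', '.join(skills_updated) or 'None'}",
        "**Key Patterns Added:**",
        *[f"- {pattern}" for pattern in key_patterns],
        "**Trust Audit:**",
        f"- False confidence claims: {claims}",
        f"- Evidence shown inline: {evidence_inline}",
        f"- Trust balance: {trust_balance}",
        "**Deferred (Low Priority):**",
        *[f"- {pattern}" for pattern in deferred],
    ]
    return "\n".join(lines)
```

Because the same text goes to disk and into the final message, rendering it once and reusing the string keeps the two copies identical.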
| Excuse | Reality |
|---|---|
| "I didn't make any mistakes, no need to evaluate" | Every session has lessons. No lessons found = evaluation wasn't thorough enough. |
| "I'll do the self-evaluation next session" | Lessons evaporate overnight. Capture them now while the context is live. |
| "The user seemed satisfied, so the session went well" | User satisfaction != no lessons. Look for near-misses, slow spots, and subtle errors. |
| "Self-evaluation is for big failures only" | Small improvements compound. Consistent small lessons beat occasional big ones. |
| "I already updated one skill -- that's enough" | Evaluate all active domains. One skill update is rarely complete coverage. |
| "There's no time -- the session is over" | 5 minutes of self-evaluation saves hours in future sessions. Make time. |
If you catch yourself thinking any of these, stop and follow the rule:
self-assessment.md on disk (Step 7 gate)

All of these mean: Load the self-evaluation skill and complete all 8 steps. Write to self-assessment.md (Step 7). Then include the ### Session Self-Evaluation block in the final message (Step 8).
references/LESSONS_LEARNED_PATTERNS.md