Run any Skill in Manus with one click

$pwd:

retrospect

Name: Retrospect
Author: devseunggwan

// Session retrospect — analyze current Claude Code session against CLAUDE.md rules, identify friction patterns and root causes, propose context-appropriate improvement actions, then execute after user approval. Triggers on "retrospect", "what went wrong", "session review", "session improvement", "what was the issue", "improve".

Run Skill in Manus

$ git log --oneline --stat

stars:0

forks:0

updated:May 6, 2026 at 05:39

SKILL.md

readonly

package.json

"author": "devseunggwan"

"repository": "devseunggwan/praxis"

View GitHub Repository

$ install --globalskills.sh

$ download --local

Run Skill in Manus

[HINT] Download the complete skill directory including SKILL.md and all related files

Run any Skill with one click

name	retrospect
description	Session retrospect — analyze current Claude Code session against CLAUDE.md rules, identify friction patterns and root causes, propose context-appropriate improvement actions, then execute after user approval. Triggers on "retrospect", "what went wrong", "session review", "session improvement", "what was the issue", "improve".

Retrospect

Overview

Repeated friction wastes cycles across sessions. Unexamined pain stays unresolved.

Core principle: ALWAYS analyze root cause before proposing any action. Symptom-level fixes (e.g., "remember to do X") miss the underlying pattern.

Pipeline: Load → Analyze → Report/Approve → Execute (4 stages)

Delegates to: OMC tracer agent (causal pattern analysis), analyst agent (pattern clustering)

The Iron Law

NO ACTION WITHOUT ROOT CAUSE ANALYSIS FIRST.
PATTERN ≠ ROOT CAUSE. SYMPTOM ≠ ROOT CAUSE.
REPEATED PATTERN + MEMORY = FAILED REMEDY. ESCALATE.
TRACER + ANALYST CALLS ARE MANDATORY, NOT OPTIONAL.

If you haven't completed Stage 2 (Analyze), you cannot propose actions. "It happened because X" is a symptom. "X happened because of missing rule Y / unclear trigger Z / absent skill W" is a root cause.

When to Use

Use at the END of a working session to extract learnings:

Session had repeated tool retries or direction changes
User gave corrections mid-session ("no, don't do that")
A task took significantly longer than expected
Workflow steps were skipped or out of order
User expressed frustration or redirected multiple times

Use this ESPECIALLY when:

The same mistake happened more than once in the session
You feel "I should have done that differently"
A rule in CLAUDE.md was violated — even once
A new workflow pattern emerged that isn't captured anywhere

The Four Stages

You MUST complete each stage before proceeding to the next.

Stage 1: Load Calibration Standard

Before scanning the conversation:

Read CLAUDE.md — load all rules, behavioral guidelines, and workflow requirements
- Global: $CLAUDE_CONFIG_DIR/CLAUDE.md
- Project: CLAUDE.md in cwd (if exists)
- Key sections: Mandatory Rules, Behavioral Rules, Workflow rules
Identify rule categories to scan against:
- Workflow discipline (Issue-Driven Workflow, Planning Before Implementation)
- Evidence-Based Delivery (No "Trust Me" completions)
- Atomic Commits + PR Lifecycle
- Mandatory Testing (unit + functional)
- Code Review Before Commit
- Error Recovery Before Asking
- Communication conventions
Set the calibration frame: For each rule category, form a question — e.g., "Did the session violate 'Planning Before Implementation'? Were there 3+ step tasks that skipped plan mode?"

Stage 2: Analyze Conversation

Pre-scan: Quick friction event identification — scan the conversation for up to 5 friction events (user corrections, retries, skipped steps, stalls) BEFORE calling agents. This provides the input for agent calls.

Pre-scan Categorization (Mandatory) — every friction event identified in pre-scan MUST be tagged with category[]: string[] containing ≥1 of these enumerated values. Stage 2 progression to step 3 is BLOCKED until every event has at least one category label.

Category	Signal examples (≥2 each)	Required `Tool Layer` (composition with step 4b)
`behavioral`	"Claude가 확인 없이 결론 도출" / "한 PR에 여러 concern bundle" / "user가 동일 지적 반복"	none (Tool Layer = `—`)
`tool`	"gh CLI `--state all` 부재" / "MCP 응답 지연" / "Read 출력 truncation" / "kubectl flag 부재"	mandatory: one of `mcp` / `cli` / `builtin` / `skill`
`workflow`	"test 안 돌리고 PR" / "verify 단계 건너뜀" / "issue 안 만들고 브랜치" / "code review 생략"	optional: `skill` (when defect originates inside a skill's stage flow)
`spec-gap`	"이 상황을 다루는 규칙 부재" / "SKILL.md trigger 모호" / "CLAUDE.md에 명시되지 않은 행동"	optional: `skill` (when rule gap is in a SKILL.md)

Layer E ↔ step 4b composition matrix (normative — referenced by step 4b and Stage 3 unified table):

A friction event MAY carry multiple categories (e.g., [workflow, tool] when a workflow step skip was caused by a tool flag bug).
When tool ∈ category[], the event MUST also be classified into one of the 4 step-4b layers (mcp / cli / builtin / skill); the Tool Layer cell of the unified table cannot remain —.
For behavioral only events, Tool Layer = —.
For workflow or spec-gap events without a tool root cause, Tool Layer = —; if the workflow defect or rule gap originates within a skill, Tool Layer MAY be set to skill to enable step 4b downstream routing.

Early exit: If pre-scan finds 0 friction events, skip agent calls and exit with "No patterns found. ✅" — do not call agents with empty input.

MANDATORY AGENT CALLS — when pre-scan finds 1+ friction events, MUST call sequentially (analyst depends on tracer output):

tracer agent (causal chain analysis) — call FIRST: Agent(subagent_type="oh-my-claudecode:tracer", model="sonnet")
- Input: friction events identified from pre-scan
- Output: causal chains with confidence scores
- Do NOT skip this call. "I can analyze this myself" is a Red Flag.
analyst agent (pattern clustering) — call AFTER tracer completes: Agent(subagent_type="oh-my-claudecode:analyst", model="sonnet")
- Input: friction events + tracer causal chains (from step 1)
- Output: clustered patterns with root causes

Then refine using agent outputs:

Scope: Scan the most recent 50 turns, or back to the last session boundary. Stop after identifying 5 distinct friction events — clustering (step 6) handles de-duplication. If session history is not accessible, use the user's verbal summary as input to steps 3–8.

Refine friction events with agent outputs — merge pre-scan events with tracer/analyst results:
- Add any new friction events the agents identified that pre-scan missed
- Update causal chains using tracer confidence scores
- Drop false positives that agents ruled out
- Final list: up to 5 distinct friction events with causal chains attached
Map each event to a CLAUDE.md rule (or gap):
- Read the event's category[] from pre-scan and feed it into rule-mapping
- Which rule was applicable?
- Was it followed, violated, or simply absent?
- Quote or paraphrase the specific moment
- If category[] includes spec-gap, this map step often resolves as "rule absent" — fold that signal into step 5 root cause

4b. Tool Friction Pass — independently analyze tool/feature-level friction (cross-referenced by Layer E composition matrix above):

This pass runs SEPARATELY from step 4. A friction event may match a rule violation (step 4) AND a tool defect (step 4b) — both are recorded. Per the Layer E composition matrix, every event with tool ∈ category[] MUST be classified into one of the 4 layers below; the unified-table Tool Layer cell of such an event cannot remain —.

Tool layers to scan (all 4):

Layer	Examples	Friction signals
`mcp`	any custom or third-party MCP server (data warehouse, observability, chat, infra, etc.)	Slow response, missing field, schema mismatch, timeout
`cli`	`gh`, `kubectl`, `git`, plus project-specific CLIs	Missing flag/option, undocumented behavior, workaround needed
`builtin`	Read/Edit/Bash/Grep/Glob, Agent, hooks	Environmental constraint, permission issue, output truncation
`skill`	praxis / OMC / project-specific skills and subagents	Stage boundary unclear, trigger mismatch, prompt defect, wrong routing

For each tool friction event, record:

tool_name: specific tool (e.g., "gh CLI", " MCP", " skill")
layer: mcp / cli / builtin / skill
friction_type: missing feature, design defect, documentation gap, performance issue, integration mismatch
evidence: the specific moment (quote or paraphrase)
expected_behavior: what should have happened
proposed_fix_direction: brief suggestion for upstream improvement

Representative friction examples (for calibration):

"gh CLI의 --state all 플래그가 없어서 open/closed를 각각 호출해야 했다" → layer: cli, friction_type: missing feature
"MCP 응답 지연으로 3회 재시도 후 fallback 전략을 수동 구성했다" → layer: mcp, friction_type: performance issue
"skill의 Stage 경계가 불명확해서 step을 건너뛰고 다음 stage로 넘어갔다" → layer: skill, friction_type: design defect
"codex exec의 permission mode가 달라 파일 쓰기에 실패했다" → layer: cli, friction_type: integration mismatch
"Read 도구의 출력 truncation으로 파일 끝부분을 놓쳤다" → layer: builtin, friction_type: design defect

Dedup rule (step 4 vs step 4b):

If a friction event has BOTH a rule violation (step 4) AND a tool defect (step 4b), record it in BOTH places
Step 4 finding addresses the behavioral correction (what Claude should have done differently)
Step 4b finding addresses the tool improvement (what the tool should do differently)
The two findings may have different action types (e.g., step 4 → memory, step 4b → upstream feedback)

Find root cause for each pattern:

Symptom:   "Claude retried the same tool 3 times"
Pattern:   "Error recovery loop"
Root cause: "No diagnostic step between retries — violated Error Recovery Before Asking rule"

Symptom:   "Implementation started before plan was approved"
Pattern:   "Premature execution"
Root cause: "Task had 4 steps but plan mode was not entered — violated Planning Before Implementation"

Cluster patterns — are multiple events the same root cause? If 3+ events share a root cause → HIGH priority
Scan MEMORY.md for repeat patterns (2-hop deterministic scan): a. Read MEMORY.md index (single file read) — extract all feedback entry titles and file paths b. For each finding's root cause, identify candidate matches from index titles (concept-level, not keyword) c. Read each candidate feedback file to confirm semantic match (same root cause, not just similar keywords) d. Only mark repeat=true if root cause is semantically identical
- Example: "workflow skip" in index + "workflow violation" in finding = match
- Example: "commit" matching both "atomic commit" and "pre-commit hook" = NOT auto-match, read file to confirm e. repeat_count = number of distinct feedback files with matching root cause f. If match found with existing resolution action (issue/hook already created): mark as resolved=true

Auto-assign action type based on escalation ladder. Apply Repeat-based rows first; then apply Category-default rows below to override or compound when the event's category[] (from pre-scan, step 4) makes memory-only inappropriate even on first occurrence:

Condition	Action Type	Rationale
New pattern (structural root cause, likely to recur)	memory	First occurrence — capture for future reference
Repeat (in MEMORY.md, 1-2x)	GitHub issue	Memory alone failed — need systemic fix
Repeat (3x+)	hook or skill	Multiple memory entries = enforcement gap
Missing rule (new)	CLAUDE.md draft	No rule exists for this pattern
Missing rule + Repeat	CLAUDE.md draft + GitHub issue	Missing rule caused repeat — add rule + compliance issue
Tool friction (step 4b finding)	upstream feedback	Tool improvement needed — issue in the tool's backing repo (resolve via Stage 4 Action 4 routing table; not always praxis)
One-off mistake (situational cause, unlikely to recur)	note only	No persistent action needed
Category default — `tool`	upstream feedback (compound with memory only when behavioral co-label exists)	step 4b backing-repo resolution; tool defect is not a Claude behavior issue, memory alone insufficient
Category default — `workflow`	hook code OR skill idea (memory-alone NOT allowed even on first occurrence)	enforcement gap detected — workflow steps that get skipped need structural enforcement, not memo
Category default — `spec-gap`	CLAUDE.md draft OR skill idea (memory-alone NOT allowed)	rule absent — gaps are filled with rules, not memos
Category default — `behavioral`	memory (default; compound with skill_idea or CLAUDE.md draft when structural)	only category where memory-alone is acceptable; still subject to Gate-2 rationale schema in Stage 2.5

Distinguishing "New pattern" vs "One-off mistake":

New pattern: root cause is structural (missing rule, absent skill, unclear workflow) → likely to recur in future sessions
One-off mistake: root cause is situational (context loss, typo, unusual edge case) → unlikely to recur under normal conditions
When uncertain, default to memory (safer to capture than to miss)

⚠️ BLOCKED unless justified: If repeat=true, the action type CANNOT be memory.

Escape hatch: If repeat=true AND resolved=true (existing issue/hook resolution already exists for this feedback), note only is allowed. In this case, include a sentence in the report confirming that the existing resolution is still effective.

⚠️ MUST — backing_repo declaration for upstream_feedback rows: When Proposed Actions contains upstream_feedback (single or compound), the row's Rationale cell MUST include a backing_repo: <owner/repo> line resolved per Stage 4 Action 4's resolution table. The declaration is load-bearing — Stage 4 re-reads it as the routing decision and aborts on divergence.

Resolution source-of-truth (in priority order): plugin manifest repository field → MCP server git remote → dotfiles backing repo via symlink chain → project CLAUDE.md feature-to-repo mapping.
Format: literal line backing_repo: <owner>/<repo> embedded in the Rationale cell via   separators. Example: Rationale: tool defect in praxis distribution backing_repo: devseunggwan/praxis.
Unresolvable layer (builtin, or no upstream reachable per the Action 4 resolution table's builtin row): remove upstream_feedback from the row's Proposed Actions set entirely. If the row had compound actions (e.g., memory, upstream_feedback), retain the remaining ones (e.g., keep memory alone). If upstream_feedback was the sole action, re-derive the action via Stage 2 step 8's category-default rows (typically skill_idea for tool category, or memory if behavioral co-label exists). The escape-hatch state note only (from repeat=true AND resolved=true) is a separate construct and is NOT used here. Do NOT emit a placeholder backing_repo.
Ambiguous layer (resolution table's Other / ambiguous row): keep upstream_feedback but surface to user immediately at Stage 2; the user-supplied repo becomes the declared backing_repo. Stage 4 step 0 then re-resolves and may still divergence-prompt if the live re-resolution differs.
Hook-parsing safety: this is not a memory-only row, so the Stage 2.5 Gate-2 5-line schema does not apply. The backing_repo: line lives alongside the human rationale text without conflicting with the Gate-2 regex.

Stage 2.5: Action Distribution Audit

After Stage 2 completes (all findings have category[] labels and provisional Proposed Actions) and BEFORE Stage 3 begins, run the two gate checks below. Each finding has its own per-finding gate counter, reset at Stage 2.5 entry.

Gate-1 (Categorical) — for each finding whose category[] intersects {tool, workflow, spec-gap}, verify Proposed Actions ≠ memory (single, not compound).

If memory is the only action for such a finding → return that finding to Stage 2 step 4 (re-evaluate label correctness — Gate-1 violations are most often mislabeling: the event was actually behavioral but got tagged tool/workflow/spec-gap, or vice versa) AND step 8 (re-derive action with category-default rows applied).

Gate-2 (Procedural) — for each finding with Proposed Actions = memory (single, not compound, regardless of category), verify the Rationale cell contains EXACTLY 5 lines, each matching the regex ^not (issue|claude_md_draft|skill_idea|hook_code|upstream_feedback): .+$. The 5 lines MUST cover the 5 non-memory action types (no duplicates, no missing keys).

If absent or incomplete → return that finding to Stage 2 step 8 (re-evaluate with explicit per-action rationale enforcement).

Per-finding loop cap — maximum 2 re-entries per finding. On the 3rd violation for the same finding, surface to user with explicit prompt:

"Finding #N은 진짜 memory-only가 적합한가요? Gate-1/Gate-2가 통과되지 않습니다. rationale을 직접 입력해주시면 우회합니다."

User-supplied rationale is logged but bypasses the gate for that single finding.

Behavioral-only safeguard — if ALL findings ended up labeled behavioral only (no tool/workflow/spec-gap anywhere), run a final keyword sanity check on the original pre-scan signal text. If any signal contains gh / kubectl / MCP / --state / permission denied / timeout / --help / flag, surface to user:

"모든 finding이 behavioral로 분류되었으나 pre-scan 신호 텍스트에 도구 키워드(gh, MCP, ...)가 발견됨. 라벨이 정확한지 재검토 필요."

User confirmation required to proceed; if user confirms, log the keyword set found.

Output (on pass) — Stage 2.5 emits the distribution card per the Output Schema Contract defined in Stage 3, plus per-finding Gate-1 and Gate-2 verdicts:

<!-- retrospect:distribution begin -->
- memory: {n}
- issue: {n}
- claude_md_draft: {n}
- skill_idea: {n}
- hook_code: {n}
- upstream_feedback: {n}
- gate_1_verdict: {PASS|FAIL|NA}
- gate_2_verdict: {PASS|FAIL|NA}
<!-- retrospect:distribution end -->

NA = no findings of the relevant type (e.g., gate_1_verdict: NA when zero tool/workflow/spec-gap labeled findings exist).

This card and verdict block become Stage 3's input header.

Stage 3: Report + Approval

Output Schema Contract (normative — Stop hook retrospect-mix-check.sh parses this):

Stage 3 output MUST emit, in this order:

Header: a line matching ^## Retrospect Report (em-dash or hyphen tail accepted: ## Retrospect Report — {date} or ## Retrospect Report - {date}).

Distribution card between HTML comment fences. Action keys are canonical snake_case enum; verdict values are PASS / FAIL / NA:

<!-- AUTHORITATIVE_SCHEMA — Stop hook depends on this. Co-update hooks/retrospect-mix-check.sh + tests/test_retrospect_mix_check.sh + tests/fixtures/retrospect-synth-*.expected.json on any change to this fence or the action key set. -->
<!-- retrospect:distribution begin -->
- memory: 1
- issue: 0
- claude_md_draft: 0
- skill_idea: 0
- hook_code: 0
- upstream_feedback: 0
- gate_1_verdict: PASS
- gate_2_verdict: PASS
<!-- retrospect:distribution end -->

Unified findings table with literal column headers (no abbreviation, no reordering):
```
| # | Category | Tool Layer | Pattern | Root Cause | Rule / Gap | Repeat? | Proposed Actions (1~2) | Rationale | Priority |
```
Column semantics:
- Category: comma-separated subset of behavioral, tool, workflow, spec-gap (≥1, see Stage 2 pre-scan categorization)
- Tool Layer: one of mcp, cli, builtin, skill, or — (mandatory non-— when tool ∈ Category, optional skill for workflow / spec-gap, — for behavioral)
- Proposed Actions (1~2): comma-separated subset of memory, issue, claude_md_draft, skill_idea, hook_code, upstream_feedback
- Rationale: free-form one-line for compound or non-memory rows; for memory-only rows (single memory, not compound), the cell MUST contain exactly 5 lines matching ^not (issue|claude_md_draft|skill_idea|hook_code|upstream_feedback): .+$, one line per non-memory action type. Generic single-sentence rationales are NOT acceptable for memory-only findings. For rows whose actions include upstream_feedback (single or compound), the cell MUST also contain a literal line backing_repo: <owner>/<repo> (embedded via   for single-line markdown form) — Stage 4 Action 4 step 0 reads this as the routing decision. Compound case memory, upstream_feedback: the row is NOT memory-only (contains a non-memory action), so the 5-line schema does NOT apply — instead use free-form prose for the human rationale + the backing_repo: line. Compound combinations are additive: each action-specific Rationale convention applies independently to the row, joined with  .

The Stop hook parses the distribution-card fence (deterministic) and the table (anchored on these literal column headers). Drift in this contract requires synchronized edits to hooks/retrospect-mix-check.sh, tests/test_retrospect_mix_check.sh, and tests/fixtures/retrospect-synth-*.expected.json.

Spec AC-A3 deviation note — earlier draft asked for "memory-only justification 한 줄" inside the distribution card. v2 relocates that justification into the unified-table Rationale column as the structured 5-line not <action>: <reason> block: strictly more informative than a single line, single source of truth, eliminates card↔table inconsistency risk.

Present findings as a single unified table per the Output Schema Contract above:

## Retrospect Report — {session_date}

<!-- retrospect:distribution begin -->
- memory: {n}
- issue: {n}
- claude_md_draft: {n}
- skill_idea: {n}
- hook_code: {n}
- upstream_feedback: {n}
- gate_1_verdict: {PASS|FAIL|NA}
- gate_2_verdict: {PASS|FAIL|NA}
<!-- retrospect:distribution end -->

| # | Category | Tool Layer | Pattern | Root Cause | Rule / Gap | Repeat? | Proposed Actions (1~2) | Rationale | Priority |
|---|----------|------------|---------|------------|------------|---------|------------------------|-----------|----------|
| 1 | {behavioral|tool|workflow|spec-gap, ...} | {mcp|cli|builtin|skill|—} | {pattern} | {root_cause} | {rule_ref or "gap"} | {Yes(Nx)/No} | {action1[, action2]} | {rationale: 5 `not <action>:` lines for memory-only, or one-line for compound/non-memory} | HIGH/MED/LOW |
...

No patterns found: emit the distribution card with all counts = 0 and verdicts = NA, plus literal "This session followed all CLAUDE.md rules. ✅"

The unified table folds the previous dual-table layout (Pattern + Tool/Feature Findings) into one. Tool-layer information that previously lived in a separate "Tool/Feature Findings" table is now carried in the Tool Layer column of every row tagged with tool in Category. Reviewers see all findings in priority order without cross-referencing two tables.

Sorting: rows SHOULD be sorted by Priority (HIGH → MED → LOW). Within the same priority, prefer non-memory Proposed Actions first so escalations surface above behavioral memos.

Action type baseline comes from Stage 2 escalation ladder, but Stage 3 MUST explicitly evaluate all six action types per finding and select 1–2 composite actions.

Exception — one-off mistakes: If Stage 2 classified the finding as note only (situational root cause, unlikely to recur), skip the evaluation below entirely. No persistent action is created; the finding appears in the report as acknowledged only.

For each finding (except one-off), evaluate ALL six action types before selecting:

Action Type	When to Choose	Skip If
MEMORY.md feedback	New pattern (1st occurrence, repeat_count=0), individual learning	repeat=true (memory is BLOCKED)
GitHub issue	Systemic fix needed (tool/skill implementation), repeat pattern (1–2×)	One-off mistake, purely local insight
CLAUDE.md draft	Explicit rule gap exists, cross-project scope needed	Existing rule already covers this pattern
Skill idea note	Repeat pattern needs enforcement mechanism, manual recall is insufficient	Single memo is sufficient, no recurring trigger
Hook code	Repeat (3x+) requiring automated enforcement; manual recall has repeatedly failed	Fewer than 3 repeats; skill idea or rule is sufficient
Upstream feedback	Tool/feature-level defect identified in step 4b; improvement needed in the tool itself, not in Claude's behavior	Finding is purely a rule violation with no tool-level root cause

Selection matrix — three axes to determine compound vs. single action:

Axis	Signal → Action
Repeat count	0× → `memory` (first occurrence); 1–2× → `issue` (memory blocked — repeat=true); 3×+ → `skill` or `hook` (enforcement gap)
Scope	Cross-project impact → `CLAUDE.md draft`; single-project → `MEMORY.md`
Gap type	Rule violated → `memory` (reinforce); rule absent → `CLAUDE.md draft` (fill gap); no enforcement → `skill idea`

Axis precedence: Repeat-count is the highest-priority axis. When repeat=true, the Scope and Gap type axes cannot override to memory — the repeat-count constraint (issue / skill / hook) always wins. Apply Scope and Gap type only to determine additional actions alongside the repeat-count result.

Compound action is the default for HIGH-priority findings. A single memory action is acceptable only when the rationale for skipping all other types is explicitly stated in the Rationale column.

Before approval, explain each action's concrete plan:

For each finding, present:

What will be created (file path, issue title, hook name, or CLAUDE.md rule text)
Why this action type (escalation rationale — e.g., "Already recorded 3x in MEMORY.md")
How it will be verified (what check confirms it works)

Example (single action — repeat pattern):

Finding #2: Workflow step skipped (4th occurrence)

Proposed Actions: GitHub issue

Rationale: Already recorded 3x in MEMORY.md. Memory alone has failed. Structural fix required.

What will be created: issue — feat(hook): add external-repo commit guard

Verify: issue URL returned + gh issue view confirms existence

Example (compound action — rule gap + repeat):

Finding #1 (HIGH): Hasty interpretation without verification (ambiguous signal → worst-case conclusion, 3 occurrences)

Proposed Actions: CLAUDE.md draft + GitHub issue

Rationale: Rule absent + 3× repeat → fill the rule gap (CLAUDE.md draft) and track enforcement compliance (GitHub issue); matches Stage 2 ladder: "Missing rule + Repeat"

What will be created:

CLAUDE.md draft: new rule requiring a disconfirmation check before concluding from ambiguous signals

issue — feat(retrospect): enforce falsify-first check on ambiguous signal interpretation

Verify: CLAUDE.md draft shown to user for approval + issue URL returned

Then ask for approval per item using AskUserQuestion:

For each finding, user selects:
  ✅ Execute now  |  ⏭ Skip  |  🕐 Defer (create note only)

Do NOT execute any action until user approves.

Stage 4: Execute

"note only" items require no execution — they appear in the completion report as acknowledged but need no persistent artifact.

For each approved action:

MEMORY.md feedback → Write to $CLAUDE_CONFIG_DIR/projects/.../memory/ with proper frontmatter
- Type: feedback
- Include: rule, why, how to apply
- Update MEMORY.md index
⚠️ MANDATORY: Duplicate check before creating any memory file:

Precondition: This check applies ONLY when the finding's action type is memory (new pattern). If Stage 2 already marked repeat=true and escalated to issue/hook/CLAUDE.md, skip this check — the escalation ladder takes precedence over merge.

a. Reuse Stage 2 Step 7's repeat scan results — if a finding matched an existing memory but was NOT escalated (i.e., it's a genuinely new sub-pattern), that file is the merge target b. If no Stage 2 match: scan MEMORY.md index for entries with overlapping root cause or topic (concept-level, not keyword) c. For each candidate, read the existing memory file and compare:
- Same root cause / principle → merge: append new context (examples, How to apply items) to the existing file. If merge makes this the 2nd+ occurrence, re-evaluate whether action type should escalate per Stage 2 Step 8
- Related but distinct principle → create new file (genuinely different insight) d. Never create a new file when the insight is a specific instance of an existing general rule — add it as a numbered sub-item instead e. After merge or create, update MEMORY.md index (update description if merged, add new line if created)
GitHub issue → Use project's issue creation skill or gh issue create
- Title: Conventional Commits format (per project convention)
- Body: per project convention, with background + task list
CLAUDE.md draft → Write proposed rule addition as a markdown block
- ⚠️ $CLAUDE_CONFIG_DIR/CLAUDE.md is global scope — changes affect every project
- Present the draft to user for review BEFORE any edit
- Apply only with explicit approval ("yes, add this rule")

Upstream feedback → Resolve the tool's backing repo first (do NOT hardcode any specific repo), then create a labeled issue there. Hardcoding misroutes plugin defects, custom MCP defects, dotfiles defects across user environments.

Step 0 — Backing-repo verification gate (MUST run before any mutation)

This gate is the first procedure step for every upstream_feedback row, executed before any of the resolution-table lookups below. Skipping it means the most salient file path in the executor's local context (often the working project repo) wins the routing decision — which is the exact failure mode this gate prevents.

Read the declaration. Parse backing_repo: <owner/repo> from the finding's Rationale cell (Stage 2 step 8 makes this MANDATORY for upstream_feedback rows; Stage 3 surfaces it). If the declaration is absent → ABORT this action and return the finding to Stage 2 step 8 with prompt: "Finding #N upstream_feedback row missing backing_repo declaration — re-run Stage 2 step 8."
Re-resolve from source-of-truth. Independently of the declaration, re-resolve the backing repo using the resolution table below. Do NOT use the declared value as the lookup input — use the tool/layer signal from Stage 2 step 4b to derive the repo from scratch. Capture the re-resolved value as live_backing_repo. If the resolution table's Other / ambiguous row matches the layer (no concrete repo derivable), treat live_backing_repo = AMBIGUOUS and skip to step 0.4 with a 2-way prompt instead of 3-way.
Compare. If live_backing_repo == declared backing_repo → proceed to the rest of Action 4. Normalization rules for equality (apply both sides):
- Strip leading/trailing whitespace
- Strip trailing .git
- Treat all of these as equivalent forms of the same repo: owner/repo, https://github.com/owner/repo, git@github.com:owner/repo, ssh://git@github.com/owner/repo
- Case-insensitive on owner and repo (GitHub treats them case-insensitively for routing)
Divergence / ambiguity handling. If live_backing_repo != declared backing_repo (after normalization) → ABORT and surface to user via AskUserQuestion. Two prompt variants:

(i) Both sides concrete repos — 3-way prompt:
```
⚠ Backing-repo divergence on Finding #N:
   Stage 2/3 declared:    {declared}
   Stage 4 re-resolved:   {live}

어느 쪽이 정확합니까?
[a] declared ({declared}) 으로 진행
[b] re-resolved ({live}) 으로 진행 (Stage 2 declaration 정정)
[c] 이 finding 은 skip — upstream_feedback 액션 제거
```
(ii) Re-resolution returned AMBIGUOUS (declared is concrete) — 2-way prompt:
```
⚠ Backing-repo re-resolution ambiguous on Finding #N:
   Stage 2/3 declared:    {declared}
   Stage 4 re-resolved:   AMBIGUOUS (resolution table's `Other / ambiguous` row)

어느 쪽으로 진행할까요?
[a] declared ({declared}) 으로 진행 (사용자가 Stage 2에서 결정한 값을 신뢰)
[b] 이 finding 은 skip — upstream_feedback 액션 제거
```
(iii) Declared was AMBIGUOUS but re-resolution found a concrete value — 2-way prompt: mirror of (ii) with [a] = use re-resolved, [b] = skip.

Do NOT proceed without an explicit pick. [b] (in variant i) requires updating the declared backing_repo line — record the corrected value in the Actions Executed report's verification trail rather than re-emitting the entire Stage 3 report (the report is append-only post-Stage-3; corrections live in step 0.5's trail). The skip path removes upstream_feedback from the row's action set and logs the divergence reason in the Actions Executed section.
Verification trail. Record both values + the chosen path in the Actions Executed report (e.g., Finding #N: backing_repo verified (declared=live=devseunggwan/praxis) or Finding #N: divergence resolved via [b] — switched declared <X> → re-resolved <Y>). This trail is the defense against silent misrouting in retrospective analysis.

Backing repo resolution (used by step 0.2 and as reference)

Tool name / layer pattern	Backing repo resolution
`mcp__<plugin>__*` from a Claude Code plugin	Read `repository` field from that plugin's `.claude-plugin/plugin.json` (or equivalent manifest)
`mcp__<service>-*` from a custom/team MCP server	The MCP server's source repo — `git remote -v` of the server's directory, or read its package manifest
Skill within the praxis distribution itself	The praxis source repo this skill was installed from — read `repository` field in praxis's own plugin manifest
Hook in `~/.claude/hooks/` or a globally symlinked CLAUDE.md/AGENTS.md	The user's dotfiles backing repo — resolve via `ls -la` symlink chain, then `git remote -v` of the target dir
CLI tool (e.g., `gh`, `kubectl`)	The CLI's open-source upstream if accessible; otherwise `note only`
Builtin tool (Read/Edit/Bash/Grep)	Typically not actionable — `note only`
Other / ambiguous	Ask the user; do NOT fall back to a hardcoded repo

If the active project's CLAUDE.md provides a feature-to-repo mapping, consult it before deciding a repo.

Then create the issue (using the verified backing_repo from step 0):

Title: {type}({tool_layer}): {friction description} (Conventional Commits format)
Label: tool-friction:{layer} is praxis's own convention. Apply it ONLY when the verified backing repo is the praxis distribution itself. For any other backing repo, use that repo's existing label conventions (e.g., bug, enhancement); do NOT auto-create praxis-style labels in unrelated repos.
If tool-friction:* is needed and missing in the praxis repo: gh label create "tool-friction:{layer}" --repo <verified-praxis-repo>
Body: include evidence, expected behavior, proposed fix direction from step 4b finding
Command: gh issue create --repo <verified_backing_repo> --title "$TITLE" --label "$LABEL" --body "$BODY" — substitute the verified repo, never hardcode
Verification (mandatory): issue URL is returned, gh issue view {url} succeeds, AND the URL's repo matches the verified backing repo (catches misrouting)

Skill idea note → Write to {current_project}/.omc/plans/retrospect-skill-idea-{slug}.md
- {current_project} = $CLAUDE_PROJECT_DIR or git rev-parse --show-toplevel
- Include: problem, proposed skill trigger, pipeline sketch
Hook code → For enforcement-level actions (repeat 3x+): a. Write hook script to .claude/hooks/ or appropriate location b. Present the hook code to user for review c. Explain how to register in .claude/settings.json (show the exact JSON entry) d. Use AskUserQuestion: "Hook을 settings.json에 등록할까요?" (✅ 등록 / ⏭ 파일만 유지 / 🕐 나중에) e. If approved: Edit .claude/settings.json to register the hook f. If skipped/deferred: leave the hook file in place and provide manual registration instructions

Verification — For each executed action, verify the artifact:

Artifact	Verification
MEMORY.md feedback (new)	File exists + MEMORY.md index updated
MEMORY.md feedback (merged)	Existing file updated (diff shown) + MEMORY.md index description updated if needed
GitHub issue	`gh issue view {url}` returns valid data
Upstream feedback	`gh issue view {url}` returns valid data + URL repo matches `verified_backing_repo` from step 0 + label convention is correct for the verified repo (`tool-friction:{layer}` ONLY when verified repo is the praxis distribution; otherwise the repo's own convention label per Action 4's label rule)
Hook code	Script file exists + settings.json registration confirmed (dry-run varies by hook type — no generic check)
CLAUDE.md draft	Diff shown to user + explicit approval received
Skill idea note	File exists in `.omc/plans/`

Report verification results in the completion table.

Completion report:

## Actions Executed

| # | Action | Result |
|---|--------|--------|
| 1 | MEMORY.md feedback added | ✅ {file_path} |
| 2 | GitHub issue created | ✅ {url} |
| 3 | Upstream feedback (Finding #N) | ✅ {url} (backing_repo verified: declared=live={owner/repo}) |
| 4 | Upstream feedback (Finding #M) | ⚠ aborted at step 0 — declared {X} ≠ re-resolved {Y}; user picked [b], re-issued at {url} |
| 5 | Upstream feedback (Finding #P) | ⊘ skipped at step 0 — divergence; user picked [c], action removed; reason: declared {X} not reachable, re-resolved {Y} unfamiliar to user |
...

Session learnings captured. Next session will benefit from these improvements.

Rationalization Prevention

Excuse	Reality
"It was a one-off mistake, not worth capturing"	If it happened once, it can happen again. Capture it.
"I know the root cause, I'll just note the symptom"	Symptoms recur. Root causes get fixed. Write the root cause.
"MEMORY.md is already long, skip this"	Length doesn't matter. Missing the pattern is the cost.
"The session was mostly fine, nothing to retrospect"	Even 1 friction event is worth 2 minutes to capture.
"I'll do this later"	Later never comes. Do it at session end while context is fresh.
"This is a tool issue, not a Claude issue"	Tool + Claude interaction is within scope. Both can be improved.
"Tool issue라서 이번 retrospect scope 밖이다"	Scope 안이다. Step 4b에서 분석하고 `upstream feedback`으로 도구의 backing repo (Stage 4 Action 4 의 routing 표 참고)에 이슈를 남겨야 한다.
"도구 결함이지 내 행동 문제가 아니다"	둘 다일 수 있다. Step 4 (행동 교정)와 step 4b (도구 개선)에 각각 기록하라. 하나만 선택하지 마라.

Red Flags — STOP

If you catch yourself:

Proposing actions before completing Stage 2 analysis
Writing "root cause: Claude forgot to X" without tracing WHY the forgetting happened
Adding a MEMORY.md entry that just repeats the CLAUDE.md rule verbatim (no new insight)
Creating a GitHub issue for every minor friction (low-ROI noise)
Skipping the approval step and executing actions directly
Editing $CLAUDE_CONFIG_DIR/CLAUDE.md without presenting the draft first — this is global config, affects every project
Proposing memory for a pattern that already exists in MEMORY.md (MUST escalate instead)
Skipping tracer/analyst agent calls ("I can analyze this myself")
Generating artifacts without verification ("issue created" without showing URL)
Creating a new memory file without checking existing entries for overlap (MUST merge into existing when root cause matches)
Proposing MEMORY.md feedback as the only action when the same rule was violated 3+ times — this ignores memo's proven limits; enforcement mechanisms (skill, hook, rule) MUST be evaluated alongside memory
Proposing MEMORY.md feedback as the only action when the finding is a rule gap (rule absent) — gaps are not filled by memos; CLAUDE.md draft or skill idea MUST be considered
Forcing tool friction into only a rule-violation frame — tool-layer defects from step 4b MUST be carried in the unified findings table with Tool Layer set to a non-— value and evaluated for upstream feedback, not collapsed into rule-violation-only findings
Skipping step 4b entirely ("no tool issues this session") — step 4b is mandatory. If no tool friction is found, the distribution card MUST emit upstream_feedback: 0 and the report MUST state "No tool/feature friction detected. ✅" explicitly
Pre-scan에서 friction event에 category[] 라벨링을 누락한 채 Stage 2 step 3 이상 진행 — Layer E 강제. 누락은 Stage 2 진입 전 차단되어야 한다.
Memory-only finding의 Rationale이 5줄 not <action>: <reason> 형식이 아니거나 5 action type 미만 커버 — Gate-2 위반. 일반 한 줄 진술은 memory-only 근거로 부적격.
Stage 2.5 분포 감사를 명시적으로 건너뛰고 Stage 3로 직행 — distribution card와 Gate-1/Gate-2 verdict 출력은 Stage 3 입력의 mandatory 전제.
tool 라벨 finding의 Tool Layer 컬럼이 —로 비어 있음 — Layer E ↔ step 4b composition matrix 위반. tool 카테고리는 4b layer 중 하나(mcp/cli/builtin/skill)를 반드시 가져야 한다.
upstream_feedback 행에 backing_repo: <owner/repo> 선언이 없음 — Stage 2 step 8 위반. 선언은 Stage 4 Action 4 step 0의 라우팅 결정 입력이며, 누락 시 Stage 4가 abort 한다.
Stage 4 Action 4에서 step 0 (declared vs re-resolved 비교)을 건너뛰고 바로 gh issue create 실행 — 이슈가 잘못된 레포로 라우팅되는 정확한 실패 경로. 선언과 재계산 값을 모두 기록하지 않은 채 진행하면 retrospect 자체가 검증 불가.

ALL of these mean: STOP. Return to Stage 2.

Quick Reference

Stage	Key Activity	Success Criteria
1. Load	Read CLAUDE.md, form scan questions	Rule categories identified
2. Analyze	Scan conversation, map to rules, find root cause	Root cause (not symptom) for each pattern; every event has `category[]`
2.5 Audit	Run Gate-1 (categorical) + Gate-2 (5-line rationale schema)	Both gates PASS or per-finding cap reached and surfaced to user
3. Report	Present unified table + distribution card, collect approval per item	User approved at least 1 item (or confirmed 0 findings)
4. Execute	Run approved actions, verify artifacts	Completion report with links/paths + verification results

Error Handling

Stage	Failure	Action
Stage 1 (load)	CLAUDE.md not found (project or global)	Proceed with global defaults; flag the missing file in the report
Stage 2 (analyze)	Session history not accessible	Fall back to the user's verbal summary as input to steps 3–8
Stage 2 (analyze)	No friction events found	Exit with "No patterns found. ✅" — do not fabricate findings
Stage 2 (analyze)	MEMORY.md scan failed (file not accessible)	Treat all findings as new patterns (repeat=false). Flag scan failure in report
Stage 2 (analyze)	MEMORY.md is empty	Normal processing — all findings are new patterns
Stage 2 (analyze)	tracer/analyst call failed	Fall back to manual analysis. Flag agent failure in report. Warn about reduced root cause quality
Stage 2 (analyze)	Pre-scan event missing `category[]` label	Block Stage 2 progression to step 3; instruct LLM to backfill labels per Layer E enumerated values
Stage 2.5 (audit)	Gate-1 violation persists after 2 per-finding re-entries	Surface to user with override prompt; log user-supplied rationale
Stage 2.5 (audit)	Gate-2 violation persists after 2 per-finding re-entries	Surface to user with override prompt; log user-supplied rationale
Stage 2.5 (audit)	Behavioral-only safeguard triggered (tool keywords detected in pre-scan signals)	Surface to user; require explicit confirmation before proceeding to Stage 3
Stage 3 (report)	User rejects all findings	Capture the rejection itself as a feedback signal for future retrospects
Stage 4 (execute)	MEMORY.md write fails	Report the path error; never silently drop the feedback
Stage 4 (execute)	GitHub issue creation fails	Fall back to saving a note in `.omc/plans/` for later manual creation
Stage 4 (execute)	Upstream feedback issue creation fails	Fall back to saving a note in `.omc/plans/tool-friction-{slug}.md` with intended `tool-friction:{layer}` label and issue draft
Stage 4 (execute)	`tool-friction:*` label doesn't exist (and the verified backing repo is the praxis distribution)	Auto-create with `gh label create "tool-friction:{layer}" --repo <verified-praxis-repo>` and retry
Stage 4 (execute)	Action 4 step 0 — `backing_repo` declaration missing from finding row	ABORT this action; return finding to Stage 2 step 8 with prompt to emit declaration; do NOT fall back to project repo
Stage 4 (execute)	Action 4 step 0 — declared vs re-resolved `backing_repo` divergence	ABORT this action; surface `AskUserQuestion` per step 0.4 prompt variants (3-way `[a] declared / [b] re-resolved / [c] skip-upstream_feedback-action`, or 2-way for AMBIGUOUS cases); do NOT auto-pick

Integration

Entry point: End of a working session, or after a particularly rough workflow experience Exit point: Completion report shown — improvements applied to the next working session

OMC delegation:

tracer agent: causal chain analysis for complex friction patterns
analyst agent: cluster multiple friction events into root causes
Project's issue creation skill: GitHub issue creation in Stage 4

name	retrospect
description	Session retrospect — analyze current Claude Code session against CLAUDE.md rules, identify friction patterns and root causes, propose context-appropriate improvement actions, then execute after user approval. Triggers on "retrospect", "what went wrong", "session review", "session improvement", "what was the issue", "improve".