refine
// [Project Management] Use when converting ideas to PBIs, validating problem hypotheses, adding acceptance criteria, or refining requirements.
| name | refine |
| description | [Project Management] Use when converting ideas to PBIs, validating problem hypotheses, adding acceptance criteria, or refining requirements. |
Codex compatibility note:
- Invoke repository skills with `$skill-name` in Codex; this mirrored copy rewrites legacy Claude `/skill-name` references.
- Prefer the `plan-hard` skill for planning guidance in this Codex mirror.
- Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.
- User-question prompts mean to ask the user directly in Codex.
- Ignore Claude-specific mode-switch instructions when they appear.
- Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.
- Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required `spawn_agent` subagent(s) for that task.
- Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.
- For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.
- If a required step/tool cannot run in this environment, stop and ask the user before adapting.
Codex does not receive Claude hook-based doc injection. When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.
Always read:
- docs/project-config.json (project-specific paths, commands, modules, and workflow/test settings)
- docs/project-reference/docs-index-reference.md (routes to the full docs/project-reference/* catalog)
- docs/project-reference/lessons.md (always-on guardrails and anti-patterns)

Situation-based docs:
- backend-patterns-reference.md, domain-entities-reference.md, project-structure-reference.md
- frontend-patterns-reference.md, scss-styling-guide.md, design-system/README.md
- feature-docs-reference.md
- integration-test-reference.md
- e2e-test-reference.md
- code-review-rules.md plus domain docs above based on changed files

Do not read all docs blindly. Start from docs-index-reference.md, then open only relevant files for the task.
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval.
[BLOCKING] Before each step or sub-skill call, update task tracking: set `in_progress` when the step starts, set `completed` when the step ends.
[BLOCKING] Every completed/skipped step MUST include brief evidence or an explicit skip reason.
[BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Goal: Transform raw ideas into actionable PBIs using BA best practices, hypothesis validation, and domain research.
Workflow:
| Phase | Name | Key Activity | Output |
|---|---|---|---|
| 1 | Idea Intake | Load artifact, detect module | Context loaded |
| 2 | Domain Research | WebSearch market/competitors | Research summary |
| 3 | Problem Hypothesis | Validate problem exists | Confirmed hypothesis |
| 4 | Elicitation | Apply BABOK techniques | Requirements extracted |
| 5 | Acceptance Criteria | Write BDD scenarios | GIVEN/WHEN/THEN |
| 6 | Prioritization | Apply RICE/MoSCoW + Story Points | Priority + estimate |
| 7 | Validation | Interview user (MANDATORY) | Assumptions confirmed |
| 8 | PBI Generation | Create artifact | PBI file saved |
Key Rules:
- story_points, complexity, man_days_traditional, man_days_ai in PBI frontmatter
- Dependency types: must-before | can-parallel | blocked-by | independent
- docs/specs/ → read existing TCs for related features; recommend test spec generation for new PBIs
- docs/project-reference/domain-entities-reference.md → read when the task involves business entities/models
- docs/project-reference/frontend-patterns-reference.md (read directly when relevant; do not rely on hook-injected conversation text)
- docs/project-reference/scss-styling-guide.md
- docs/project-reference/design-system/README.md

Auto-detected: No code directories (src/, app/, lib/, server/, packages/, etc.) and no manifest files (package.json/*.sln/go.mod) found. Planning artifacts (docs/, plans/, .claude/) don't count.
When greenfield detected:
Be skeptical. Every claim needs traced proof, confidence >80%.
If running in workflow (big-feature, greenfield-init, etc.):
- plans/*$plan-hard.md sorted by modification time, or check the current task list for plan context
- plan.md → project scope, goals, architecture decisions, domain model
- {plan-dir}$research/*.md for business evaluation, domain analysis
- docs/project-reference/domain-entities-reference.md (if exists) → existing domain entities
- team-artifacts/ideas/ module field; if absent, detect via keywords or prompt user

Trigger: New domain, unclear competitors, --research flag.
Skip: Internal tooling, well-understood domain, time-constrained.
Use WebSearch with domain terms. Summarize in max 3 bullets (market context, competitors, best practices).
Validate the hypothesis by asking the user directly. 42% of startups fail due to no market need, so validate before building.
Skip: --skip-hypothesis, validated hypothesis exists, bug fix/tech debt.
**We believe** [target users/persona]
**Experience** [specific problem]
**Because** [root cause]
**We'll know this is true when** [validation metric/evidence]
**We believe** [feature/solution]
**Will deliver** [value/benefit]
**To** [target users]
**We'll know we're right when** [success metric]
Think: What information gaps exist? Which technique fills them with least effort + highest confidence?
| Technique | When to Choose | What to Extract |
|---|---|---|
| Interviews | Deep insights needed, stakeholder perspectives vary | Stakeholder needs, pain points, constraints |
| Workshops | Group consensus needed, multiple stakeholders | Prioritized requirements, consensus decisions |
| Document Analysis | Existing systems/processes, regulatory requirements | As-is state, compliance requirements, gaps |
| Observation | Users can't articulate needs, workflow unclear | Actual vs stated workflow, hidden requirements |
| Prototyping | Visual validation needed, UI/UX requirements vague | Validated UI requirements, interaction patterns |
Technique notes:
Write GIVEN/WHEN/THEN scenarios. Minimum 3: happy path, edge case, error case.
Scenario: {Descriptive title}
Given {precondition/context}
And {additional context}
When {action/trigger}
And {additional action}
Then {expected outcome}
And {additional verification}
| Practice | Rule |
|---|---|
| Single trigger | "When" clause has ONE action |
| 3 scenarios minimum | Happy path, edge case, error case |
| No implementation details | Behavior, not how |
| Testable outcomes | "Then" must be verifiable |
| Stakeholder language | No technical jargon |
Scenario: Employee creates goal with valid data
Given employee has permission to create goals
And employee is on the goal creation page
When employee submits goal form with all required fields
Then goal is created with status "Draft"
And goal appears in employee's goal list
Scenario: Goal creation fails with missing required field
Given employee is on the goal creation page
When employee submits form without title
Then validation error "Title is required" is displayed
And goal is not created
Scenario: Manager reviews subordinate goal
Given manager has direct reports
And subordinate has submitted goal for review
When manager opens goal review page
Then subordinate's goal is visible with "Pending Review" status
- TC-{FEATURE}-{NNN} (e.g., TC-GM-001)
- file:line format
- business-analyst skill for detailed patterns

Use a direct user question with 2-3 questions:
"Which testing approach fits this PBI?"
"What test levels are needed?"
For EACH acceptance criterion, generate corresponding test case outline:
| AC | Test Outline | Priority |
|---|---|---|
| AC-1 | TC: Create goal with valid data → verify persisted | P0 |
| AC-2 | TC: Create goal without title → verify validation error | P1 |
Seed for $tdd-spec if user chooses TDD-first. Document in PBI under ## Testability Assessment.
Apply RICE score or MoSCoW for priority. Estimate using Story Points (Modified Fibonacci 1-21).
Score = (Reach x Impact x Confidence) / Effort
Reach: Users affected per quarter (100, 500, 1000+)
Impact: 0.25 (minimal) | 0.5 (low) | 1 (medium) | 2 (high) | 3 (massive)
Confidence: 0.5 (low) | 0.8 (medium) | 1.0 (high)
Effort: Story points (1, 2, 3, 5, 8, 13, 21)
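The RICE arithmetic above can be sketched in Python. This is a minimal illustration; the function name and the example numbers are hypothetical, while the scales match the definitions above.

```python
def rice_score(reach: int, impact: float, confidence: float, effort: int) -> float:
    """RICE = (Reach x Impact x Confidence) / Effort, with effort in story points."""
    assert impact in (0.25, 0.5, 1, 2, 3), "impact must use the scale above"
    assert confidence in (0.5, 0.8, 1.0), "confidence must use the scale above"
    assert effort in (1, 2, 3, 5, 8, 13, 21), "effort must be a Fibonacci story point"
    return (reach * impact * confidence) / effort

# 500 users/quarter, high impact (2), medium confidence (0.8), 5 SP
print(rice_score(500, 2, 0.8, 5))  # -> 160.0
```

Higher scores surface first when sorting PBIs; effort in the denominator penalizes large items.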
| Category | Meaning | Action |
|---|---|---|
| Must Have | Critical, non-negotiable | Include in MVP |
| Should Have | Important but not vital | Plan for release |
| Could Have | Nice to have, low effort | If time permits |
| Won't Have | Out of scope this cycle | Document for future |
Generate 3-5 questions covering assumptions, scope, dependencies, edge cases. Ask the user directly. Document in PBI. NOT optional.
| Category | Example Question |
|---|---|
| Assumptions | "We assume X is true. Correct?" |
| Scope | "Should Y be included or explicitly excluded?" |
| Dependencies | "This requires Z. Is that available?" |
| Edge Cases | "What happens when data is empty/null?" |
| Business Impact | "Will this affect existing reports/workflows?" |
| Entities | "Create new entity or extend existing X?" |
| Prod Readiness | "Does this feature need linting, error handling, loading, or Docker setup?" |
| Authorization | "Who can perform this action? What roles/permissions are needed?" |
| Seed Data | "Does this feature need reference/lookup data to function?" |
| Data Migration | "Does this change entity schema? Is data transformation needed?" |
## Validation Summary
**Validated:** {date}
### Confirmed Decisions
- {decision}: {user choice}
### Assumptions Confirmed
- {assumption}: Confirmed/Modified
### Open Items
- [ ] {follow-up items}
Capture in PBI template sections: Production Readiness Concerns, Authorization & Access Control, Seed Data, Data Migration.
Why this phase exists: Phase 6 estimation runs against a draft scope. Phase 7 (Validation Interview) and Cross-Cutting capture often resolve unknowns, add constraints, or trim/expand scope. The numbers in story_points, complexity, man_days_traditional, and man_days_ai MUST be re-derived against the locked scope BEFORE Phase 8 writes them into the PBI frontmatter. Estimating once at draft and forgetting is the #1 source of estimation drift in PBIs.

Re-derivation steps (SYNC: estimation-framework):
1. Recompute bottom_up_hours = Σ slice_hours (use the SP table mapping in Phase 6, not eyeballing).
2. Recompute likely_days = ceil(bottom_up_hours / 6) × productivity factor for the team/AI mode.
3. Re-assess risk_margin_pct based on remaining unknowns AFTER Phase 7 (the margin should usually shrink because validation removed unknowns; it rises only if new risks surfaced).
4. Derive the min-max range from the new likely_days ± margin.
5. Re-derive story_points and the complexity bucket from the re-derived likely_days.
6. Compute delta_pct = (new_likely_days - draft_likely_days) / draft_likely_days × 100.
| Delta | Action |
|---|---|
| \|delta\| ≤ 20% | Keep draft estimate. Note `reestimate_delta_pct: <signed>` + `reestimate_reason: "within tolerance, no change"` in PBI frontmatter for transparency. |
| \|delta\| > 20% | UPDATE story_points, complexity, man_days_traditional, man_days_ai. Add `reestimate_delta_pct: <signed>` + 1-line `reestimate_reason` explaining what changed (e.g., "auth scope confirmed wider", "seed data dropped per validation"). |
| \|delta\| > 50% | UPDATE values AND flag SHOULD-RESCOPE. Surface to user via a direct user question BEFORE Phase 8 writes the PBI: "Re-estimate is +/-X% vs original. Options: (a) accept new estimate as-is, (b) split into 2 PBIs, (c) trim scope back to original estimate, (d) defer." Record the user's decision in ## Validation Summary. |
Frontmatter fields: reestimate_delta_pct, reestimate_reason (always populate, even when within tolerance; this creates a paper trail for retrospective comparison against actual implementation time).

DO NOT skip this phase. A PBI that ships with stale Phase 6 estimates is the source of unreliable velocity data. The whole point is to make the post-validation numbers, not the pre-validation guesses, the ones the team commits to.
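The delta rule above can be expressed as a small helper. A minimal sketch: the function name is hypothetical and the action strings paraphrase the table; the thresholds (20%, 50%) come from the table itself.

```python
def reestimate_action(draft_likely_days: float, new_likely_days: float) -> tuple[float, str]:
    """Return the signed delta_pct and which action bucket from the delta table applies."""
    delta_pct = (new_likely_days - draft_likely_days) / draft_likely_days * 100
    if abs(delta_pct) <= 20:
        action = "keep draft estimate; record delta for transparency"
    elif abs(delta_pct) <= 50:
        action = "update story_points, complexity, and man_days fields"
    else:
        action = "update values AND flag SHOULD-RESCOPE; ask the user before Phase 8"
    return round(delta_pct, 1), action

print(reestimate_action(4.0, 6.5))   # +62.5% falls in the rescope bucket
print(reestimate_action(5.0, 5.5))   # +10.0% is within tolerance
```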
Path: team-artifacts/pbis/{YYMMDD}-pbi-{slug}.md | ID Pattern: PBI-{YYMMDD}-{NNN}
---
id: PBI-{YYMMDD}-{NNN}
title: '{Brief descriptive title}'
module: '{ModuleName - detect from project-config.json modules[]}'
priority: Must Have | Should Have | Could Have | Won't Have
story_points: 1 | 2 | 3 | 5 | 8 | 13 | 21
complexity: Low | Medium | High | Very High
man_days_traditional: '{ Xd (Yd code + Zd test) - from SP table }'
man_days_ai: '{ Xd (Yd code + Zd test) - from SP table with AI }'
status: draft | refined | ready | in_progress | done
rice_score: { calculated }
created: '{YYYY-MM-DD}'
source_idea: '{idea artifact path or ID}'
---
# {PBI Title}
## Description
**As a** {user role}
**I want** {capability}
**So that** {business value}
## Business Value
- {Quantified benefit 1}
- {Quantified benefit 2}
## Problem Hypothesis
**We believe** {target users}
**Experience** {specific problem}
**Because** {root cause}
**We'll know this is true when** {validation metric}
## Business Rules
- BR-{MOD}-XXX: {Rule description}
## Acceptance Criteria
### AC-1: {Title}
Scenario: {Happy path}
Given {context}
When {action}
Then {outcome}
### AC-2: {Title}
Scenario: {Edge case}
Given {edge state}
When {action}
Then {handling}
### AC-3: {Title}
Scenario: {Error case}
Given {context}
When {invalid action}
Then error "{message}"
## Testability Assessment
| AC | Test Outline | Priority |
| ---- | ------------------ | -------- |
| AC-1 | {test description} | P0 |
| AC-2 | {test description} | P1 |
## Out of Scope
- {Explicitly excluded item 1}
- {Explicitly excluded item 2}
## Dependencies
| Dependency | Type | Description |
| --------------------- | ------------ | ------------------------------ |
| {PBI/service/feature} | must-before | {Why this must be done first} |
| {PBI/service/feature} | can-parallel | {Why this can run in parallel} |
| {PBI/service/feature} | blocked-by | {What blocks this PBI} |
| - | independent | {No dependencies - first item} |
## Production Readiness Concerns
| Concern | Required | Notes |
| ---------------------- | --------------- | --------------------------------------- |
| Code linting/analyzers | Yes/No/Existing | {tool preference or "scaffold default"} |
| Error handling setup | Yes/No/Existing | {pattern: toast/inline/error-page} |
| Loading indicators | Yes/No/Existing | {pattern: spinner/skeleton/progress} |
| Docker integration | Yes/No/Existing | {scope: infra-only/full/none} |
| CI/CD quality gates | Yes/No/Existing | {coverage threshold, lint enforcement} |
| Security scanning | Yes/No/Existing | {dependency audit, SAST} |
## Authorization & Access Control
| Role | Can Create | Can Read | Can Update | Can Delete | Notes |
| ------ | ---------- | -------- | ---------- | ---------- | ------------- |
| {Role} | ✓/✗ | ✓/✗ | ✓/✗ | ✓/✗ | {scope notes} |
**New permissions needed:** {Yes/No โ list if yes}
**Multi-tenant isolation:** {Yes/No}
## Seed Data Requirements
| Data Type | Description | Owner | Required |
| ------------------ | ------------------------------------ | ------------ | -------- |
| Reference data | {lookups, statuses, types} | Application | Yes/No |
| Configuration data | {default settings} | Application | Yes/No |
| Test seed data | {entities for integration tests} | Test project | Yes/No |
| Performance data | {large-volume data for load testing} | Test tooling | Yes/No |
> If no seed data needed: `N/A - no seed data required for this feature.`
## Data Migration
| Change | Type | Backward Compatible | Reversible |
| --------------------------- | -------------------------------------- | ------------------- | ---------- |
| {schema change description} | Add field / Remove field / Type change | Yes/No | Yes/No |
> If no schema changes: `N/A - no schema changes required.`
## Domain Context
**Entities:** {Entity1}, {Entity2}
**Related Features:** {feature doc paths}
## UI Layout
### Wireframe
{ASCII wireframe using box-drawing characters}
**Layout:** {description with approximate proportions/dimensions}
### Components
- **{ComponentName}** - {behavior description} _(tier: common | domain-shared | page/app)_
> Classify per **Component Hierarchy** in UI wireframe protocol; search existing libs before proposing new components.
### States
| State | Behavior |
| ------- | -------------------------- |
| Default | {what user sees initially} |
| Loading | {spinner/skeleton} |
| Empty | {empty state message} |
| Error | {error handling} |
> If backend-only: `## UI Layout` → `N/A - Backend-only change. No UI affected.`
## Validation Summary
**Validated:** {date}
### Confirmed Decisions
- {decision}: {user choice}
### Assumptions Confirmed
- {assumption}: Confirmed/Modified
### Open Items
- [ ] {follow-up items}
| Anti-Pattern | Better Approach |
|---|---|
| Refining vague ideas | Return to $idea for clarification |
| Skipping hypothesis validation | Always run Phase 3 for new features |
| Solution-first thinking | Start with problem, not solution |
| Generic acceptance criteria | Use GIVEN/WHEN/THEN with specifics |
| Ignoring domain context | Load project docs if applicable |
| Too large PBI (XL+) | Break into smaller items |
| Missing "Out of Scope" | Explicitly list exclusions |
| Assuming instead of asking | Run validation interview |
Dependency types: must-before, can-parallel, blocked-by, independent. Enables $prioritize and $plan-hard to respect ordering.

| Criterion | Check |
|---|---|
| Independent | No blocking dependencies |
| Negotiable | Details can be refined |
| Valuable | Clear user/business value |
| Estimable | Team can estimate (XS-XL) |
| Small | Single sprint |
| Testable | 3+ GIVEN/WHEN/THEN scenarios |
| Problem Validated | Hypothesis confirmed |
| Domain Context | BR/entity context loaded |
| Stakeholder Aligned | Validation interview done |
| Prod Readiness | Concerns documented |
For domain PBIs: detect module from docs/business-features/ directory names, extract business rules from docs/business-features/{module}/, load entity context from feature doc. Target 8-12K tokens for feature context.
Related skills:
- business-analyst (detailed patterns)
- $idea output
- $story, $tdd-spec (Recommended for TDD), $design-spec
- $prioritize

MANDATORY IMPORTANT MUST ATTENTION: after completing this skill, use a direct user question to present these options. NEVER skip because the task seems "simple" or "obvious":
- $story or implementation

[IMPORTANT] Use task tracking to break ALL work into small tasks BEFORE starting. Simple tasks: ask user whether to skip.
External Memory: for complex/lengthy work, write findings to plans/reports/ to prevent context loss.
Evidence Gate: MANDATORY IMPORTANT MUST ATTENTION. Every claim requires file:line proof or traced evidence, confidence >80% to act.
AI Mistake Prevention - failure modes to avoid on every task:
- Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
- Trace the full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
- Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips, not just the happy path.
- When debugging, ask "whose responsibility?" before fixing. Trace whether the bug is in the caller (wrong data) or the callee (wrong handling). Fix at the responsible layer; never patch the symptom site.
- Assume existing values are intentional; ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
- Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
- Holistic-first debugging: resist the nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
- Surgical changes: apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
- Surface ambiguity before coding; don't pick silently. If a request has multiple interpretations, present each with an effort estimate and ask. Never assume the all-records, file-based, or more complex path.
UI System Context - for ANY task touching .ts, .html, .scss, or .css files, MUST ATTENTION READ before implementing:
- docs/project-reference/frontend-patterns-reference.md - component base classes, stores, forms
- docs/project-reference/scss-styling-guide.md - BEM methodology, SCSS variables, mixins, responsive
- docs/project-reference/design-system/README.md - design tokens, component inventory, icons

Reference docs/project-config.json for project-specific paths.
Estimation Framework - bottom-up first; SP DERIVED; output a min-max range when likely ≥ 3d. Stack-agnostic. Baseline: 3-5yr dev, 6 productive hrs/day. AI estimate assumes Claude Code + project context.
Method:
- Blast Radius pass (below) drives code AND test cost
- Decompose phases → hours/phase → bottom_up_hours = Σ phase_hours
- likely_days = ceil(bottom_up_hours / 6) × productivity_factor
- Sum Risk Margin (base + add-ons) → max_days = likely_days × (1 + margin); min_days = likely_days × 0.9
- Output as a range when likely_days ≥ 3; a single point is allowed below 3 (still record the margin)
- man_days_ai = same range × AI speedup
- story_points DERIVED from likely_days via SP-Days; NEVER the driver. Disagreement > 50% → trust bottom-up

Productivity factor: 0.8 strong scaffolding+codegen+AI hooks · 1.0 mature default · 1.2 weak patterns · 1.5 greenfield
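The bottom-up steps above can be sketched in Python. This is a minimal illustration under the stated baseline (6 productive hours/day, min at 90% of likely); the function name and sample phase hours are hypothetical.

```python
import math

def estimate(phase_hours: list[float], productivity_factor: float, margin_pct: float):
    """Bottom-up first: sum phase hours, then derive likely/min/max days per the method above."""
    bottom_up_hours = sum(phase_hours)
    likely_days = math.ceil(bottom_up_hours / 6) * productivity_factor
    max_days = likely_days * (1 + margin_pct / 100)   # risk margin drives the max bound
    min_days = likely_days * 0.9                      # min is 90% of likely
    return bottom_up_hours, likely_days, round(min_days, 1), round(max_days, 1)

# 4 phases totalling 20h, mature codebase (factor 1.0), 35% margin
print(estimate([6, 5, 5, 4], 1.0, 35))  # -> (20, 4, 3.6, 5.4)
```

Since likely_days ≥ 3 here, the result would be reported as a range (3.6-5.4d), not a single point.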
Cost Driver Heuristic (apply BEFORE work-type row):
- UI dominates in CRUD/business apps → 1.5-3x backend (states, validation, responsive, a11y, polish)
- Backend dominates ONLY: multi-aggregate invariants, cross-service contracts, schema migrations, heavy query/perf, new event flows
Reuse-vs-Create axis (PRIMARY lever, per layer):
| UI tier | Cost |
|---|---|
| Reuse component on existing screen | 0.1-0.3d |
| Add control/column to existing screen | 0.3-0.8d |
| Compose components into NEW screen | 1-2d |
| NEW screen, custom layout/states/validation | 2-4d |
| NEW shared/common component (themed, tested) | 3-6d+ |

| Backend tier | Cost |
|---|---|
| Reuse query/handler from new place | 0.1-0.3d |
| Small update to existing handler/entity | 0.3-0.8d |
| NEW query on existing repo/model | 0.5-1d |
| NEW command/handler on existing aggregate (additive) | 1-2d |
| NEW aggregate/entity (repo, validation, events) | 2-4d |
| NEW cross-service contract OR schema migration | 2-4d each |
| Multi-aggregate invariant / heavy domain rule | 3-5d |

Rule: Sum tiers across UI+backend+tests, apply the productivity factor. Reuse short-circuits tiers; call it out.
Test-Scope drivers (compute test_count EXPLICITLY; the "+tests" hand-wave is the #1 failure):

| Driver | Count |
|---|---|
| Happy-path journeys | 1 per story / AC main flow |
| State-machine transitions | reachable transitions × allowed actors |
| Multi-entity state combos | state(A) × state(B), REACHABLE only, not Cartesian |
| Authorization matrix | (owner, non-owner, elevated, unauth) × each mutation |
| Validation rules | 1 per required field / boundary / format / cross-field |
| UI states (per new screen/dialog) | happy, loading, empty, error, partial; present only |
| Negative paths / invariants | 1 per violatable business rule |

| Test tier (Trad, incl. setup+assert+flake) | Cost |
|---|---|
| 1-5 cases, fixtures reused | 0.3-0.5d |
| 6-12 cases, 1 new fixture | 0.5-1d |
| 13-25 cases, multi-entity setup | 1-2d |
| 26-50 cases OR new state-machine coverage | 2-3d |
| >50 cases OR full E2E journey | 3-5d |

Test multipliers: new fixture/seed harness +0.5d · cross-service/bus assertion +0.3d each · UI E2E ×1.5 · each new role +1-2 cases
Blast Radius (mandatory pre-pass; affects code AND test):
- Files/components directly modified → count
- Of those, "complex" (>500 LOC, multi-handler, central, frequently-modified) → count
- Downstream consumers (callers, event subscribers, cross-service) → list
- Shared/common code touched (multi-app blast) → yes/no
- Regression scope → areas needing re-test

Rule: Complex touch → add risk_factors. Each downstream consumer → +1-3 regression cases. Blast >5 areas OR >2 complex → re-evaluate SPLIT before estimating.

Risk Margin (drives max bound):
| likely_days | Base margin |
|---|---|
| <1d trivial | +10% |
| 1-2d small additive | +20% |
| 3-4d real feature | +35% |
| 5-7d large | +50% |
| 8-10d very large | +75% |
| >10d | +100% AND flag SHOULD SPLIT |

Risk-factor add-ons (additive; enumerate in risk_factors):

| Factor | +margin |
|---|---|
| touches-complex-existing-feature (>500 LOC, multi-handler, central) | +20% |
| cross-service-contract change | +25% |
| schema-migration-on-populated-data | +25% |
| new-tech-or-unfamiliar-pattern | +30% |
| regression-fan-out (≥3 downstream areas re-test) | +20% |
| performance-or-latency-critical | +20% |
| concurrency-race-event-ordering | +25% |
| shared-common-code (multi-consumer/multi-app) | +25% |
| unclear-requirements-or-design | +30% |

Collapse rule: total margin >100% → STOP, split (padding past 2x is dishonesty). Margin <15% on likely_days ≥5 → under-estimated, widen.

Work-Type Caps (hard ceilings on likely_days):
| Work type | Max SP | Max likely |
|---|---|---|
| Single field / config flag / style fix | 1 | 0.5d |
| Add property to existing model + bind to existing UI | 2 | 1d |
| Additive endpoint + minor UI control (button/menu/column), reuses fixtures | 3 | 2-3d |
| Additive endpoint + NEW UI surface OR additive multi-layer + new domain rule + 2+ test files | 5 | 3-5d |
| NEW model/aggregate OR migration OR cross-module contract OR heavy test (>1.5d) OR NEW UI + non-trivial backend | 8 | 5-7d |
| NEW UI surface + (NEW aggregate OR migration OR cross-service contract) | 13 | SHOULD split |
| Cross-service contract + migration combined | 13 | SHOULD split |
| Beyond | 21 | MUST split |

SP-Days (validation only): 1=0.5d/0.25d · 2=1d/0.35d · 3=2d/0.65d · 5=4d/1.0d · 8=6d/1.5d · 13=10d/2.0d (Trad/AI likely)
AI speedup: SP 1 → 2x · 2-3 → 3x · 5-8 → 4x · 13+ → 5x. AI cost = (code_gen × 1.3) + (test_gen × 1.3) (30% review overhead).

MANDATORY frontmatter:

story_points: <n>
complexity: low | medium | high | critical
man_days_traditional: '<min>-<max>d'  # range when likely >=3d; '<N>d' when <3d
man_days_ai: '<min>-<max>d'
risk_margin_pct: <n>  # base + add-ons
risk_factors: [touches-complex-existing-feature, regression-fan-out]  # closed list from add-ons; [] if none
blast_radius:
  touched_areas: <n>
  complex_touched: <n>
  downstream_consumers: [list or count]
  shared_common_code: yes | no
estimate_scope_included: [code, integration-tests, frontend, i18n, docs]
estimate_scope_excluded: [unit-tests, e2e, perf, deployment, code-review-rounds]
estimate_reasoning: |
  5-7 lines covering:
  (a) UI tier - row applied
  (b) Backend tier - row applied
  (c) Test scope - case breakdown by driver, file count, fixtures, tier row
  (d) Cost driver - dominant tier + why
  (e) Blast radius - touched, complex, regression scope
  (f) Risk factors - list driving margin; why not larger/smaller

Example: "UI: compose Form/Table/Dialog → NEW screen (~1.5d). Backend: NEW command on existing aggregate, reuses validation+repo (~1d). Tests: 4 transitions × 2 actors + 3 validation + 2 UI states = 13 cases, 1 new fixture → tier 13-25 ~1.5d. Driver: UI composition + new states. Blast: 4 areas, 1 complex. Risk: base 35% + touches-complex +20% = 55% → max 3.9d → range 2.5-4d."

Sanity self-check:
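One way to implement "SP DERIVED from likely_days via SP-Days" and the AI cost formula is sketched below. The table constants come from the SP-Days line above; function names are illustrative, and 21 is returned as the MUST-split sentinel.

```python
# SP -> traditional likely days, from the SP-Days mapping above
SP_DAYS = {1: 0.5, 2: 1.0, 3: 2.0, 5: 4.0, 8: 6.0, 13: 10.0}

def derive_sp(likely_days: float) -> int:
    """Pick the smallest SP bucket whose day value covers likely_days; beyond 13 -> 21 (MUST split)."""
    for sp, days in SP_DAYS.items():
        if likely_days <= days:
            return sp
    return 21

def ai_cost(code_gen_days: float, test_gen_days: float) -> float:
    """AI cost = (code_gen x 1.3) + (test_gen x 1.3): 30% review overhead on both."""
    return round(code_gen_days * 1.3 + test_gen_days * 1.3, 2)

print(derive_sp(3.5))     # -> 5 (3.5d falls in the 5-SP bucket, up to 4d)
print(ai_cost(0.6, 0.4))  # -> 1.3
```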
- likely_days ≥ 3d and single-point? → reject, must be a range
- Margin <15% on likely_days ≥ 5d? → under-estimated, widen
- Margin >100%? → STOP, split instead of buffer
- Complex existing feature touched, no regression budget in (c)? → reject
- Blast >5 areas OR >2 complex, no split discussion? → reject
- Purely additive on existing model AND existing UI? → cap SP 3 unless tests >1.5d
- NEW UI surface (page/complex form/dashboard)? → SP 5+ even if backend is one endpoint
- Backend cross-service / migration / multi-aggregate? → SP 8+ regardless of UI
- bottom_up_hours / 6 vs SP-Days disagreement >50%? → trust bottom-up, downgrade SP
- Without tests, SP drops ≥1 bucket? → tests dominate; state explicitly
- Reasoning called out UI vs backend vs blast vs risk factors? → if missing, add
Scaffold Production Readiness - every scaffolded project MUST ATTENTION include 5 foundations:
- Code Quality Tooling: linting, formatting, pre-commit hooks, CI gates. Specific tool choices → docs/project-reference/ or project-config.json.
- Error Handling Foundation: HTTP interceptor, error classification (4xx/5xx taxonomy), user notification, global uncaught handler.
- Loading State Management: counter-based tracker (not a boolean toggle), skip-token for background requests, 300ms flicker guard.
- Docker Development Environment: compose profiles (dev/test/infra), multi-stage Dockerfile, health checks on all services, non-root production user.
- Integration Points: document each outbound boundary; configure retry + circuit breaker + timeout; integration tests for happy path and failure path.

BLOCK $cook if any foundation is unchecked. Present 2-3 options per concern via a direct user question before implementing.
Cross-Cutting Quality - check across all changed files:
- Error handling consistency: same error patterns across related files
- Logging: structured logging with correlation IDs for traceability
- Security: no hardcoded secrets, input validation at boundaries, auth checks present
- Performance: no N+1 queries, unnecessary allocations, or blocking calls in async paths
- Observability: health checks, metrics, tracing spans for new endpoints
UI Wireframe - for UI artifacts: include an ASCII wireframe (box-drawing chars), a component tree with EXISTING/NEW classification and tier (common | domain-shared | page/app), interaction flow (user action → system response → UI update), a states table (default/loading/empty/error), and responsive breakpoint behavior. Process Figma URLs or screenshots BEFORE wireframing. Search existing component libs before proposing new components. Backend-only changes: `N/A - Backend-only change. No UI affected.`
Critical Thinking Mindset - apply critical thinking and sequential thinking. Every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present a guess as fact; cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence. Certainty without evidence is the root of all hallucination.
Sequential Thinking Protocol - structured multi-step reasoning for complex/ambiguous work. Use when planning, reviewing, debugging, or refining ideas where one-shot reasoning is unsafe.
Trigger when: complex problem decomposition · adaptive plans needing revision · analysis with course correction · unclear/emerging scope · multi-step solutions · hypothesis-driven debugging · cross-cutting trade-off evaluation.
Format (explicit mode - visible thought trail):
- `Thought N/M: [aspect]` - one aspect per thought; state assumptions/uncertainty
- `Thought N/M [REVISION of Thought K]: ...` - when prior reasoning is invalidated; state Original / Why revised / Impact
- `Thought N/M [BRANCH A from Thought K]: ...` - explore an alternative; converge with a decision rationale
- `Thought N/M [HYPOTHESIS]: ...` then `[VERIFICATION]: ...` - test before acting
- `Thought N/N [FINAL]` - only when verified, all critical aspects addressed, confidence >80%

Mandatory closers: Confidence % stated · Assumptions listed · Open questions surfaced · Next action concrete.
Stop conditions: confidence <80% on any critical decision → escalate by asking the user directly · ≥3 revisions on the same thought → re-frame the problem · branch count >3 → split into a sub-task.
Implicit mode: apply methodology internally without visible markers when adding markers would clutter the response (routine work where reasoning aids accuracy).
Deep-dive: see the `$sequential-thinking` skill (.claude/skills/sequential-thinking/SKILL.md) for worked examples (api-design, debug, architecture), advanced techniques (spiral refinement, hypothesis testing, convergence), and meta-strategies (uncertainty handling, revision cascades).
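A minimal worked trail in this format (the scenario and findings are purely illustrative):

```text
Thought 1/3: Scope — PBI touches only the export endpoint; assuming no UI change (uncertain).
Thought 2/3 [HYPOTHESIS]: Export timeout is caused by an unindexed date filter.
Thought 2/3 [VERIFICATION]: Query plan shows a full scan on the date column → hypothesis confirmed.
Thought 3/3 [FINAL]: Add index + paginate the export.
Confidence: 85% · Assumptions: no UI change · Open questions: are archived rows included? · Next action: draft ACs.
```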
man_days_traditional (Σh/6 × productivity_factor); SP is DERIVED. UI cost usually dominates → bump SP one bucket for a NEW UI surface (page/complex form/dashboard). Frontmatter MUST include story_points, complexity, man_days_traditional, man_days_ai, estimate_scope_included, estimate_scope_excluded, estimate_reasoning (UI vs backend cost driver). Cap SP at 3 for additive changes on an existing model + existing UI unless test scope >1.5d. SP 13 SHOULD split; SP 21 MUST split.
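The estimation arithmetic above can be sketched as follows; the SP-bucket mapping and the 1-SP-per-man-day assumption are illustrative, not project-mandated values:

```python
# Sketch of the estimation arithmetic: man-days from summed task hours,
# story points derived by snapping up to a Fibonacci bucket.
# Bucket thresholds below are illustrative assumptions.
FIBONACCI_BUCKETS = [1, 2, 3, 5, 8, 13, 21]

def man_days_traditional(task_hours, productivity_factor=1.0):
    # Σh / 6 × productivity_factor (6 productive hours per day)
    return sum(task_hours) / 6 * productivity_factor

def derive_story_points(man_days, new_ui_surface=False):
    # Hypothetical mapping: ~1 SP per man-day, snapped up to a bucket.
    sp = next((b for b in FIBONACCI_BUCKETS if b >= man_days),
              FIBONACCI_BUCKETS[-1])
    if new_ui_surface:
        # UI cost dominates → bump one bucket for a NEW UI surface.
        idx = FIBONACCI_BUCKETS.index(sp)
        sp = FIBONACCI_BUCKETS[min(idx + 1, len(FIBONACCI_BUCKETS) - 1)]
    return sp
```

Under these assumptions, three 6-hour tasks yield 3.0 man-days and SP 3, bumped to SP 5 when a new UI surface is involved.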
IMPORTANT MUST ATTENTION read frontend-patterns-reference, scss-styling-guide, design-system/README before any UI change.
IMPORTANT MUST ATTENTION verify 5 production-readiness foundations (code quality, error handling, loading state, Docker, integration points) for scaffold PBIs.
IMPORTANT MUST ATTENTION check error handling, logging, security, performance, and observability across changed files.
MUST ATTENTION apply critical thinking → every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present a guess as fact.
MUST ATTENTION apply sequential-thinking → multi-step Thought N/M, REVISION/BRANCH/HYPOTHESIS markers, confidence % closer; see the $sequential-thinking skill.
MUST ATTENTION apply AI mistake prevention → holistic-first debugging, fix at the responsible layer, surface ambiguity before coding, re-read files after compaction.
IMPORTANT MUST ATTENTION follow declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses
- `$why-review` → validate PBI design rationale before `$story` or `$tdd-spec`
- `$pbi-challenge` → Dev BA PIC review before `$dor-gate` or `$story`

Anti-Rationalization:
| Evasion | Rebuttal |
|---|---|
| "Simple PBI, skip hypothesis validation" | Wrong assumption wastes more time than validation check. Apply always. |
| "Validation interview is optional here" | NEVER optional → the user decides assumptions, AI doesn't |
| "Skip Dependencies table, no blockers" | Unknown blockers exist. Always fill the table → even if independent |
| "Skip story points, just write ACs" | story_points, man_days_traditional, man_days_ai mandatory in frontmatter |
| "Domain context not needed for small PBI" | Small PBIs touch entities. Read domain-entities-reference first |
[TASK-PLANNING] Before acting, analyze task scope and systematically break it into small todo tasks and sub-tasks using task tracking.
Source: .claude/hooks/lib/prompt-injections.cjs + .claude/.ck.json
`$workflow-start <workflowId>` for standard workflows; sequence custom steps manually.
[CRITICAL] Hard-won project debugging/architecture rules. MUST ATTENTION apply BEFORE forming a hypothesis or writing code.
Goal: Prevent recurrence of known failure patterns → debugging, architecture, naming, AI orchestration, environment.
Top Rules (apply always):
- `ExecuteInjectScopedAsync` for parallel async + repo/UoW → NEVER `ExecuteUowTask`. `ExecuteUowTask` creates a new UoW but reuses the outer DI scope (same DbContext) → parallel iterations sharing a non-thread-safe DbContext silently corrupt data. `ExecuteInjectScopedAsync` creates a new UoW + a new DI scope (fresh repo per iteration).
- Bus message ownership: the owning service defines its event messages (`AccountUserEntityEventBusMessage` = Accounts owns). Core services (Accounts, Communication) are leaders. Feature services (Growth, Talents) sending to core MUST use `{CoreServiceName}...RequestBusMessage` → never define your own event for core to consume.
- Naming: a policy name like `HrManagerOrHrOrPayroll` lists set members, not what the policy guards. Adding a role → rename = broken abstraction. Rule: names express DOES/GUARDS, not CONTAINS. Test: does adding/removing a member force a rename? YES = content-driven = bad → rename to purpose (e.g., `HrOperationsAccessPolicy`). Nuance: "Or" is fine in behavioral idioms (`FirstOrDefault`, `SuccessOrThrow`) → it expresses what HAPPENS, not membership.
- NEVER assume `python`/`python3` resolves → verify the alias first (`where python` / `where py`). Python may not be in the bash PATH under those names. Prefer `py` (Windows Python Launcher) for one-liners, `node` if a JS alternative exists.

Test-specific lessons → `docs/project-reference/integration-test-reference.md` Lessons Learned section. Production-code anti-patterns → `docs/project-reference/backend-patterns-reference.md` Anti-Patterns section. Generic debugging/refactoring reminders → System Lessons in `.claude/hooks/lib/prompt-injections.cjs`.
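The shared-versus-fresh-scope hazard is .NET-specific (DbContext is not thread-safe), but the shape of the rule can be sketched language-agnostically. All names below are illustrative stand-ins, not the project's API:

```python
# Sketch of the scoping rule: a non-thread-safe resource (like a DbContext)
# must not be shared across parallel iterations. bad_parallel mimics the
# ExecuteUowTask failure mode (one shared scope for all tasks);
# good_parallel mimics ExecuteInjectScopedAsync (fresh scope per iteration).
from concurrent.futures import ThreadPoolExecutor

class FakeDbContext:
    """Stands in for a non-thread-safe unit-of-work resource."""
    def __init__(self):
        self.rows = []

def bad_parallel(items):
    shared = FakeDbContext()      # ONE context reused by every task
    def work(i):
        shared.rows.append(i)     # concurrent mutation → corruption risk
    with ThreadPoolExecutor() as ex:
        list(ex.map(work, items))
    return shared.rows

def good_parallel(items):
    results = []
    def work(i):
        ctx = FakeDbContext()     # fresh scope per iteration → no sharing
        ctx.rows.append(i)
        return ctx.rows
    with ThreadPoolExecutor() as ex:
        for rows in ex.map(work, items):
            results.extend(rows)
    return results
```

The corruption in the shared case is silent and timing-dependent, which is exactly why the rule says NEVER rather than "usually avoid".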
- `ExecuteInjectScopedAsync`, NEVER `ExecuteUowTask` (shared DbContext = silent data corruption)
- Feature → core messaging uses `{CoreServiceName}...RequestBusMessage`
- NEVER assume `python`/`python3` resolves → run `where python`/`where py` first; use the `py` launcher or `node`

Break work into small tasks (task tracking) before starting. Add a final task: "Analyze AI mistakes & lessons learned".
Extract lessons → ROOT CAUSE ONLY, not symptom fixes:
- Record each root-cause lesson via `$learn`.
- Ask: "Would `$code-review`/`$code-simplifier`/`$security`/`$lint` catch this?" → Yes → improve the review skill instead of recording it via `$learn`.
[TASK-PLANNING] [MANDATORY] BEFORE executing any workflow or skill step, create/update task tracking for all planned steps, then keep it synchronized as each step starts/completes.