| name | production-grade |
| description | Orchestrates software engineering work — build apps, add features, fix bugs, refactor code, review PRs, write tests, deploy services, audit security, design architecture, generate docs, optimize performance, debug issues, or explore ideas. Any coding or development request gets routed to the right specialized skills automatically. |
Production Grade
!git status 2>/dev/null || echo "No git repo detected"
!cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"
!ls .forgewright/ 2>/dev/null || echo "No existing workspace"
!cat .production-grade.yaml 2>/dev/null || echo "No config file — defaults apply"
Overview
Adaptive meta-skill orchestrator for all software engineering work. Analyzes the user's request, identifies which skills are needed, builds a minimal task graph, and executes — from a single code review to a full 17-skill greenfield build.
55 skills, one orchestrator. The orchestrator routes to the right skills based on what the user actually needs. No forced full-pipeline execution for everyday tasks.
All skills are bundled in this plugin. Single install, everything included.
Middleware Chain (v8.0 — DeerFlow Pattern)
Every skill invocation is wrapped by an ordered middleware chain. Implementation details are in skills/production-grade/middleware/:
Pre-Skill: ① SessionData → ② ContextLoader → ③ SkillRegistry → ④ Guardrail → ⑤ Summarization
═══ SKILL EXECUTION ═══
Post-Skill: ⑥ QualityGate → ⑦ BrownfieldSafety → ⑧ TaskTracking → ⑨ Memory → ⑩ GracefulFailure → ⑪ CircuitBreaker → ⑫ Bulkhead → ⑬ Verification
| # | Middleware | File | Hook | Purpose |
|---|---|---|---|---|
| ① | SessionData | middleware/01-session-data.md | before_skill | Load profile, session state |
| ② | ContextLoader | middleware/02-context-loader.md | before_skill | Load memory, conventions |
| ③b | DryRunContext | skills/_shared/protocols/dryrun-interceptor.md | before_skill | Dry-run mode system prompt injection |
| ③ | SkillRegistry | middleware/03-skill-registry.md | before_skill | Progressive skill loading |
| ④ | Guardrail | middleware/04-guardrail.md | before_tool | Pre-tool authorization |
| ⑤ | Summarization | middleware/05-summarization.md | before_skill | Context compression |
| ⑥ | QualityGate | middleware/06-quality-gate.md | after_skill | Post-skill validation |
| ⑦ | BrownfieldSafety | middleware/07-brownfield-safety.md | after_skill | Regression + protected paths |
| ⑧ | TaskTracking | middleware/08-task-tracking.md | after_skill | Update todos, emit events |
| ⑨ | Memory | middleware/09-memory.md | after_skill + turn_close | Persistent fact extraction |
| ⑩ | GracefulFailure | middleware/10-graceful-failure.md | on_error | Retry logic, stuck detection |
| ⑪ | CircuitBreaker | skills/_shared/protocols/circuit-breaker.md | after_skill | Fault isolation + state machine |
| ⑫ | Bulkhead | skills/_shared/protocols/bulkhead.md | after_skill | Resource limits per worker type |
| ⑬ | Verification | skills/_shared/protocols/verification.md | after_skill | Contract + criteria check |
Middleware protocol: skills/_shared/protocols/middleware-chain.md
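A minimal sketch of how an ordered chain like the one in the table might dispatch its hooks. The `Middleware` and `Chain` classes are hypothetical and model only the `before_skill` and `after_skill` hooks; `before_tool`, `on_error`, and `turn_close` are left out, and the real protocol lives in the files listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Middleware:
    name: str
    hook: str  # "before_skill" or "after_skill" in this sketch


@dataclass
class Chain:
    middlewares: list[Middleware]
    trace: list[str] = field(default_factory=list)

    def run(self, skill: Callable[[], str]) -> str:
        # Pre-skill hooks fire in the order the chain lists them.
        for m in self.middlewares:
            if m.hook == "before_skill":
                self.trace.append(f"before:{m.name}")
        result = skill()
        self.trace.append("skill")
        # Post-skill hooks fire afterwards, again in listed order.
        for m in self.middlewares:
            if m.hook == "after_skill":
                self.trace.append(f"after:{m.name}")
        return result
```

Keeping the order in a single list means that adding a middleware is a one-line change, which seems to match the numbered layout of the table.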
Progressive Skill Loading (v8.0 — DeerFlow Pattern)
Skills are loaded on-demand based on classified mode. Read .forgewright/skills-config.json for the mode→skill mapping.
Instead of loading all 55 skill descriptions (~66KB), load only the skills relevant to the mode:
Review mode → loads 1 skill (~3KB)
Feature mode → loads 5 skills (~15KB)
Full Build → loads 10 skills (~30KB)
Fallback → load all skills (classification failure)
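The lookup described above could be sketched roughly as follows. The JSON shape (`{"modes": {...}}`) and the function name `skills_for_mode` are assumptions for illustration; the actual schema of .forgewright/skills-config.json is not shown in this document.

```python
from __future__ import annotations

import json


def skills_for_mode(config_text: str, mode: str | None, all_skills: list[str]) -> list[str]:
    """Return only the skills mapped to `mode`, or every skill as a fallback."""
    mapping = json.loads(config_text).get("modes", {})  # assumed top-level key
    if mode is None or mode not in mapping:
        # Classification failed or the mode is unmapped: load everything.
        return list(all_skills)
    return list(mapping[mode])
```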
When to Use
- Building a new SaaS, platform, or service from scratch (full pipeline)
- Adding a feature to an existing codebase
- Hardening code before launch (security + QA + review)
- Setting up CI/CD, Docker, Terraform for existing code
- Writing tests for existing code
- Reviewing code quality or architecture conformance
- Designing architecture or API contracts
- Writing documentation for existing systems
- Performance optimization or reliability engineering
- Any task that benefits from structured, production-quality execution
- User says "build me a...", "add [feature]", "review my code", "set up CI/CD", "write tests", "harden this", "document this"
Request Classification
Before any execution, classify the user's request into a mode. This determines which skills run and how.
Paperclip Detection (Optional)
Before classifying, check if this session is managed by Paperclip:
Paperclip indicators: ticket reference (#42, CLIP-, [paperclip]),
heartbeat context, budget mention, agent identity
If detected:
- Read skills/_shared/protocols/paperclip-integration.md
- Switch to Express engagement mode (fully autonomous)
- Apply ticket scope discipline (stay within assigned task)
- Use structured output format for Paperclip consumption
- Apply cost-awareness rules
If not detected → proceed normally (no changes).
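As a rough illustration, the ticket-reference indicators could be checked with a few regular expressions. This sketch covers only the three ticket patterns named above; heartbeat context, budget mentions, and agent identity are not modeled, and `looks_like_paperclip` is a hypothetical name.

```python
import re

# Ticket-reference indicators only; the other indicators are not modeled here.
TICKET_PATTERNS = [
    re.compile(r"(?<!\w)#\d+\b"),             # e.g. "#42"
    re.compile(r"\bCLIP-"),                    # e.g. "CLIP-17"
    re.compile(r"\[paperclip\]", re.IGNORECASE),
]


def looks_like_paperclip(message: str) -> bool:
    """Return True if the message carries any known ticket reference."""
    return any(p.search(message) for p in TICKET_PATTERNS)
```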
Step 0 — Request Interpretation (MANDATORY)
⚠️ DO NOT SKIP THIS STEP. EVER.
Before ANY skill execution, interpret the user's request:
- Extract 9 dimensions (from chat-interpreter):
- Task: What they actually want
- Target tool: Forgewright mode
- Output format: What they expect
- Constraints: Explicit limits
- Input: What they're providing
- Context: Prior decisions, project state
- Audience: Who uses output
- Success criteria: How they know it's done
- Examples: Reference systems
- Scan for vague patterns (from credit-killing patterns):
- Vague verb ("help me", "make it", "do something") → ask specifics
- Two tasks in one → ask priority
- No success criteria → derive and confirm
- Emotional description → extract technical fault
- Assumed knowledge → inject context
- No project context → pull from project-profile.json
- No scope boundary → ask what's in/out
- No file path → ask for location
IntentGate — Explicit Intent Analysis (NEW Step 0.2)
Purpose: Before classifying into modes, verify we understand the user's TRUE goal. This prevents literal misinterpretation — user says "fix the login" but actually wants OAuth added.
Trigger: Runs AFTER vague pattern scanning, BEFORE clarification questions.
Three reflection questions — answer them YOURSELF as the agent:
INTENTGATE ANALYSIS:
After scanning for vague patterns, ask yourself:
1. "What is the USER'S GOAL behind this request?" (not the literal action)
2. "What does success look like to the USER?" (what would they consider done?)
3. "What would the USER consider a complete fix/implementation?"
If the literal interpretation differs from the Intent Analysis:
→ Highlight the discrepancy in the structured request
→ If HIGH confidence: proceed with Intent, note mode reclassification
→ If MEDIUM/LOW confidence: ask 1 clarifying question to confirm intent
Rules:
- IntentGate is 3 reflection questions MAX — answer them yourself, do NOT ask the user
- Only ask the user if the intent is genuinely ambiguous (MEDIUM/LOW confidence)
- IntentGate adds no user-facing overhead when confidence is HIGH, because it is internal reflection only
- If mode reclassified based on Intent Analysis, note it explicitly
Output: Append Intent Analysis to the structured request below.
- Clarification Rules:
- MAX 3 clarifying questions — pick the 3 most critical
- If HIGH confidence: Skip clarification, generate structured request
- If MEDIUM/LOW confidence: Ask before proceeding
- NEVER start executing if request is unclear
- Use defaults for everything else (don't over-ask)
- Generate Structured Request:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 INTERPRETED REQUEST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Mode: [detected]
Confidence: [HIGH/MEDIUM/LOW]
Intent: "[original message quoted]"
What you want:
[1-sentence clear description]
Intent Analysis (Step 0.2):
- User's true goal: [1-sentence — what they actually want, not what they said]
- Success definition: [from the USER's perspective]
- Intent vs Literal: [if different from what they said, note it here]
✗ Literal: [what they literally said]
✓ Intent: [what they actually need]
Key decisions made:
[Defaults applied with reasoning]
Scope:
✓ [In scope]
✗ [Out of scope]
Success criteria:
[How we know it's done]
Missing (will be handled by PM):
[Max 3 items]
Plan Quality & Self-Improvement Loop (MANDATORY Step 2):
- Initial Plan Score: [Score/10]
- Optimization Iterations: [N times (0 if score >= 9.0 on first try)]
- Research Gate Triggered: [Yes/No (and what was researched if Yes)]
- Final Plan Score: [Score/10 - Must be >= 9.0]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
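The vague-pattern scan and the three-question cap from Step 0 could be approximated as below. The heuristics (substring and regex checks) and the question wording are illustrative assumptions, not the chat-interpreter's actual rules, and this sketch simply keeps the first three hits rather than ranking them by criticality.

```python
import re

MAX_QUESTIONS = 3  # "MAX 3 clarifying questions"

# (pattern name, predicate on the lowercased request, question to ask)
CHECKS = [
    ("vague verb",
     lambda t: any(v in t for v in ("help me", "make it", "do something")),
     "What exactly should change?"),
    ("two tasks",
     lambda t: " and " in t,  # crude heuristic
     "Which task has priority?"),
    ("no file path",
     lambda t: re.search(r"[\w./-]+\.\w{1,5}\b", t) is None,
     "Which file or location is affected?"),
    ("no success criteria",
     lambda t: re.search(r"\b(should|must|so that|until)\b", t) is None,
     "How will we know it is done?"),
]


def clarifying_questions(request: str) -> list:
    """Return at most MAX_QUESTIONS questions for the vague patterns found."""
    text = request.lower()
    hits = [question for _, predicate, question in CHECKS if predicate(text)]
    return hits[:MAX_QUESTIONS]
```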
Step 1 — Analyze the request:
Read .forgewright/subagent-context/INTERPRETED_REQUEST.md (from chat-interpreter Step -1) for the authoritative request analysis. The chat-interpreter has already performed 9-dimension extraction and mode detection.
Enhanced Mode Classification with Fuzzy Matching (v8.7+)
Confidence Scoring System
Every mode classification returns a confidence score (0.0 - 1.0):
┌─ Mode Classification ─────────────────────────────────────┐
│ │
│ Detected: Feature │
│ Confidence: 0.87 │
│ │
│ Evidence: │
│ • "add login" → feature keyword match │
│ • "implement" → strong signal │
│ • No full-stack indicators → no Full Build │
│ │
│ Secondary candidates: │
│ • Full Build (0.23) — mentions "system" │
│ • AI Build (0.15) — mentions "smart" │
│ │
│ Status: ✅ Proceeding with Feature mode │
└────────────────────────────────────────────────────────────┘
Trigger Matching
| Match Type | Confidence | Example |
|---|---|---|
| Exact match | 0.95-1.0 | "build a SaaS" → Full Build |
| Fuzzy match | 0.7-0.94 | "make a web app" → Full Build (0.85) |
| Weak signal | 0.4-0.69 | "help me" → Explore (0.45) |
| No match | < 0.4 | Fallback chain invoked |
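The bands in the table translate into a small lookup. This is a sketch; the function name `match_type` is hypothetical, and scores that fall between the listed ranges (for example 0.945) are resolved with `>=` comparisons.

```python
def match_type(confidence: float) -> str:
    """Map a 0.0-1.0 confidence score onto the match types in the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    if confidence >= 0.95:
        return "exact"
    if confidence >= 0.7:
        return "fuzzy"
    if confidence >= 0.4:
        return "weak"
    return "none"  # fallback chain is invoked
```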
Fuzzy Trigger Patterns
classification:
  primary:
    trigger: "build a SaaS"
    mode: "Full Build"
    confidence: 0.95
    keywords: ["build", "saas", "full stack", "from scratch", "greenfield"]
  fuzzy:
    - trigger: "build"
      mode: "Full Build"
      threshold: 0.7
      synonyms: ["create", "make", "develop", "construct"]
    - trigger: "game"
      mode: "Game Build"
      threshold: 0.75
      engine_keywords: ["unity", "unreal", "godot", "roblox", "phaser", "threejs"]
    - trigger: "mobile"
      mode: "Mobile"
      threshold: 0.7
      keywords: ["ios", "android", "react native", "flutter"]
  fallback:
    - mode: "Explore"
      confidence: 0.3
      reason: "Ambiguous request"
    - mode: "Feature"
      confidence: 0.4
      reason: "Default for additions"
    - mode: "Full Build"
      confidence: 0.35
      reason: "Catch-all for builds"
Fuzzy Matching Rules
- Keyword extraction — Extract key terms from request
- Stemming — Match "building" to "build"
- Synonym expansion — Match "create" to "build"
- Partial matching — "unity" matches "Unity3D"
- Context weighting — "game" near "mobile" → Game Build (higher)
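A deliberately naive sketch of the first four rules (keyword extraction, stemming, synonym expansion, partial matching). The stemmer only strips "ing", the synonym table reuses the "build" synonyms from the trigger patterns, and context weighting is not modeled; all function names are hypothetical.

```python
import re

# Synonyms of "build", taken from the fuzzy trigger patterns.
SYNONYMS = {"create": "build", "make": "build", "develop": "build", "construct": "build"}


def stem(word: str) -> str:
    """Very rough stemming: 'building' -> 'build'."""
    if word.endswith("ing") and len(word) - 3 >= 3:
        return word[:-3]
    return word


def normalize(request: str) -> set:
    """Extract lowercase terms, stem them, then expand synonyms."""
    words = re.findall(r"[a-z0-9]+", request.lower())
    return {SYNONYMS.get(stem(w), stem(w)) for w in words}


def matches(keyword: str, terms: set) -> bool:
    """Partial matching: 'unity' matches the term 'unity3d'."""
    return any(term.startswith(keyword) for term in terms)
```

A production classifier would likely use a real stemmer; the point here is only the order of the steps.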
Fallback Chain
When no match exceeds the threshold:
Fallback sequence:
1. Polymath (Explore) — Help clarify intent
2. Feature — Default for additions
3. Full Build — Catch-all for builds
4. Custom — Let user pick mode
Configuration
skillRouting:
  fuzzyMatching:
    enabled: true
    minConfidence: 0.7
    synonymExpansion: true
    stemmingEnabled: true
  fallbackChain:
    - Explore
    - Feature
    - Full Build
If confidence: HIGH → use the detected mode directly, skip the classification table.
If confidence: MEDIUM → present 2 most likely modes to the user.
If confidence: LOW → present 3 most likely modes to the user.
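The three confidence rules above amount to choosing how many candidates to show. A sketch, assuming the candidates are already ranked by score; `modes_to_present` is a hypothetical helper, and the sample scores are the ones from the classification example earlier.

```python
def modes_to_present(ranked: list, confidence: str) -> list:
    """ranked: (mode, score) pairs sorted by score, highest first."""
    count = {"HIGH": 1, "MEDIUM": 2, "LOW": 3}[confidence]
    return [mode for mode, _ in ranked[:count]]


ranked = [("Feature", 0.87), ("Full Build", 0.23), ("AI Build", 0.15)]
print(modes_to_present(ranked, "MEDIUM"))
```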
⚠️ ENFORCEMENT: If request is unclear, STOP and ask. DO NOT start executing.
The following requests MUST trigger clarification:
- Contains vague verbs: "help me", "make it", "do something", "fix it"
- No specific scope: "build an app", "add a feature", "update the system"
- Two or more tasks in one: "explain AND build", "fix AND test"
- No success criteria: "make it better", "improve it"
- No file/location specified: "update login", "add auth"
Override the detected mode only if the user's intent clearly differs from what was interpreted. Otherwise, trust the chat-interpreter's analysis.
| Mode | Trigger Signals | Skills Involved |
|---|---|---|
| Full Build | "build a SaaS", "production grade", "from scratch", "full stack", greenfield intent | All skills, full DEFINE→BUILD→HARDEN→SHIP→SUSTAIN→GROW pipeline |
| Feature | "add [feature]", "implement [feature]", "new endpoint", "new page", "integrate [service]" | BA (if gaps detected) → PM (scoped) → Architect (scoped) → BE/FE → QA |
| Harden | "review", "audit", "secure", "harden", "before launch", "production ready" (on EXISTING code) | Security + QA + Code Review (sequential) → Remediation |
| Ship | "deploy", "CI/CD", "containerize", "infrastructure", "terraform", "docker" | DevOps → SRE |
| Debug | "debug", "fix bug", "broken", "investigate", "not working", "error", "trace", "crashes" | Debugger (→ Software/Frontend Engineer for fix) |
| AI Build | "AI feature", "chatbot", "RAG", "embeddings", "LLM", "agent", "prompt", "AI-powered" | AI Engineer + Prompt Engineer + Data Scientist + Architect (scoped) → BE/FE |
| Migrate | "migrate", "upgrade", "migration", "database change", "schema change", "refactor DB", "move to" | Database Engineer + Software Engineer → QA |
| Test | "write tests", "test coverage", "test this", "add tests" | QA |
| Review | "review my code", "code review", "code quality", "check my code" | Code Reviewer |
| Architect | "design", "architecture", "API design", "data model", "tech stack", "how should I structure" | Solution Architect |
| Document | "document", "write docs", "API docs", "README" | Technical Writer |
| Explore | "explain", "understand", "help me think", "what should I", "I'm not sure" | Polymath |
| Research | "research", "deep research", "find sources", "analyze topic", "investigate [domain]", "NotebookLM", "study materials", "generate quiz" | NotebookLM Researcher → Polymath (research mode) + NotebookLM MCP (primary) |
| Optimize | "performance", "slow", "optimize", "scale", "reliability" | Performance Engineer + SRE + Code Reviewer |
| Design | "design UI", "wireframes", "design system", "color palette", "UX flow" | UX Researcher → UI Designer |
| Mobile | "mobile app", "React Native", "Flutter", "iOS", "Android" | BA (if gaps detected) → Mobile Engineer (+ PM scoped, Architect scoped if needed) |
| Game Build | "game", "Unity", "Unreal", "Godot", "Roblox", "Phaser", "Three.js", "gameplay", "game design", "build a game" | Game Designer → Engine Engineer (Unity/Unreal/Godot/Phaser 3/Three.js) → Level/Narrative/TechArt/Audio |
| XR Build | "VR", "AR", "MR", "XR", "spatial", "Quest", "Vision Pro", "WebXR" | XR Engineer (+ Game Build pipeline if game-like XR) |
| Marketing | "marketing", "SEO", "launch strategy", "copywriting", "content strategy", "go-to-market" | Growth Marketer (+ Conversion Optimizer if CRO mentioned) |
| Grow | "growth", "CRO", "conversion", "funnel", "A/B test", "churn", "retention", "referral" | Conversion Optimizer (+ Growth Marketer if strategy needed) |
| Analyze | "analyze requirements", "evaluate this", "is this feasible", "validate requirements", "check completeness", "client says" | Business Analyst (standalone requirements analysis) |
| Goal | "set goal", "work toward", "keep going until", "autonomous", "/goal" | Goal-Driven orchestrator — auto-evaluate and continue until condition met |
| Custom | Doesn't fit above patterns | Present skill menu, let user pick |
Step 2 — Present or skip the plan:
Single-skill modes (Test, Review, Architect, Document, Explore, Design, Debug, Analyze, Goal): Skip plan presentation. Classify → invoke immediately. The intent is obvious — no overhead needed.
Goal mode is special — it works with ANY skill. After each turn, it auto-evaluates and continues until the condition is met.
Multi-skill modes (Feature, Harden, Ship, Optimize, AI Build, Migrate, Custom): Present the plan for confirmation via notify_user:
Here's my plan:
[numbered list of skills and what each does]
Scope: [light / moderate / heavy]
1. **Looks good — start (Recommended)** — Execute this plan
2. **I want the full production-grade pipeline** — Run all 55 skills, 6 phases, 3 gates
3. **Adjust the plan** — Add or remove skills from the plan
4. **Chat about this** — Free-form input
Large Feature Mode (a Feature with 3+ components, or any comparably complex request): Create the planning documents under antigravity/ BEFORE starting:
antigravity/
└── planning/
    └── [feature-name]/
        ├── PLAN.md          # Main planning document
        ├── SCOPE.md         # Scope definition
        ├── ARCHITECTURE.md  # Technical architecture (if needed)
        └── TASKS.md         # Task breakdown
Full Build mode: Always proceed to the Full Build Pipeline section below.
If the user selects "full pipeline" from any mode, switch to Full Build.
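Step 2 can be read as a three-way dispatch on the classified mode. The sketch below encodes only the modes that Step 2 names; modes it does not list (for example Marketing or Mobile) raise an error here rather than guess at their behavior. `next_step` and its return labels are hypothetical.

```python
SINGLE_SKILL = {"Test", "Review", "Architect", "Document", "Explore",
                "Design", "Debug", "Analyze", "Goal"}
MULTI_SKILL = {"Feature", "Harden", "Ship", "Optimize", "AI Build",
               "Migrate", "Custom"}


def next_step(mode: str) -> str:
    """Decide what happens right after classification."""
    if mode in SINGLE_SKILL:
        return "invoke"                # skip plan presentation
    if mode in MULTI_SKILL:
        return "present_plan"          # confirm the plan with the user first
    if mode == "Full Build":
        return "full_build_pipeline"
    raise ValueError(f"Step 2 does not say how to handle mode {mode!r}")
```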
Step 3 — Execute the mode:
For non-Full-Build modes, use the lightweight execution flows below. For Full Build, use the Full Build Pipeline.
Coding-Level Adaptation
Read codingLevel from .production-grade.yaml (defaults to 5, Standard, when unset) and adapt ALL skill output accordingly. Example setting:
codingLevel: 8
| Level | Style | Output Behavior |
|---|---|---|
| 1-3 (Junior) | Guided | Detailed explanations for every decision. Inline comments on complex logic. Link to relevant docs/tutorials. Explain WHY, not just WHAT. Step-by-step instructions for manual steps. |
| 4-7 (Mid) | Standard | Balanced output — explain non-obvious decisions, skip the obvious. Standard inline comments. Focus on trade-offs and alternatives. |
| 8-10 (Senior) | Terse | Code-focused, minimal commentary. Only flag unexpected decisions or gotchas. Diff-style output preferred. No tutorials, no hand-holding. Assume deep familiarity with tools and patterns. |
Rules:
- If codingLevel is not set, default to Standard (5)
- Coding level affects output verbosity, NOT code quality — all levels produce production-grade code
- Engagement Mode (Express/Standard/Thorough/Meticulous) controls interaction depth — coding level controls explanation depth. They are independent dimensions.
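The level-to-style mapping, including the unset default, could look like this. `style_for` is a hypothetical helper; it selects only the explanation style and leaves code quality untouched, as the rules require.

```python
from __future__ import annotations


def style_for(coding_level: int | None) -> str:
    """Map codingLevel (1-10) to an output style; unset means Standard (5)."""
    level = 5 if coding_level is None else coding_level
    if not 1 <= level <= 10:
        raise ValueError("codingLevel must be between 1 and 10")
    if level <= 3:
        return "Guided"
    if level <= 7:
        return "Standard"
    return "Terse"
```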
Sensitive File Protection
All skills MUST follow the sensitive file protection protocol:
!cat skills/_shared/protocols/sensitive-file-protection.md 2>/dev/null || echo "Protocol not found — apply defaults: never read .env without user approval, redact secrets in output, check .gitignore before commit"
Plan Quality Loop
ALL skills MUST run the plan quality loop before doing any work. No exceptions — every skill plans first, scores, improves until ≥ 9.0:
!cat skills/_shared/protocols/plan-quality-loop.md 2>/dev/null || echo "Protocol not found — apply defaults: every skill must plan first, score against 8 criteria, threshold 9.0/10, improve loop with research + skill self-improvement"
⚠️ ASIP Enforcement for Plan Quality
After 2 consecutive failed plan attempts (score < 9.0):
- TRIGGER MANDATORY RESEARCH GATE — Cannot skip
- Record attempt: bash scripts/forgewright-session-tracker.sh plan <score>
- Check if gate needed: bash scripts/forgewright-session-tracker.sh check
- Research Priority Order:
  a) CHECK NotebookLM availability:
     nlm --version 2>/dev/null || echo "NOT_AVAILABLE"
     └─ If NOT_AVAILABLE → SKIP to (c)
  b) TRY NotebookLM CLI (if available):
     nlm notebook create "[Project] - [Skill] - [Topic]"
     nlm research start "[topic]" --mode deep
     nlm notebook query <id> "Best practices?"
  c) FALLBACK to Web Search (always available):
     WebSearch: "best practices [topic]"
     WebSearch: "[framework] [pattern] implementation"
- SYNTHESIZE findings into 1-3 actionable insights
- Update skill SKILL.md (Planning Improvements section)
- Append to .forgewright/plan-lessons.md
- RE-PLAN with injected knowledge
- Re-score — only proceed if ≥ 9.0
⚠️ BA Scope Exception:
- If a weak criterion reveals unclear project requirements, STOP research and trigger the BA skill
- BA will ask clarifying questions → define scope → resume Plan Quality Loop
- This is NOT blocking — scope elicitation IS the Forgewright workflow
This is NON-NEGOTIABLE. The system will not proceed until research is complete.
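The trigger condition (two consecutive plan scores below 9.0) can be modeled with a small counter. This is a sketch of the rule only; it does not reproduce scripts/forgewright-session-tracker.sh, whose behavior is not shown here, and `PlanTracker` is a hypothetical name.

```python
class PlanTracker:
    """Counts consecutive failed plan attempts and signals the research gate."""

    THRESHOLD = 9.0   # minimum passing plan score
    GATE_AFTER = 2    # consecutive failures that force research

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def record(self, score: float) -> bool:
        """Record a plan score; return True when the research gate must run."""
        if score >= self.THRESHOLD:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.GATE_AFTER
```

A passing score resets the counter, so only failures in a row count toward the gate.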
Execution Blocker Loop
DEPRECATED — Use ASIP (Adaptive Self-Improving Loop) instead.
The canonical execution blocker loop is now in self-improving-loop.md (ASIP Phase 2).
This section is kept for reference only.
ANY time a blocker is encountered during implementation, you MUST run this loop BEFORE asking the user:
!cat skills/_shared/protocols/execution-blocker-loop.md 2>/dev/null || echo "Protocol not found — apply defaults: assess → research (web/codebase/docs) → synthesize → attempt → verify → improve skill. Max 3 cycles."
See ASIP (Adaptive Self-Improving Loop) below for the canonical execution blocker loop.
Adaptive Self-Improving Loop (ASIP)
Combined Plan Quality + Execution Blocker Loop with mandatory NotebookLM research:
!cat skills/_shared/protocols/self-improving-loop.md 2>/dev/null || echo "Protocol not found — apply defaults: 2 failures → research via NotebookLM → update skill → retry"
Core principle: Every failure is a learning opportunity. Skills improve over time based on real failures.
ASIP Metrics
Track project adaptation:
{
  "totalResearchGates": 0,
  "totalSkillUpdates": 0,
  "uniquePatterns": 0,
  "lessonsLearned": 0,
  "failuresAvoided": 0
}
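Updating these counters might look like the sketch below, which works on the JSON text rather than a file so that it stays self-contained. The helper name `bump` is hypothetical; the counter names are the ones listed above.

```python
import json

DEFAULTS = {
    "totalResearchGates": 0,
    "totalSkillUpdates": 0,
    "uniquePatterns": 0,
    "lessonsLearned": 0,
    "failuresAvoided": 0,
}


def bump(metrics_json: str, counter: str, by: int = 1) -> str:
    """Increment one known counter and return the updated JSON text."""
    if counter not in DEFAULTS:
        raise KeyError(counter)
    # Missing keys fall back to zero; an empty string means "no metrics yet".
    metrics = {**DEFAULTS, **json.loads(metrics_json or "{}")}
    metrics[counter] += by
    return json.dumps(metrics, indent=2)
```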
Review Intensity Mode
Control how much design/architecture review happens at each step:
!cat skills/_shared/protocols/review-intensity.md 2>/dev/null || echo "Protocol not found — apply defaults: Review mode defaults to Lean (reviews only at phase gates). Set in production/review-mode.txt. Modes: full (all reviews), lean (gate reviews only), solo (no reviews)."
User can override per-invocation with --review [mode] flag.
Model Tier Assignment
Assign optimal Claude model tier to each skill invocation:
!cat skills/_shared/protocols/model-tier.md 2>/dev/null || echo "Protocol not found — apply defaults: Sonnet for most skills. Haiku for /sprint-status, /help, /scope-check, /onboard. Opus for /architecture-review, /gate-check, /code-review."
Override per-invocation with --model [haiku|sonnet|opus] flag.
Mode Execution (Non-Full-Build)
All modes share these behaviors:
- Bootstrap workspace: mkdir -p skills/_shared/protocols/ .forgewright/
- Write shared protocols (same as Full Build step 3)
- Read .production-grade.yaml for path overrides
- Read existing workspace state if present
- Apply coding-level adaptation from .production-grade.yaml (see above)
- Apply sensitive file protection protocol for all file operations
- Run plan quality loop on EVERY skill invocation — plan first, score ≥ 9.0 before any work begins
- Asynchronous Heartbeat: Periodically emit human-readable status updates (e.g., "Running tests...", "Applying self-healing fix 2/5...") so the user knows the AI is working and hasn't frozen.
- ⚠️ QA AUTO-RUN (MANDATORY): After any code change (build, fix, feature), ALWAYS run QA/Testing WITHOUT waiting for user prompt. The sequence is: BUILD → TEST → VERIFY → DONE. Never finish without testing.
- Antigravity Planning (for large features): Features with 3+ components MUST use the Antigravity planning structure BEFORE starting implementation. Create antigravity/planning/[feature-name]/ with PLAN.md, SCOPE.md, ARCHITECTURE.md, and TASKS.md files.
- Engagement mode: ask ONLY if mode involves 3+ skills. For 1-2 skill modes, use Standard engagement + Sequential execution.
Goal Mode Execution (v8.2)
When Goal mode is triggered, Forgewright enters autonomous pursuit mode:
1. SET GOAL:
   - Parse condition from user message
   - Validate condition is measurable
   - Create .forgewright/active-goal.json
2. AUTONOMOUS LOOP:
   After each turn:
   a. Run evaluation:
      python3 scripts/goal-evaluate.py "[condition]"
   b. Check result:
      - MET: Report completion, clear goal, exit autonomous mode
      - NOT_MET: Continue to next turn (no user prompt needed)
      - UNKNOWN: Ask user to verify
3. PROGRESS TRACKING:
   - Write progress to .forgewright/goal-progress.md
   - Update turns counter in active-goal.json
   - Emit heartbeat: "Working toward goal: [reason why not met yet]"
4. EXIT CONDITIONS:
   - Condition is met (evaluator returns MET)
   - User runs `/goal clear`
   - Safety limit reached (max_turns, timeout)
   - User explicitly stops
Integration with other skills: Goal mode wraps ANY skill execution. The underlying skill does the work; Goal mode handles the loop and evaluation.
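The autonomous loop can be sketched as a bounded retry around an evaluator. Here `work` and `evaluate` are hypothetical callables standing in for the wrapped skill and for scripts/goal-evaluate.py; the default of 10 turns is an assumption, and the timeout and `/goal clear` exits are not modeled.

```python
from typing import Callable


def pursue_goal(evaluate: Callable[[], str],
                work: Callable[[], None],
                max_turns: int = 10) -> str:
    """Run turns until the evaluator says MET, says UNKNOWN, or the limit hits."""
    for _ in range(max_turns):
        work()                      # the underlying skill does one turn of work
        verdict = evaluate()        # expected: "MET", "NOT_MET", or "UNKNOWN"
        if verdict == "MET":
            return "completed"
        if verdict == "UNKNOWN":
            return "ask_user"
        # NOT_MET: continue without prompting the user
    return "safety_limit"
```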
⚠️ Self-Check Before Finishing (MANDATORY)
BEFORE declaring a task complete, verify ALL of the following:
| # | Check | Action if Failed |
|---|---|---|
| 1 | Request interpreted? | If Step 0 wasn't completed, go back and do it |
| 2 | Plan scored ≥ 9.0? | If < 9.0, improve plan before proceeding |
| 3 | ASIP Research Gate followed? | If 2 failures occurred → research + skill update was mandatory |
| 4 | Lessons written? | Append to skill SKILL.md + .forgewright/lessons.md |
| 5 | Code changes made? | If yes → run QA tests |
| 6 | Tests written? | If code changed → write tests |
| 7 | Tests passed? | If tests exist → run them |
| 8 | forgenexus_impact run? | If editing symbols → run impact analysis |
| 9 | Scope respected? | If scope creep detected → flag to user |
| 10 | User approval obtained? | If gate exists → wait for approval |
| 11 | Review mode respected? | If Full mode → run director reviews; if Solo → confirm skip OK |
| 12 | ASIP metrics updated? | Increment counters in .forgewright/asip-metrics.json |
⚠️ NEVER finish a task without completing checks 5-7 if code was changed.
QA Test Sequence (MANDATORY after any code change)
Code Changed?
↓ YES
Run QA Engineer (Express mode)
↓
Write tests (unit → integration → e2e)
↓
Run tests and verify ALL pass
↓
Report results
↓
Done ✓
Do NOT wait for user to ask for tests. Run them automatically.
Antigravity Planning System
For large features (3+ components), use the Antigravity Planning System to structure your work.
When to Use Antigravity
| Feature Type | Antigravity? |
|---|---|
| Single file change | ❌ No |
| Small (1-2 components) | ❌ No |
| Medium (3+ components) | ✅ Yes |
| Full Build / Game Build | ✅ Required |
| Multi-team coordination | ✅ Required |
| New integration (auth, payment) | ✅ Yes |
Antigravity Folder Structure
antigravity/
└── planning/
    └── [feature-name]/
        ├── PLAN.md           # Main planning document
        ├── SCOPE.md          # Scope definition
        ├── ARCHITECTURE.md   # Technical architecture
        ├── TASKS.md          # Task breakdown
        ├── DECISIONS.md      # Architecture decisions log
        └── RETROSPECTIVE.md  # Post-completion retrospective
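For illustration, the folder layout above can be created with a few lines of `pathlib`. This is not the antigravity.sh script, whose contents are not shown here; `scaffold` is a hypothetical helper that writes placeholder headings and never overwrites existing files.

```python
from pathlib import Path

FILES = ["PLAN.md", "SCOPE.md", "ARCHITECTURE.md", "TASKS.md",
         "DECISIONS.md", "RETROSPECTIVE.md"]


def scaffold(root: Path, feature: str) -> Path:
    """Create antigravity/planning/<feature>/ with empty planning documents."""
    plan_dir = root / "antigravity" / "planning" / feature
    plan_dir.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        path = plan_dir / name
        if not path.exists():  # keep any existing content
            path.write_text(f"# {Path(name).stem}: {feature}\n")
    return plan_dir
```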
Quick Commands
./scripts/antigravity/antigravity.sh new <feature-name>
./scripts/antigravity/antigravity.sh status
./scripts/antigravity/antigravity.sh progress <feature>
./scripts/antigravity/antigravity.sh archive <feature>
Feature Plan Template
Each feature plan must include:
| File | Required? | Content |
|---|---|---|
| PLAN.md | ✅ Yes | Overview, goals, key decisions, timeline |
| SCOPE.md | ✅ Yes | In/out scope, constraints, risks, acceptance criteria |
| ARCHITECTURE.md | ⚠️ If complex | Component diagram, data models, API design |
| TASKS.md | ✅ Yes | Task breakdown by priority, estimates |
| DECISIONS.md | ⚠️ Recommended | Architecture Decision Records |
| RETROSPECTIVE.md | ⚠️ After completion | Lessons learned, metrics |
Plan Quality Criteria
Each feature plan must score ≥ 9.0/10 on:
| Criteria | Description |
|---|---|
| Clarity | Scope clearly defined |
| Completeness | Enough info to implement |
| Feasibility | Achievable in timeframe |
| Risk Awareness | Risks identified |
| Testability | Clear acceptance criteria |
| Maintainability | Long-term viable |
| Priority | Impact vs effort clear |
| Dependencies | External deps identified |
See antigravity/README.md for full documentation.
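One way to turn the eight criteria into a single score is an unweighted mean. That aggregation is an assumption made for this sketch; the document does not state how the individual criteria are combined, and the function names are hypothetical.

```python
CRITERIA = ("Clarity", "Completeness", "Feasibility", "Risk Awareness",
            "Testability", "Maintainability", "Priority", "Dependencies")


def plan_score(scores: dict) -> float:
    """Unweighted mean of the eight criteria, each scored out of 10 (assumed)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)


def passes(scores: dict, threshold: float = 9.0) -> bool:
    """True when the plan meets the 9.0/10 bar."""
    return plan_score(scores) >= threshold
```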
Feature Mode
Add a feature to an existing codebase. Lightweight DEFINE → BUILD → TEST.
1. Codebase scan — read existing code structure, framework, patterns
2. BA pre-flight (conditional) — Assess the user's feature description for information gaps using 6W1H. If requirements score < 6/7 completeness → run BA (Express depth) to elicit missing info. If clear → skip. Log "✓ Requirements complete — skipping BA" or "⧖ Information gaps detected — running BA elicitation"
3. PM (Express depth) — 2-3 questions to scope the feature. Write a mini-BRD (user stories + acceptance criteria for this feature only). If BA ran, use ba-package.md to reduce questions.
4. Architect (scoped) — design how this feature fits the existing architecture. New endpoints, schema changes, component additions. NOT a full system redesign.
5. Build — Software Engineer and/or Frontend Engineer implement the feature
6. ⚠️ Test (AUTO-RUN) — Immediately write and run tests for the new feature. DO NOT WAIT for user to ask. Sequence: Build → Test → Verify → Done.
7. Optional: Review — Code Reviewer checks the new code against existing patterns
1 gate: After PM scoping (step 3), confirm scope before building.
⚠️ IMPORTANT: Step 6 (Test) is MANDATORY. After building, ALWAYS run tests without waiting for user prompt.
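The BA pre-flight decision is a threshold on answered dimensions. The seven dimension names below are an assumption (the document names the 6W1H framework but does not list its dimensions), and `needs_ba` is a hypothetical helper.

```python
# Assumed 6W1H dimension names; the document does not enumerate them.
DIMENSIONS = ("who", "what", "when", "where", "why", "which", "how")


def needs_ba(answers: dict) -> bool:
    """Run BA elicitation when fewer than 6 of the 7 dimensions are answered."""
    answered = sum(1 for d in DIMENSIONS if answers.get(d, "").strip())
    return answered < 6
```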
Harden Mode
Security + quality audit on existing code. No building, pure analysis + fixes.
1. Codebase scan — read all existing code
2. Sequential: Security Engineer → QA Engineer → Code Reviewer analyze the code
3. Consolidated findings — merge all findings, deduplicate, sort by severity
4. Present findings — show Critical/High/Medium/Low counts with top issues
5. Remediation — fix Critical and High issues (with user confirmation)
1 gate: After findings (step 4), before remediation.
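The consolidation step (merge, deduplicate, sort by severity, count) could be sketched as follows. The finding shape (`file`, `issue`, `severity` keys) is an assumption, as is the choice to keep the more severe copy when two skills report the same issue.

```python
from collections import Counter

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def consolidate(*reports: list) -> list:
    """Merge reports, drop duplicates (keeping the more severe), sort by severity."""
    best = {}
    for report in reports:
        for finding in report:
            key = (finding["file"], finding["issue"])
            current = best.get(key)
            if current is None or (SEVERITY_ORDER[finding["severity"]]
                                   < SEVERITY_ORDER[current["severity"]]):
                best[key] = finding
    return sorted(best.values(), key=lambda f: SEVERITY_ORDER[f["severity"]])


def severity_counts(findings: list) -> Counter:
    """Counts per severity level, for the findings summary."""
    return Counter(f["severity"] for f in findings)
```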
Ship Mode
Get existing code deployed. Infrastructure + reliability.
- Codebase scan — read existing code, identify services, dependencies
- DevOps — Dockerfiles, CI/CD pipelines, IaC (Terraform/Pulumi), monitoring
- SRE — SLO definitions, runbooks, alerting, chaos experiment plan
1 gate: After DevOps infra plan, before applying.
Test Mode
Write tests for existing code. Single skill.
- Read skills/qa-engineer/SKILL.md and follow its instructions against existing code
- QA reads code, writes test plan, implements tests, runs them
- Report results
0 gates. QA operates autonomously.
Review Mode
Code quality review. Single skill, read-only.
- Read skills/code-reviewer/SKILL.md and follow its instructions
- Review produces findings report
- Present findings with severity distribution
0 gates. Read-only operation.
Architect Mode
Design or redesign architecture. Single skill.
- Read skills/solution-architect/SKILL.md and follow its instructions
- Full discovery interview (depth based on engagement mode)
- Produces ADRs, diagrams, tech stack, API contracts, scaffold
1 gate: Architecture approval before scaffold generation.
Document Mode
Generate documentation for existing code. Single skill.
- Read skills/technical-writer/SKILL.md and follow its instructions
- Reads all code + existing docs
- Generates API reference, dev guides, architecture overview
0 gates. Technical Writer operates autonomously.
Explore Mode
Thinking partner. Single skill.
- Read skills/polymath/SKILL.md and follow its instructions
- Research, advise, ideate — whatever the user needs
- When ready, offer to hand off to any other mode
0 gates. Polymath manages its own dialogue.
Research Mode
Deep, grounded research on any topic. NotebookLM Researcher is the primary skill (v0.5.19, 35+ tools: research, studio, audio, quiz, flashcards, slides, cross-notebook, batch, pipelines, tags). Polymath + crawl4ai are enhancement layers.
- Read skills/notebooklm-researcher/SKILL.md and follow its instructions
- Check authentication: nlm auth status
- Check for existing notebooks before creating new: nlm notebook list
- Phase 1 — Discovery: Identify if this is a new topic (→ create notebook) or existing notebook (→ add sources)
- Phase 2 — Source Ingestion: Add source URLs, text notes, or YouTube videos. Use nlm research start --mode deep for automatic web discovery
- Phase 3 — NotebookLM Synthesis: Use notebook describe, notebook query, cross query to synthesize findings
- Phase 4 — Content Generation: Generate study materials: audio (podcast), report (briefing doc/study guide), quiz, flashcards, slides, infographic
- Phase 5 — Cross-Notebook (if needed): Query across multiple notebooks for comparative research
- Phase 6 — Handoff: Format findings as research report with citations, hand off to relevant mode
NotebookLM Capabilities (v0.5.19):
- 35+ MCP tools: notebook, source, research, studio, audio, video, report, quiz, flashcards, mindmap, slides, infographic, data-table, download, export, chat, share, batch, cross, pipeline, tag, alias, config, doctor, skill, setup
- Batch operations: same action across multiple notebooks
- Pipelines: ingest-and-podcast, research-and-report, multi-format
- Drive sync: stale source detection and sync
- Multi-profile: multiple Google accounts
- Enterprise/Workspace support via NOTEBOOKLM_BASE_URL
0 gates. NotebookLM Researcher manages dialogue.
Optimize Mode
Performance + reliability analysis. Two skills.
- Code Reviewer — identify performance anti-patterns, N+1 queries, memory leaks
- SRE — capacity analysis, scaling bottlenecks, SLO evaluation
- Consolidated report — performance findings + reliability recommendations
- Remediation — fix top issues
1 gate: After analysis, before fixes.
Marketing Mode
Go-to-market strategy, content, and SEO. Primarily Growth Marketer.
- Growth Marketer — market analysis, positioning, content strategy, SEO audit, copywriting, launch campaign, analytics setup
- Conversion Optimizer (if CRO explicitly mentioned) — funnel audit, CRO recommendations alongside marketing strategy
- Frontend Engineer (if SEO code changes needed) — implement meta tags, schema markup, page speed fixes
1 gate: After strategy, before content creation.
Grow Mode
Conversion optimization, experimentation, and growth engineering. Primarily Conversion Optimizer.
- Conversion Optimizer — funnel audit, CRO implementation, A/B test design, growth loops, churn prevention
- Growth Marketer (if strategy context needed) — provide positioning, messaging, and traffic analysis
- Frontend Engineer (if code changes needed) — implement CRO changes, experiment infrastructure
- QA Engineer (if A/B test infrastructure) — verify experiment implementation
1 gate: After audit, before implementation.
Analyze Mode
Standalone requirements analysis and validation. Single skill.
- Read skills/business-analyst/SKILL.md and follow its instructions
- BA receives client information, applies 6W1H framework, evaluates completeness
- BA challenges assumptions, checks feasibility, detects contradictions
- BA generates ba-package.md with validated requirements
- When complete, offer handoff options:
Analysis complete. What next?
1. **Hand off to PM — write BRD from this analysis (Recommended)**
2. **Start Feature mode — build what was analyzed**
3. **Start Full Build — full pipeline from this analysis**
4. **Done — I just needed the analysis**
5. **Chat about this** — Free-form input
0 gates. BA operates autonomously. Handoff is optional.
Custom Mode
User picks skills from a menu. Present via notify_user:
Which skills do you need? (list the numbers separated by commas)
--- Core Engineering ---
1. **Business Analyst** — Requirements elicitation, feasibility analysis, critical evaluation, information gatekeeping
2. **Product Manager** — Requirements, user stories, BRD
3. **Solution Architect** — System design, API contracts, tech stack
4. **Software Engineer** — Backend implementation
5. **Frontend Engineer** — UI components, pages, design system
6. **QA Engineer** — Tests — unit, integration, e2e, performance
7. **Security Engineer** — OWASP audit, STRIDE, AI security, runtime detection
8. **Code Reviewer** — Architecture conformance, code quality, git workflow
9. **DevOps** — Docker, CI/CD, Terraform, monitoring
10. **SRE** — SLOs, chaos engineering, runbooks
11. **Technical Writer** — API docs, dev guides, architecture docs
12. **Data Scientist** — AI/ML systems, RAG pipelines, agent orchestration
13. **Debugger** — Bug investigation, root cause analysis, regression testing
14. **Prompt Engineer** — Prompt design, evaluation, optimization
15. **API Designer** — REST/GraphQL design, endpoints, error taxonomy
16. **Database Engineer** — Schema design, migrations, query optimization
17. **AI Engineer** — MLOps, model serving, fine-tuning, evaluation
18. **Accessibility Engineer** — WCAG compliance, a11y audit, screen reader
19. **Performance Engineer** — Load testing, profiling, Core Web Vitals
20. **UX Researcher** — User research, usability testing, personas
21. **Data Engineer** — ETL pipelines, data warehouse, dbt, data quality
22. **Project Manager** — Sprint planning, velocity, risk management
23. **XLSX Engineer** — Excel spreadsheet creation, financial models, formula-driven reports, data formatting
--- Game Development ---
24. **Game Designer** — GDD, gameplay loops, economy, mechanic specs
25. **Unity Engineer** — C# ScriptableObjects, Editor tools, URP
26. **Unreal Engineer** — C++/Blueprint, GAS, Nanite/Lumen
27. **Godot Engineer** — GDScript, scene tree, signals, cross-platform
28. **Godot Multiplayer** — MultiplayerSpawner, ENet, prediction, dedicated server
29. **Roblox Engineer** — Luau, DataStore, Roblox Studio, experience design
30. **Phaser 3 Engineer** — TypeScript, modular scenes, ECS-optional, WebGL/Canvas, shared vfx/ui helpers
31. **Three.js Engineer** — ECS, WebGPU/WebGL, Rapier physics, performance budgets, post-processing
32. **Level Designer** — Spatial design, encounters, pacing, environmental storytelling
33. **Narrative Designer** — Branching dialogue, character voice, lore
34. **Technical Artist** — Shaders, VFX, LOD, performance budgets
35. **Game Audio Engineer** — Spatial audio, adaptive music, SFX, mix
36. **Unity Shader Artist** — Shader Graph, HLSL, VFX Graph, post-processing
37. **Unity Multiplayer** — Netcode for GameObjects, relay, prediction
38. **Unreal Technical Artist** — Niagara, Material Editor, Lumen/Nanite
39. **Unreal Multiplayer** — Replication, dedicated server, GAS networking
40. **XR Engineer** — AR/VR/MR, spatial UI, hand tracking, comfort
--- Growth ---
41. **Growth Marketer** — Launch strategy, content, channels, SEO
42. **Conversion Optimizer** — CRO, funnel analysis, A/B testing, retention
--- Data Acquisition ---
43. **Web Scraper** — Secure web crawling (crawl4ai), URL validation, output sanitization, CSS/LLM extraction
--- Integration ---
44. **Paperclip** (optional) — Multi-agent orchestration, ticket management, budget control, heartbeat scheduling
45. **Chat about this** — Free-form input
Execute selected skills in dependency order. If user picks conflicting skills, resolve via the authority hierarchy.
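A minimal sketch of "dependency order", assuming the dependencies can be expressed as a skill-to-prerequisites map. The `DEPENDS_ON` edges and the `execution_order` helper are hypothetical illustrations, not the plugin's actual graph. `graphlib` needs Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: skill -> skills that must run before it.
# These edges are illustrative assumptions, not the plugin's real graph.
DEPENDS_ON = {
    "product-manager": {"business-analyst"},
    "solution-architect": {"product-manager"},
    "software-engineer": {"solution-architect"},
    "frontend-engineer": {"solution-architect"},
    "qa-engineer": {"software-engineer", "frontend-engineer"},
}


def execution_order(selected):
    """Return the selected skills in an order that respects DEPENDS_ON."""
    chosen = set(selected)
    # Keep only edges whose two ends were both picked by the user.
    graph = {skill: DEPENDS_ON.get(skill, set()) & chosen for skill in sorted(chosen)}
    return list(TopologicalSorter(graph).static_order())
```

Skills with no relationship to each other may come out in any relative order; only the prerequisite constraints are guaranteed.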
Debug Mode
Systematic bug investigation. Single skill (+ optional fix).
- Read skills/debugger/SKILL.md and follow its instructions
- Debugger performs triage using the MANDATORY Iceberg Assessment (Static vs Dynamic, Cascade Failure Scanning, Sensitive Domains check).
- If classified as dynamic or suspicious, proceeds with full hypothesis-driven investigation to find root cause. If a simple/static bypass is aborted due to underlying dynamic complexity, triggers the Auto-escalation Protocol.
- Present root cause and proposed fix
- If user approves fix → apply fix + regression test
- If fix touches backend code → Software Engineer applies it
- If fix touches frontend code → Frontend Engineer applies it
1 gate: After root cause identified (step 4), before applying fix.
AI Build Mode
Build or integrate AI-powered features. Multi-skill.
- Codebase scan — identify existing AI infrastructure (LLM clients, embeddings, RAG, agents)
- PM (Express depth) — scope the AI feature. User stories focused on AI behavior.
- Data Scientist — select model, design RAG pipeline/agent architecture (if needed)
- Prompt Engineer — design and evaluate prompts for the feature
- Architect (scoped) — API contracts for AI endpoints, vector DB schema
- Build — Software Engineer + Frontend Engineer implement
- Test — QA + evaluation framework for AI quality
2 gates: After AI architecture design (step 3-4), and after prompt evaluation (step 7).
Migrate Mode
Database migration, framework upgrade, or large-scale code migration.
- Codebase scan — understand current state (schema, framework version, code patterns)
- Database Engineer — design migration: new schema, zero-downtime migration scripts, data transformation
- Software Engineer — update code to work with new schema/framework
- QA — regression tests, data integrity verification
- Optional: Rollback plan — reversible migrations, feature flags for gradual rollout
2 gates: After migration plan (step 2), and after migration scripts generated (before execution).
Game Build Mode
Build a game from concept to playable build. Full game development pipeline.
- Concept analysis — extract game concept, genre, platform, engine from user's message
- Engine detection — read .production-grade.yaml for game.engine override, or ask:
Which engine for this game?
1. **Unity** (Recommended for indie-AA, mobile, 2D/3D)
2. **Unreal Engine** (AAA quality, heavy 3D, C++/Blueprint)
3. **Godot** (Open-source, lightweight, rapid iteration)
4. **Phaser 3** (Web-native 2D, HTML5, Canvas/WebGL — no install, instant play)
5. **Three.js** (Web-native 3D, WebGPU/WebGL — browser-native 3D experiences)
- Game Designer — skills/game-designer/SKILL.md — design pillars, core loop, economy, mechanic specs, player flows
- Engine Engineer — based on chosen engine:
  - Unity: skills/unity-engineer/SKILL.md — C# architecture, ScriptableObjects, Editor tools
  - Unreal: skills/unreal-engineer/SKILL.md — C++/Blueprint, GAS, AI, Blueprint layer
  - Godot: skills/godot-engineer/SKILL.md — GDScript, scene tree, signals
  - Phaser 3: skills/phaser3-engineer/SKILL.md — TypeScript, modular scenes, ECS-optional, WebGL/Canvas
  - Three.js: skills/threejs-engineer/SKILL.md — ECS architecture, WebGPU/WebGL, Rapier physics
- Level Designer — skills/level-designer/SKILL.md — level structure, encounters, pacing, blockouts
- Narrative Designer (if story-driven) — skills/narrative-designer/SKILL.md — dialogue, characters, lore
- Technical Artist — skills/technical-artist/SKILL.md — shaders, VFX, LOD, performance budgets
- Game Audio Engineer — skills/game-audio-engineer/SKILL.md — SFX, adaptive music, spatial audio
- Engine-specific depth (optional, based on game needs):
  - Multiplayer: skills/unity-multiplayer/SKILL.md, skills/unreal-multiplayer/SKILL.md, skills/godot-multiplayer/SKILL.md
  - Shader/VFX: skills/unity-shader-artist/SKILL.md, skills/unreal-technical-artist/SKILL.md
- QA — per skills/_shared/protocols/game-test-protocol.md (extended for Phaser 3 and Three.js):
  - Mechanics Validation (engine-specific tests: Unity UTF, Unreal Automation, Godot GUT, Phaser 3 Vitest/Jest, Three.js ECS system tests)
  - Balance Validation (economy, XP curves, difficulty scaling against GDD)
  - State Machine Validation (all mechanic transitions match GDD state diagrams)
  - Performance Validation (FPS, memory, load time per platform targets; Three.js: draw calls < 100/frame)
  - Build Verification (compile, references, platform builds, boot test; Phaser 3: Vite build; Three.js: Vite bundle)
  - Integration Validation (cross-system regressions)
  - Platform Validation (web browsers, mobile WebGL, desktop WebGL/WebGPU)
- Quality Gate — run skills/_shared/protocols/quality-gate.md with game-specific thresholds (see tests/coverage/thresholds.json)
- Task Validator — run skills/_shared/protocols/task-validator.md to validate delivery against Task Contract
4 gates: After Game Designer GDD (step 3), after engine architecture (step 4), after first playable (step 9), and after QA test suite (step 10).
XR Build Mode
Build AR/VR/MR applications. XR Engineer + optional game development pipeline.
- Concept analysis — determine XR type (VR game, AR tool, MR experience), platform (Quest, Vision Pro, PCVR, WebXR)
- XR Engineer — skills/xr-engineer/SKILL.md — XR setup, spatial interaction, comfort, spatial UI
- If game-like XR (VR game, interactive experience) — run Game Build pipeline steps 3-8 within XR context
- If tool/productivity XR — route to standard Feature/Full Build pipeline with XR Engineer leading spatial design
- QA — comfort testing, frame rate validation, input model coverage
2 gates: After XR architecture (step 2), and after spatial interaction playable (step 3-4).
Chat Interpretation (Pre-Processing — BEFORE everything else)
Powered by prompt-master methodology. Run BEFORE Step 0.1 on every user message.
Every user message is first interpreted through the chat-interpreter Cursor subagent. This converts vague natural language into a structured, unambiguous pipeline request — eliminating the need for users to speak "prompt engineer."
Step -1 — Chat Interpretation:
Invoke: /chat-interpreter [user's message]
The chat-interpreter subagent performs:
- 9-Dimension Extraction — silently extracts: Task, Target tool, Output format, Constraints, Input, Context, Audience, Success criteria, Examples
- Mode Detection — maps the request to Forgewright's 19 modes with confidence level (HIGH/MEDIUM/LOW)
- Gap Detection — identifies missing information (max 3 clarifying questions if needed)
- Default Application — fills in reasonable defaults for unstated requirements
- Structured Output — produces INTERPRETED_REQUEST.md with:
  - Detected mode + confidence
  - Intent (original quoted)
  - Key decisions made
  - Scope (included/excluded)
  - Constraints
  - Missing items
  - Success criteria
If confidence is HIGH:
✓ Request interpreted — [mode] mode detected
[Structured request summary — 3 lines max]
→ Proceeding to Step 0.1
If confidence is MEDIUM:
Request understood. Detected [mode] but [alternative] is also possible.
1. **[mode] (Recommended)** — [reason]
2. **[alternative]** — [reason why user might want this]
3. **Chat about this** — Tell me more
If confidence is LOW:
I'm not sure what you want. A few quick questions:
1. [most critical unknown — max 3 questions]
2. [second most critical]
3. [third most critical — last one]
After your answers, I'll route to the right pipeline.
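The three confidence branches above can be sketched as a small dispatcher. This is an illustration only: the `next_action` helper and its return shape are hypothetical, and the real chat-interpreter may structure its output differently.

```python
def next_action(mode, confidence, alternative=None, questions=()):
    """Map an interpreted request to the next pipeline action."""
    level = confidence.upper()
    if level == "HIGH":
        # Proceed straight to Step 0.1 with the detected mode.
        return {"action": "proceed", "mode": mode}
    if level == "MEDIUM":
        # Recommended option first, "Chat about this" always last.
        options = [f"{mode} (Recommended)"]
        if alternative:
            options.append(alternative)
        options.append("Chat about this")
        return {"action": "confirm", "options": options}
    if level == "LOW":
        # At most three clarifying questions, most critical first.
        return {"action": "clarify", "questions": list(questions)[:3]}
    raise ValueError(f"unknown confidence level: {confidence!r}")
```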
Paperclip Detection (auto-handled):
- If #42, CLIP-, or [paperclip] detected → route to Express engagement mode
- chat-interpreter appends engagement_override: express to INTERPRETED_REQUEST.md
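A rough sketch of the marker check, under the assumption that "#42" stands for any "#<number>" ticket reference. The regular expression and the `engagement_override` function are hypothetical; the actual detection logic lives in the chat-interpreter subagent.

```python
import re

# Assumption: "#42" in the text above is an example of a "#<number>" ticket
# reference. CLIP- and [paperclip] are matched literally and case-sensitively.
PAPERCLIP_MARKERS = re.compile(r"#\d+|CLIP-|\[paperclip\]")


def engagement_override(message):
    """Return "express" when a Paperclip marker is present, else None."""
    return "express" if PAPERCLIP_MARKERS.search(message) else None
```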
Chat Interpretation Output:
.forgewright/subagent-context/INTERPRETED_REQUEST.md
├── mode: [detected mode]
├── confidence: [HIGH/MEDIUM/LOW]
├── intent_summary: [1 sentence]
├── scope: {included: [...], excluded: [...]}
├── constraints: [...]
├── missing: [...]
├── success_criteria: [...]
└── engagement_override: [express/standard/thorough/meticulous if set]
Reading the interpreted request before proceeding:
All subsequent pipeline steps read .forgewright/subagent-context/INTERPRETED_REQUEST.md as the authoritative source of user intent — not the raw chat message.
Tool-Specific Routing (from prompt-master)
When generating prompts for specific AI tools, use the appropriate template and technique based on the target tool. Reference files:
| File | Read When |
|---|---|
| skills/_shared/protocols/prompt-templates.md | Need template structure for any tool category |
| skills/_shared/protocols/credit-killing-patterns.md | Fixing bad prompts or diagnosing failures |
| skills/_shared/protocols/prompt-techniques.md | Selecting safe techniques per model |
Code AI Tools
| Tool | Template | Key Fixes |
|---|---|---|
| Claude Code | ReAct + Stop Conditions (H) | Stop conditions MANDATORY, file scope, human review triggers |
| Cursor / Windsurf | File-Scope (G) | Path + function + do-not-touch list + done_when |
| GitHub Copilot | RTF (A) | Exact function signature as docstring |
| Cline (Claude Dev) | ReAct + Stop Conditions (H) | File scope + approval gates + stop conditions |
Reasoning Models
| Tool | Template | Key Fixes |
|---|---|---|
| Claude (claude.ai) | RTF/CO-STAR (A/B) | XML tags, explicit length, no over-engineering |
| ChatGPT / GPT-5.x | RTF/CO-STAR (A/B) | Output contract, verbosity control, compact structure |
| o3 / o4-mini | Short clean only | REMOVE CoT — they think internally, under 200 words |
| Gemini 2.x/3 | CO-STAR (B) | Grounding anchors, citation rules, format locks |
| DeepSeek-R1 | Short clean only | REMOVE CoT, short instructions only |
| Qwen3 (thinking) | Short clean only | Treat like o3 — no CoT scaffolding |
| Qwen3 (non-thinking) | RTF (A) | Full structure, explicit format, role assignment |
| MiniMax | RTF (A) | Temperature 0-1 only, structured output |
Local Models
| Tool | Template | Key Fixes |
|---|---|---|
| Ollama | RTF (A) | Ask which model first, shorter prompts, simple structure |
| Llama / Mistral | RTF (A) | Shorter prompts, flat structure, explicit role |
| CodeLlama | File-Scope (G) | Coding-focused prompts, shorter |
Image/Video AI
| Tool | Template | Key Fixes |
|---|---|---|
| Midjourney | Visual Descriptor (I) | Comma-separated, negative prompt, parameters |
| DALL-E 3 | Visual Descriptor (I) | Prose works, text exclusion, foreground/background |
| Stable Diffusion | Visual Descriptor (I) | (word:weight) syntax, CFG 7-12, negative mandatory |
| ComfyUI | ComfyUI (K) | Separate positive/negative, checkpoint-specific |
| Reference editing | Reference Image (J) | Delta only, attach reference first |
| Sora / Runway | Visual Descriptor (I) | Camera movement, duration, cinematic language |
Full-Stack Generators
| Tool | Template | Key Fixes |
|---|---|---|
| Bolt / v0 / Lovable | RISEN (C) | Stack + version + what NOT to scaffold |
| Figma Make | RISEN (C) | Component names from Figma, scope boundaries |
| Google Stitch | RISEN (C) | Interface goal over implementation, Material Design 3 |
Autonomous Agents
| Tool | Template | Key Fixes |
|---|---|---|
| Devin / SWE-agent | ReAct + Stop Conditions (H) | Starting state + target state + forbidden actions |
| Manus / Perplexity Computer | RISEN (C) | End deliverable focus, permission scope |
Quick Reference
- Claude Code, Devin, AutoGPT → Template H (ReAct + Stop Conditions)
- Cursor, Windsurf → Template G (File-Scope); GitHub Copilot → Template A (RTF)
- o3, o4-mini, R1, Qwen3-thinking → REMOVE CoT, keep under 200 words
- Claude, GPT-4o, Gemini → CoT allowed, use Template E if logic-heavy
- Midjourney, SD, DALL-E → Template I (Visual Descriptor), negative prompts
- Complex multi-step → Template C (RISEN)
Auto-Initialization Check
Run silently BEFORE any execution (all modes) to ensure project intelligence is fully configured.
Step 0.1 — MCP & ForgeNexus Check:
- Check if the file .forgewright/mcp-server/mcp-config.json exists in the project root.
- If missing, silently execute the following terminal commands to auto-initialize it:
npx --yes forgenexus analyze
bash <path-to-forgewright-submodule>/scripts/mcp-generate.sh
- Yield a brief message to the user:
ℹ Auto-initialized ForgeNexus index and MCP server (missing setup).
- If exists, continue normally without doing anything.
Step 0.2 — System Requirements + Power Level Check (required):
Forgewright requires Node.js 18+ (ForgeNexus) and Python 3.8+ (local memory). Power level determines which tools are needed.
Step 0.2.1 — System Requirements Check:
Run these in parallel:
node_check: node --version 2>/dev/null | grep -oE '[0-9]+' | head -1
python_check: python3 --version 2>/dev/null
memory_check: [ -d .forgewright/memory_db ] && echo "ready" || echo "setup"
settings_check: [ -f .forgewright/settings.md ] && echo "exists" || echo "missing"
Decision matrix:
| node | python | memory | Action |
|---|---|---|---|
| ≥18 | ≥3.8 | ready | ✅ Full Persistent — continue to Step 0.3 |
| ≥18 | ≥3.8 | setup | ⧖ Memory setup needed — run memory-local.sh |
| ≥18 | missing | — | ⚠️ Python missing — stop, tell user install Python 3 |
| <18 | — | — | ⚠️ Node.js <18 — stop, tell user upgrade Node.js |
| missing | — | — | ⚠️ Node.js missing — stop, tell user install Node.js 18+ |
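The decision matrix can be read top-down as a chain of checks. The sketch below is one way to encode it, taking the raw output of `node --version` and `python3 --version` as strings. The function names and returned labels are hypothetical, and treating a Python older than 3.8 the same as a missing Python is an assumption, since the matrix does not cover that case.

```python
import re


def major_version(text):
    """Extract the first integer from e.g. 'v20.11.0'; None if absent."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None


def python_ok(text):
    """True when 'Python X.Y.Z' reports 3.8 or newer."""
    match = re.search(r"(\d+)\.(\d+)", text or "")
    return bool(match) and (int(match.group(1)), int(match.group(2))) >= (3, 8)


def requirements_action(node_out, python_out, memory_state):
    """Apply the decision matrix; blocking failures are checked first."""
    node = major_version(node_out)
    if node is None:
        return "stop: install Node.js 18+"
    if node < 18:
        return "stop: upgrade Node.js"
    if not python_ok(python_out):
        # Assumption: an outdated Python is handled like a missing one.
        return "stop: install Python 3"
    if memory_state == "ready":
        return "continue to Step 0.3"
    return "run memory-local.sh"
```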
If Node.js or Python missing (blocking):
notify_user:
"⚠️ Missing required tools for Forgewright:
Node.js 18+ required for: ForgeNexus code intelligence, MCP server
Python 3.8+ required for: Local memory (ChromaDB + sentence-transformers)
How to install:
macOS: brew install node python3
Linux: curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash - && sudo apt install -y nodejs python3
Windows: Download from nodejs.org + python.org
After installing, re-run your request."
Options:
> "I'll install and try again (Recommended)"
> "Chat about this"
If memory needs setup (non-blocking):
# Run silently
bash <path-to-forgewright>/scripts/memory-local.sh "$(pwd)" 2>/dev/null
# If failed (no python3)
Log: "⚠️ Memory init failed — install Python 3 first. Forcing: FORGEWRIGHT_SKIP_MEMORY=1"
# CI/headless exemption auto-applied
If all requirements met:
Log: "✓ System requirements verified:
- Node.js: [version] ✓
- Python 3: [version] ✓
- Memory: [ready/setup needed] ✓"
Step 0.2.2 — Power Level Check:
IF .forgewright/settings.md exists:
Read engagement + execution from settings
Log: "✓ Power level loaded: [level]"
Continue to Step 0.3
ELSE:
# First-time setup — ask user
Log: "⧖ Power level not set — prompting user"
Prompt for power level (only if settings missing):
notify_user:
"Forgewright has 5 power levels. Choose based on how much capability you need:
⚡ Basic — 55 skills, full pipeline (Node.js only)
⚡⚡ Smart — + ForgeNexus blast-radius analysis (Node.js only)
⚡⚡⚡ Persistent — + Local memory with ChromaDB (Node.js + Python 3)
⚡⚡⚡⚡ Research — + NotebookLM grounded research (optional)
⚡⚡⚡⚡⚡ Full Power — All of the above + crawl4ai, Midscene, Paperclip
Which level?"
Options:
> "⚡⚡⚡ Persistent (Recommended) — Standard for active projects"
> "⚡⚡⚡⚡⚡ Full Power — Maximum capability"
> "⚡⚡ Smart — Code intelligence without memory"
> "⚡ Basic — Just the pipeline"
> "Chat about this"
After user selects:
IF Full Power:
Log: "✓ Power level: Full Power"
# Prompt user about optional Full Power tools (required acknowledgment)
notify_user:
"⚡ Full Power selected! You have everything you need:
MANDATORY (auto-verified): Node.js 18+, Python 3.8+, local memory ✓
OPTIONAL — install anytime to unlock more capability:
📚 Research Mode
pip install notebooklm-mcp
(Grounded AI with zero hallucinations, citations from your sources)
🌐 Web Intelligence
pip install 'crawl4ai>=0.8.0'
(Scrape & crawl any website for RAG or research)
📱 Mobile Testing
npm install -g @anthropic-ai/midscene
(AI-powered UI testing on real Android/iOS devices)
Which optional tools would you like to install now?"
Options:
> "Install all optional tools now (Recommended)"
> "Install [specific tool] only — I'll do others later"
> "Skip — I'll install manually later"
> "Chat about this"
IF user selects "Install all":
Log: "Installing optional Full Power tools..."
# Try pip tools first (each tool independently — if one fails, continue others)
Run: pip install notebooklm-mcp 2>/dev/null && Log: " ✓ notebooklm-mcp" || Log: " ⚠ notebooklm-mcp skipped (pip error)"
Run: pip install 'crawl4ai>=0.8.0' 2>/dev/null && Log: " ✓ crawl4ai" || Log: " ⚠ crawl4ai skipped (pip error)"
# npm tool last (requires node)
Run: npm install -g @anthropic-ai/midscene 2>/dev/null && Log: " ✓ Midscene" || Log: " ⚠ Midscene skipped (npm error)"
# Verify which tools are now importable / executable
Run: python3 -c "import notebooklm_mcp" 2>/dev/null && npb="✓" || npb="⚠"
Run: python3 -c "import crawl4ai" 2>/dev/null && crw="✓" || crw="⚠"
Run: which midscene >/dev/null 2>&1 && mids="✓" || mids="⚠"
Log: "✓ Optional tools status: notebooklm-mcp [$npb] crawl4ai [$crw] Midscene [$mids]"
Log: " Full install commands (if any skipped):"
Log: " pip install notebooklm-mcp 'crawl4ai>=0.8.0'"
Log: " npm install -g @anthropic-ai/midscene"
IF user selects specific tool:
Log: "Installing [selected tool]..."
Run: [corresponding install command]
Log: "✓ [tool] installed"
IF user selects skip:
Log: "⧖ Optional tools deferred — run install commands manually when ready"
IF Research:
Log: "✓ Power level: Research"
Log: "Optional: pip install notebooklm-mcp"
IF Persistent:
Log: "✓ Power level: Persistent — Local memory ready"
IF Smart:
Log: "✓ Power level: Smart — ForgeNexus ready"
IF Basic:
Log: "✓ Power level: Basic"
Write settings file:
mkdir -p .forgewright production
cat > .forgewright/settings.md << 'EOF'
Power_Level: [selected]
Engagement: [express/standard/thorough/meticulous — default: standard]
Execution: [parallel/sequential — default: parallel]
Review_Mode: [full/lean/solo — default: lean]
EOF
Review Mode Configuration:
Follow skills/_shared/protocols/review-intensity.md for review mode selection:
- Full — Director specialists review at every step
- Lean (default) — Reviews only at phase gate transitions
- Solo — No reviews, maximum speed
mkdir -p production
echo "lean" > production/review-mode.txt
User can override per-invocation with --review [mode] flag.
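As an illustration of how the settings file could be read back on later runs, here is a minimal parser for the "Key: value" lines written above. The `parse_settings` and `load_settings` helpers are hypothetical, and the defaults simply mirror the ones stated in the template.

```python
# Defaults taken from the settings template above.
DEFAULTS = {"engagement": "standard", "execution": "parallel", "review_mode": "lean"}


def parse_settings(text):
    """Parse 'Key: value' lines into a dict with lower-case keys."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            settings[key.strip().lower()] = value.strip()
    return settings


def load_settings(text):
    """Overlay non-empty parsed values on the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in parse_settings(text).items() if v})
    return merged
```

A `--review` flag given on the command line would then take precedence over `review_mode` from this dict.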
Log checkpoint:
Log: "✓ System init complete:
- Node.js: [version] ✓
- Python 3: [version] ✓
- Memory: [ready] ✓
- Power level: [level] ✓
- Review mode: [mode] ✓
- Settings: written to .forgewright/settings.md"
Auto-Update Check
Run BEFORE any execution (all modes). Silent if current. One prompt max if update exists.
Step 0 — version check:
- Check current version from plugin metadata
- Use read_url_content to fetch https://raw.githubusercontent.com/buiphucminhtam/forgewright/main/VERSION → read the version string (this is the remote version)
- If fetch fails (offline, timeout, 404) → silently continue. Never block the pipeline over an update check.
- If remote ≤ local → continue silently (user sees nothing)
- If remote > local → prompt via notify_user:
production-grade v{remote} is available (you have v{local})
1. **Update to v{remote} (Recommended)** — Auto-update and restart pipeline
2. **Skip — continue with v{local}** — Use current version
- If skip → continue pipeline with current version
- If update → execute in sequence:
  - Clone: git clone --depth 1 https://github.com/buiphucminhtam/forgewright.git /tmp/pg-update
  - Copy updated files to the skills directory
  - Clean up: rm -rf /tmp/pg-update
  - Print: ✓ Updated to v{remote_version}. Re-invoke /production-grade to use the new version.
  - STOP — do not continue pipeline. The user must re-invoke to pick up new content.
If any update step fails, print a warning and continue with the current version. Never let the updater break the pipeline.
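The "remote ≤ local" and "remote > local" checks need a numeric comparison, because plain string comparison would rank "8.10.0" below "8.9.0". A small sketch, assuming the VERSION file holds a dotted numeric version such as `8.0.1`; both helper names are hypothetical.

```python
def parse_version(text):
    """Turn 'v8.0.1' (optionally with surrounding whitespace) into (8, 0, 1)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def update_available(local, remote):
    """True only when remote is strictly newer; a parse failure means no update."""
    try:
        return parse_version(remote) > parse_version(local)
    except ValueError:
        # Covers failed fetches that return an error page instead of a version.
        return False
```

Returning `False` on any parse failure matches the rule that the update check must never block the pipeline. Note that this sketch compares tuples of different lengths as-is, so "8.1" counts as older than "8.1.0".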
Session Lifecycle Pre-Flight
Run AFTER update check, BEFORE mode classification. Follows skills/_shared/protocols/session-lifecycle.md.
Step 0.5 — session start:
- Load project profile:
  - If .forgewright/project-profile.json exists and is fresh (<24h) → load context, skip re-onboarding
  - If stale → re-run health check only (project-onboarding Phase 2)
  - If missing → run full project onboarding (see skills/_shared/protocols/project-onboarding.md)
- Load last session state:
  - If .forgewright/session-log.json exists with interrupted session → offer resume via notify_user
  - If last session completed → log summary, continue to new request
  - If first session → continue normally
- Load memory context (required for Persistent power level — Step 0.2):
  - Run bash <path-to-forgewright>/scripts/memory-retrieve.sh "<user-request>" OR
  - Run python3 <path-to-forgewright>/scripts/mem0-v2.py search "<project-name> <user-request-keywords>" --limit 5
  - Also load:
    - .forgewright/subagent-context/CONVERSATION_SUMMARY.md
    - .forgewright/memory-bank/activeContext.md
    - .forgewright/business-analyst/handoff/ba-package.md (if exists)
- Detect manual changes:
  - If git available → check commits since last session
  - If structural changes detected → re-run onboarding fingerprint + patterns
- Display quality trend (if history exists):
  - Read .forgewright/quality-history.json → show trend of last 5 sessions
Log: ✓ Session context loaded — [project name], last session: [summary or "first session"]
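The fresh/stale/missing branching for the project profile could look like the sketch below. It assumes freshness is judged by the file's modification time; the source does not say whether the real check uses mtime or a timestamp stored inside the JSON, and `profile_state` is a hypothetical name.

```python
import os
import time

FRESH_SECONDS = 24 * 60 * 60  # "fresh" means younger than 24 hours


def profile_state(path, now=None):
    """Classify the profile file as 'missing', 'stale', or 'fresh' by mtime."""
    if not os.path.exists(path):
        return "missing"
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    return "fresh" if age < FRESH_SECONDS else "stale"
```

The three return values map to the three bullets above: load context, re-run the health check only, or run full onboarding.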
Step 0.6 — Cursor Subagent Context Preparation:
Run AFTER session context is loaded, AFTER chat-interpreter (Step -1), BEFORE any skill or phase execution. This ensures subagents have clean, bounded context.
- Ensure subagent context directory exists:
  mkdir -p .forgewright/subagent-context/
- Read chat-interpreter output:
  Read .forgewright/subagent-context/INTERPRETED_REQUEST.md
  → This is the authoritative source of user intent
  → All skills use this instead of the raw chat message
- Write PIPELINE_SUMMARY.md (refresh for each new phase):
  - Read .forgewright/project-profile.json if exists
  - Read current phase status from .forgewright/task.md
  - Read approved architecture from docs/architecture/ (if exists)
  - Read BRD summary from product-manager/BRD/ (if exists)
  - Compress to ≤ 2,000 tokens
  - Write to .forgewright/subagent-context/PIPELINE_SUMMARY.md
- Write REVIEWER_CONTRACT.md (per-review, generated dynamically):
  For each review task, write:
  - REVIEWER_CONTRACT.md with scope, acceptance criteria, forbidden paths
  - Reference: .forgewright/subagent-context/REVIEWER_CONTRACT_TEMPLATE.md
- Update SECURITY_STANDARDS.md (refresh for HARDEN phase):
  - Run security-engineer skill output through SECURITY_STANDARDS template
  - Write to .forgewright/subagent-context/SECURITY_STANDARDS.md
- Log:
  ✓ Subagent context prepared — [N] files in .forgewright/subagent-context/
Cursor Subagent Invocation Convention:
When invoking a Cursor subagent, use the exact pattern below:
Invoke: /[subagent-name] [task context]
Example: /verifier Review the T3a backend services delivery
Example: /spec-reviewer Check T3b frontend against CONTRACT.json
Example: /quality-reviewer Assess T3a services code quality
Example: /security-auditor Perform read-only OWASP audit on T3a auth code
Built-in Cursor EXPLORE subagent (automatic, no explicit invocation needed):
The Cursor built-in explore subagent runs up to 10 parallel searches using a fast model. Cursor's Agent uses it automatically for context-heavy exploration. In the DEFINE phase (Step 4: Codebase Discovery), phrase discovery requests in natural language and the explore subagent handles the parallel search; you do NOT need to invoke it manually.
Available Cursor Subagents:
| Subagent | Model | Best For | Invocation |
|---|---|---|---|
| chat-interpreter | fast | Translates chat to structured request | /chat-interpreter [message] |
| explore | fast (built-in) | 10 parallel codebase searches | Automatic (Cursor Agent) |
| verifier | fast | Confirm deliverables actually work | /verifier [task] |
| spec-reviewer | fast | Verify spec compliance | /spec-reviewer [task] |
| quality-reviewer | inherit | Deep quality/architecture review | /quality-reviewer [task] |
| security-auditor | inherit | OWASP read-only audit | /security-auditor [task] |
Full Build Pipeline
When mode is Full Build, follow this EXACT sequence:
- Print kickoff banner:
━━━ Production Grade Pipeline v{local_version} ━━━━━━━━━━━━━━━━━━
Project: [extracted from user's message]
⧖ Bootstrapping workspace...
- Bootstrap workspace:
mkdir -p skills/_shared/protocols/
mkdir -p .forgewright/
- Write shared protocols to skills/_shared/protocols/:

| Protocol File | Content |
|---|---|
| ux-protocol.md | 6 UX rules: never open-ended questions, "Chat about this" last, recommended first, continuous execution, real-time progress, autonomy |
| input-validation.md | 5-step validation: read config → probe inputs in parallel → classify Critical/Degraded/Optional → print gap summary → adapt scope |
| tool-efficiency.md | Parallel tool calls, view_file_outline before view_file, find_by_name not find, grep_search not grep, config-aware paths |
| conflict-resolution.md | Authority hierarchy, dedup by file:line (keep highest severity), HARDEN→BUILD feedback loops (2 cycle max) |
| project-onboarding.md | 5-phase deep project analysis: fingerprint → health check → pattern analysis → risk assessment → profile generation |
| session-lifecycle.md | Cross-session continuity: session start/save/end hooks, resume protocol, drift detection, memory integration |
| quality-gate.md | Universal per-skill validation: 4 levels (build, regression, standards, traceability), quality scoring 0-100, configurable thresholds |
| brownfield-safety.md | Safety net for existing projects: git branching, baseline snapshots, protected paths, change manifest, regression checks, rollback |
| quality-dashboard.md | Quality scoring & reporting: real-time tracking, final dashboard, machine-readable JSON reports, cross-session trending, early warning |
| graceful-failure.md | Retry limits, stuck detection, graceful exit format, failure categories — prevents skills from looping on impossible tasks |
| code-intelligence.md | ForgeNexus-powered knowledge graph: impact analysis, 360° context, process tracing, pre-commit risk — optional enhancement for deep code awareness |
| prompt-templates.md | 12 prompt templates auto-selected by task type: RTF, CO-STAR, RISEN, CRISPE, Chain of Thought, Few-Shot, File-Scope, ReAct+Stop, Visual Descriptor, Reference Image, ComfyUI, Prompt Decompiler |
| credit-killing-patterns.md | 35 patterns that waste tokens: 7 task, 6 context, 6 format, 6 scope, 5 reasoning, 5 agentic |
| prompt-techniques.md | 5 safe techniques: Role Assignment, Few-Shot, XML Tags, Grounding Anchors, Chain of Thought. Also lists forbidden techniques: ToT, GoT, USC, prompt chaining, MoE |
Read these from the plugin's skills/_shared/protocols/ directory and copy them. If plugin path is unavailable, write from the summaries above.
- Codebase discovery — detect greenfield vs brownfield:
If project onboarding already ran (Step 0.5 loaded .forgewright/project-profile.json) → use cached fingerprint data. Otherwise, run these scans in parallel:
find_by_name("package.json"), find_by_name("go.mod"), find_by_name("pyproject.toml"), find_by_name("Cargo.toml"), find_by_name("pom.xml")
find_by_name("*", "src/"), find_by_name("*", "services/"), find_by_name("*", "frontend/"), find_by_name("*", "tests/"), find_by_name("*", "docs/")
find_by_name("Dockerfile*"), find_by_name("*", ".github/workflows/"), find_by_name("*", "infrastructure/"), find_by_name("*", "terraform/")
find_by_name(".production-grade.yaml")
Cursor EXPLORE Enhancement (automatic):
Cursor's built-in explore subagent can be triggered naturally. When the Agent sees you need to understand the codebase structure, it automatically runs up to 10 parallel searches using the explore subagent — each with a fast model, consuming no context in the main conversation. The explore subagent returns only the synthesized findings.
To leverage this explicitly in the DEFINE phase, frame your discovery queries naturally:
Agent (you): "Explore the backend structure — find services, APIs, and database models"
→ Cursor Agent spawns explore subagent with up to 10 parallel searches
→ explore subagent returns: [list of services], [API endpoints], [DB schemas], [key patterns]
→ You inject results into project profile
This replaces manual find_by_name calls for complex discovery with a more intelligent, semantically-driven approach. Use both — find_by_name for exact file discovery, explore for architectural pattern analysis.
Classify the project:
| Signal | Mode | Behavior |
|---|---|---|
| Empty/new directory, no source files | Greenfield | Create everything from scratch |
| Source files exist, no .production-grade.yaml | Brownfield (unmapped) | Deep onboarding, generate config, adapt |
| Source files + .production-grade.yaml exist | Brownfield (mapped) | Use config paths, augment existing code |
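The classification table can be read as a two-signal decision. The sketch below is only an illustration: `classify_project` and its parameters are hypothetical names, and it assumes the scans have already reduced the findings to two booleans.

```python
def classify_project(has_source_files: bool, has_config: bool) -> str:
    """Map the two discovery signals to a project mode (hypothetical helper)."""
    if not has_source_files:
        # Empty or new directory. The table does not cover "config but no
        # source files"; treating that as greenfield is an assumption.
        return "greenfield"
    if has_config:
        # Source files plus .production-grade.yaml: paths are already mapped.
        return "brownfield-mapped"
    # Source files without a config file: deep onboarding comes first.
    return "brownfield-unmapped"
```

For example, `classify_project(True, False)` selects the unmapped brownfield path, which is the case that triggers config generation.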
If Greenfield → log ✓ Greenfield project — creating from scratch. Write minimal .forgewright/project-profile.json (to be populated progressively). Continue to step 5.
If Brownfield → run the enhanced adaptation sequence:
a. Deep project onboarding — run full skills/_shared/protocols/project-onboarding.md if not already done in Step 0.5. This produces:
.forgewright/project-profile.json — full fingerprint, health, patterns, risk
.forgewright/code-conventions.md — coding patterns for all skills to follow
b. Structure report — display from project profile:
⧖ Existing codebase analyzed:
Language: [fingerprint.language] | Framework: [fingerprint.framework]
Architecture: [fingerprint.architecture]
Tests: [health.test_count] ([health.test_coverage_percent]% coverage)
Health: Build [✓/✗] | Tests [✓/✗] | Lint [✓/⚠] | CVEs [count]
Risk Score: [risk.overall_risk_score]/10
Patterns: [patterns.naming_convention], [patterns.component_pattern]
c. Path mapping — if no .production-grade.yaml, generate one from discovered structure. Notify user via notify_user:
I've analyzed your existing codebase. Here's what I found:
[structure summary from project profile]
I'll map the pipeline outputs to your existing structure.
1. **Approve mapping (Recommended)** — Use detected paths, generate .production-grade.yaml
2. **Customize paths** — Review and adjust the path mapping
3. **Treat as greenfield** — Ignore existing code, create fresh structure
4. **Chat about this** — Discuss how the pipeline adapts to your codebase
d. Write .production-grade.yaml from discovered structure — map paths.* to actual directories found.
e. Set brownfield context — write to .forgewright/codebase-context.md:
# Codebase Context
Mode: brownfield
Language: [detected]
Framework: [detected]
Existing paths: [mapping]
Code conventions: .forgewright/code-conventions.md
Project profile: .forgewright/project-profile.json
## Rules for all agents
- Don't overwrite existing files without explicit user approval — blindly replacing files can destroy production-critical configuration or break existing consumers that depend on current signatures
- READ .forgewright/code-conventions.md and MATCH existing code style
- ADD to existing directories, don't replace them
- If a file exists at the target path, create alongside it or extend it
- Existing tests must still pass after changes (verified by quality-gate)
- Check .forgewright/project-profile.json → risk.protected_paths before writing
f. Activate brownfield safety net — follow skills/_shared/protocols/brownfield-safety.md:
- Create session branch: forgewright/session-{timestamp}
- Snapshot baseline (existing tests pass count)
- Register protected paths
- Log: ✓ Safety net active — branch: forgewright/session-{timestamp}, baseline: [N] tests
All skills read codebase-context.md and code-conventions.md before executing.
- Engagement mode:
Notify user via notify_user:
How deeply should the pipeline involve you in decisions?
1. **Standard (Recommended)** — 3 gates + moderate architect interview. Best balance of speed and control.
2. **Express** — Minimal interaction. 3 gates only, auto-derive architecture from BRD. Fastest.
3. **Thorough** — Deep interviews at PM and Architect. Full capacity planning. Review phase summaries.
4. **Meticulous** — Maximum depth. Approve each ADR individually. Review every agent output. Full control.
Write the choice to .forgewright/settings.md:
# Pipeline Settings
Engagement: [express|standard|thorough|meticulous]
All skills read this file at startup to adapt their depth. The engagement mode controls:
- PM interview depth — Express: 2-3 questions. Standard: 3-5. Thorough: 5-8. Meticulous: 8-12.
- Architect discovery depth — Express: auto-derive. Standard: 5-7 questions. Thorough: 12-15 with capacity planning. Meticulous: full walkthrough + individual ADR approval.
- Phase summaries — Thorough/Meticulous show intermediate outputs between phases.
- Gate detail — Meticulous adds per-skill output review at each gate.
5b. Execution strategy — Scope Analysis & Recommendation:
Before asking the user, the orchestrator should analyze the project scope and generate a data-driven recommendation — this avoids wasting the user's time with uninformed "how would you like to proceed?" questions. This runs AFTER Gate 2 (architecture approved), when the full scope is known.
Step 5b-1: Scope Metrics Collection
Read the approved architecture and BRD to extract these metrics:
From docs/architecture/ and api/:
service_count = number of backend services/modules
endpoint_count = number of API endpoints
db_model_count = number of database models/entities
From product-manager/BRD/:
page_count = number of frontend pages/screens
user_story_count = number of user stories
From .production-grade.yaml:
has_frontend = features.frontend (true/false)
has_mobile = features.mobile (true/false)
has_ai_ml = features.ai_ml (true/false)
architecture = project.architecture (monolith/microservices)
Derived:
parallel_task_count = count of active BUILD tasks (T3a + T3b? + T3c? + T4)
integration_points = number of cross-service API calls
shared_deps = number of shared libraries/packages
Step 5b-2: Complexity Scoring
Calculate a complexity score (1-10) from the metrics:
| Factor | Weight | Score Formula |
|---|---|---|
| Service count | 25% | 1-2: score 2, 3-5: score 5, 6+: score 8 |
| Page count | 15% | 1-3: score 2, 4-8: score 5, 9+: score 8 |
| Cross-cutting concerns | 20% | shared_deps × 2 + integration_points |
| Architecture type | 20% | monolith: 2, modular-monolith: 5, microservices: 8 |
| Feature breadth | 20% | +2 per active platform (web, mobile, AI/ML) |
complexity_score = weighted_sum(factors)
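As a rough sketch, the weighted sum could be computed like this. The source does not say how the unbounded factors (cross-cutting concerns, feature breadth) are normalized, so capping each factor at 10 and clamping the result to the 1-10 range are assumptions, as are the function and parameter names.

```python
# Architecture scores taken from the factor table above.
ARCH_SCORES = {"monolith": 2, "modular-monolith": 5, "microservices": 8}


def complexity_score(service_count: int, page_count: int, shared_deps: int,
                     integration_points: int, architecture: str,
                     active_platforms: int) -> float:
    """Weighted complexity score on a 1-10 scale (illustrative only)."""
    service = 2 if service_count <= 2 else 5 if service_count <= 5 else 8
    page = 2 if page_count <= 3 else 5 if page_count <= 8 else 8
    # Assumption: cap open-ended factors at 10 so the total stays on scale.
    cross_cutting = min(10, shared_deps * 2 + integration_points)
    breadth = min(10, 2 * active_platforms)
    arch = ARCH_SCORES[architecture]

    raw = (0.25 * service + 0.15 * page + 0.20 * cross_cutting
           + 0.20 * arch + 0.20 * breadth)
    return max(1.0, min(10.0, round(raw, 1)))
```

Under these assumptions, a two-service monolith with three pages and one platform lands well below the threshold of 5, while a six-service microservices system with three platforms lands above it.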
Step 5b-3: Time Estimation
Estimate wall-clock execution time for both modes:
Base times per task (approximate):
T3a (Backend): ~15-40 min (scales with service_count)
T3b (Frontend): ~10-25 min (scales with page_count)
T3c (Mobile): ~10-20 min (scales with page_count)
T4 (DevOps): ~5-10 min
T5 (QA): ~10-20 min
T6a (Security): ~5-10 min
T6b (Review): ~5-10 min
Sequential time:
total_sequential = sum of all active task times (BUILD + HARDEN)
Parallel time:
build_parallel = max(T3a, T3b, T3c) + T4 # longest worker + sequential T4
harden_parallel = max(T5, T6a, T6b) # longest worker
merge_overhead = 2-5 min per parallel group # validation + merge
total_parallel = build_parallel + merge_overhead + harden_parallel + merge_overhead
Speed gain:
speedup_factor = total_sequential / total_parallel
time_saved = total_sequential - total_parallel
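The formulas above translate directly into a small calculator. This is a sketch under stated assumptions: `estimate_times` is a hypothetical name, inactive tasks are passed as missing keys (counted as 0 minutes), and the merge overhead defaults to 3 minutes, a value picked from inside the 2-5 minute range.

```python
def estimate_times(task_minutes: dict[str, float],
                   merge_overhead: float = 3.0) -> dict[str, float]:
    """Compare sequential and parallel wall-clock estimates (illustrative)."""
    build = [task_minutes.get(t, 0) for t in ("T3a", "T3b", "T3c")]
    harden = [task_minutes.get(t, 0) for t in ("T5", "T6a", "T6b")]
    t4 = task_minutes.get("T4", 0)

    total_sequential = sum(build) + t4 + sum(harden)
    build_parallel = max(build) + t4      # longest worker + sequential T4
    harden_parallel = max(harden)         # longest worker
    total_parallel = (build_parallel + merge_overhead
                      + harden_parallel + merge_overhead)
    return {
        "total_sequential": total_sequential,
        "total_parallel": total_parallel,
        "speedup_factor": total_sequential / total_parallel,
        "time_saved": total_sequential - total_parallel,
    }
```

With a 30-minute backend, a 20-minute frontend, an 8-minute DevOps task, and HARDEN tasks of 15, 8, and 8 minutes, the sequential total is 89 minutes and the parallel total is 59 minutes.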
Step 5b-4: Risk Assessment (Parallel Mode)
Evaluate risks specific to parallel execution:
| Risk | Condition | Severity | Mitigation |
|---|---|---|---|
| Merge conflict | shared_deps > 2 OR services share DB models | Medium-High | Merge Arbiter auto-resolves configs; code conflicts escalate |
| Shared schema divergence | Multiple workers read same schema, one modifies | Medium | Contract locks schema as readonly for all workers |
| Package version mismatch | Workers add conflicting dependency versions | Low | Merge Arbiter unions package.json, runs dedupe |
| Integration failure post-merge | Workers build against stale API contracts | Medium | All workers share same frozen api/ snapshot |
| Resource exhaustion | 4 Gemini CLI processes × large context | Low | MAX_WORKERS cap + timeout per worker |
| Rollback complexity | Post-merge integration fail, hard to isolate | Medium | Per-branch rollback via merge-arbiter protocol |
Risk level:
LOW — service_count <= 2, no shared deps, monolith
MEDIUM — service_count 3-5, some shared deps, modular
HIGH — service_count 6+, heavy integration, microservices
Step 5b-5: Generate Recommendation
Based on analysis, determine the recommended mode:
IF complexity_score >= 5 AND parallel_task_count >= 3 AND risk_level != HIGH:
    recommendation = PARALLEL
    reason = "Scope large enough to benefit from parallelization"
ELIF complexity_score >= 5 AND risk_level == HIGH:
    recommendation = PARALLEL with caution
    reason = "Large scope benefits from parallel, but high integration risk"
ELIF complexity_score < 5 OR parallel_task_count < 3:
    recommendation = SEQUENTIAL
    reason = "Scope too small for parallel overhead to pay off"
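The same rules can be expressed as a function. In this sketch (`recommend_mode` is a hypothetical name) the third condition is written as a plain fall-through, because once the first two branches are ruled out, either the score is below 5 or the task count is below 3.

```python
def recommend_mode(complexity_score: float, parallel_task_count: int,
                   risk_level: str) -> tuple[str, str]:
    """Return (recommendation, reason) following the rules above."""
    if (complexity_score >= 5 and parallel_task_count >= 3
            and risk_level != "HIGH"):
        return ("PARALLEL",
                "Scope large enough to benefit from parallelization")
    if complexity_score >= 5 and risk_level == "HIGH":
        return ("PARALLEL with caution",
                "Large scope benefits from parallel, but high integration risk")
    # Everything left has complexity_score < 5 or parallel_task_count < 3.
    return ("SEQUENTIAL",
            "Scope too small for parallel overhead to pay off")
```

Note that, as written in the rules, a high-risk project with a score of 5 or more is routed to "PARALLEL with caution" regardless of its task count.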
Step 5b-6: Present to User
Notify user via notify_user with the analysis:
━━━ Execution Strategy Analysis ━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Project Scope:
Services: [N] | Pages: [N] | Endpoints: [N]
Platforms: [Web / Mobile / AI]
Architecture: [monolith / modular / microservices]
Complexity Score: [X]/10
⏱ Time Estimates:
Sequential: ~[X] min (all tasks one-by-one)
Parallel: ~[Y] min (independent tasks simultaneous)
⚡ Speedup: ~[Z]x faster ([N] min saved)
⚠️ Parallel Risks:
• Merge conflict risk: [Low/Medium/High] — [detail]
• Integration risk: [Low/Medium/High] — [detail]
• Resource usage: [N] concurrent Gemini CLI workers
📋 Recommendation: [PARALLEL / SEQUENTIAL]
Reason: [explanation]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. **[Recommended mode] (Recommended)** — [brief why]
2. **[Other mode]** — [brief why user might want this]
3. **Chat about this** — Discuss the analysis or ask questions
Step 5b-7: Save Decision
Append to .forgewright/settings.md:
Execution: [parallel|sequential]
Max_Workers: 4
Complexity_Score: [X]
Estimated_Time_Sequential: [N]min
Estimated_Time_Parallel: [N]min
Risk_Level: [LOW|MEDIUM|HIGH]
Write analysis report to .forgewright/scope-analysis.md for future reference.
When Parallel is selected, the BUILD and HARDEN phases use the parallel-dispatch skill (skills/parallel-dispatch/SKILL.md) to spawn git worktrees, distribute Task Contracts, and merge results. When Sequential is selected, the pipeline behaves as before.
- Detect existing workspace & load memory — if .forgewright/ has prior state, use session-lifecycle resume protocol. If .forgewright/session-log.json has interrupted state, offer resume. Otherwise offer clean start via notify_user.
- Memory load: Run python3 scripts/mem0-v2.py search "<project-name> <user-request-keywords>" --limit 5 to retrieve relevant project context. Inject results into your context for this session.
- If no results or memory is empty, verify setup with python3 scripts/mem0-v2.py stats.
- Polymath pre-flight check:
- If .forgewright/polymath/handoff/context-package.md exists → read it, pass to PM as pre-loaded context. Log: ✓ Polymath context loaded — skipping redundant discovery
- If no polymath context, assess the user's request for knowledge gaps:
- Vague scope (no specific problem domain), no constraints (scale, budget, team), complex domain with no domain language, contradictory signals
- If gaps detected → read skills/polymath/SKILL.md and follow its instructions for pre-flight consultation before proceeding. The polymath will research, clarify with the user, and write a context package when ready.
- If no gaps → proceed directly. Log: ✓ Request is clear — proceeding to BA/PM
- If user explicitly requests to skip polymath ("just build it", clear detailed spec) → proceed immediately.
7.5. BA pre-flight check (after Polymath, before PM):
Detect greenfield Full Build (any of: Step 4 logged Greenfield; empty/minimal codebase with net-new product intent; user said "from scratch" / "new SaaS" / equivalent):
- Greenfield Full Build — BA is mandatory (no silent skip):
- Do not skip BA because the model self-scored 6W1H ≥ 6/7. Self-scores are optimistic; greenfield needs documented client answers.
- MUST read skills/business-analyst/SKILL.md and run through at least one full elicitation cycle (stakeholder + structured questions per engagement depth: Express minimum 3 client-answered items, Standard 3–5, Thorough 5+ with 2 rounds if gaps remain) until:
- .forgewright/business-analyst/handoff/ba-package.md exists and
- Open gaps are either resolved or explicitly logged as client-acknowledged assumptions (not BA guesses).
- Log: ⧖ Greenfield Full Build — mandatory BA before PM
- Escape hatches (only these): (1) .production-grade.yaml → features.skip_define_ba: true, or (2) notify_user with explicit option "Skip BA — I accept incomplete requirements risk" (user must choose; never auto-skip), or (3) ba-package.md already present from this session with completeness sign-off.
Brownfield Full Build (existing meaningful codebase):
- If .forgewright/business-analyst/handoff/ba-package.md exists → read it, pass to PM. Log: ✓ BA package loaded — requirements pre-validated
- If no BA package: run the 6W1H completeness check. If the score is < 6/7 or the request describes a net-new product/surface (major scope) → run BA as above (same minimum elicitation as Standard depth).
- If score ≥ 6/7 and incremental change only and no net-new product → may skip BA. Log: ✓ Requirements sufficiently complete — proceeding to PM
Non–Full-Build modes (Feature, etc.): keep conditional BA per the Feature Mode section (6W1H below 6/7 → BA).
- Context-aware routing (v7.0): If project-profile shows health issues, suggest addressing them:
health.tests_pass == false → suggest Harden mode first
risk.known_cves > 0 (Critical/High) → warn and suggest Security audit
risk.tech_debt_score > 7 → suggest addressing tech debt before new features
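The three routing checks could be sketched as a function over the loaded project profile. The name `routing_suggestions` is hypothetical, and the sketch assumes `risk.known_cves` already counts only Critical/High CVEs, which the source implies but does not state.

```python
def routing_suggestions(profile: dict) -> list[str]:
    """Turn project-profile health signals into suggestions (illustrative)."""
    health = profile.get("health", {})
    risk = profile.get("risk", {})
    suggestions = []
    if health.get("tests_pass") is False:
        suggestions.append("Tests are failing: run Harden mode first")
    # Assumption: known_cves counts Critical/High findings only.
    if risk.get("known_cves", 0) > 0:
        suggestions.append("Critical/High CVEs present: run a Security audit")
    if risk.get("tech_debt_score", 0) > 7:
        suggestions.append("High tech debt: address it before new features")
    return suggestions
```

A healthy profile yields an empty list, so the pipeline would proceed without extra prompts.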
- Research the domain — use search_web before asking the user anything (skip if polymath already researched).
- Create task tracking:
Create a task.md file in .forgewright/ with all 13 tasks and their statuses. Track dependencies and completion.
- Begin Phase 1 — read phases/define.md and start immediately. Do NOT ask "should I proceed?"
- Memory save (session start): Run python3 scripts/mem0-v2.py add "Session started: [mode] mode for [brief request]. Engagement: [level]" --category session
Key principle: Research, plan, start building. Pause at the 3 approval gates. Exception — greenfield Full Build: BA elicitation is a hard gate before PM; do not jump to T1 until ba-package.md exists and minimum rounds above are satisfied (unless an explicit escape hatch in 7.5 was used). In Thorough/Meticulous mode, show phase summaries between major phases (inform; strategic gates still rule).
After every user request is satisfied (end of assistant turn, before going idle): run Turn-Close memory (see session-lifecycle.md §Per-request memory).
Quality Gate Integration
After EVERY skill completes (in any mode — Full Build, Feature, Harden, etc.), run the Universal Quality Gate Protocol (skills/_shared/protocols/quality-gate.md):
- Per-skill validation: Level 1 (Build), Level 2 (Regression), Level 3 (Standards), Level 4 (Traceability)
- Score computation: 0-100 quality score per skill output
- Threshold enforcement: Score < quality.block_score (default 60) → STOP. Score < quality.minimum_score (default 90) → WARN at next gate.
- Display mini-scorecard after each skill in task_boundary status
- Aggregate scorecard displayed at each strategic gate
For brownfield projects: Level 2 (Regression) compares against the baseline snapshot from brownfield-safety.md. Any previously-passing test that now fails = regression = STOP.
For greenfield projects: Level 2 is auto-satisfied (no baseline).
Detailed Quality Gate Levels
Level 1: Build Quality
| Check | Pass | Fail |
|---|---|---|
| Code compiles | No errors | Any compilation error |
| TypeScript/ESLint | No errors | Any lint error |
| Dependencies resolved | All installed | Missing dependencies |
| Basic syntax | Valid | Syntax errors |
Scoring:
- All pass (4/4): 25 points
- Minor warnings only: 20 points
- 1-2 minor errors: 10 points
- 3+ errors or any major error: 0 points
Level 2: Regression Quality (Brownfield Only)
| Check | Pass | Fail |
|---|---|---|
| Existing tests pass | 100% of baseline | Any test failure |
| No protected path changes | None detected | Changes to protected paths |
| No breaking API changes | Contracts preserved | Breaking changes |
| No data loss | Data integrity preserved | Data corruption |
Scoring:
- All pass (4/4): 25 points
- 3/4: 20 points
- 2/4: 10 points
- 1/4 or less: 0 points
Level 3: Standards Quality
| Check | Pass | Fail |
|---|---|---|
| Naming conventions | Matches project | Violations |
| Error handling | All edge cases | Silent failures |
| Logging | Appropriate level | Missing/verbose |
| Security | No vulnerabilities | Any security issue |
| Documentation | Code documented | Missing docs |
Scoring:
- All pass (5/5): 25 points
- 4/5: 22 points
- 3/5: 15 points
- 2/5: 8 points
- 1/5 or less: 0 points
Level 4: Traceability Quality
| Check | Pass | Fail |
|---|---|---|
| BRD coverage | 100% of requirements | Gaps found |
| Acceptance criteria met | All verified | Missing criteria |
| Test coverage | ≥ 80% | Below threshold |
| No orphaned code | All code used | Dead code |
| Dependencies tracked | All noted | Unknown deps |
Scoring:
- All pass (5/5): 25 points
- 4/5: 22 points
- 3/5: 15 points
- 2/5: 8 points
- 1/5 or less: 0 points
Quality Score Thresholds
| Score | Grade | Action |
|---|---|---|
| 95-100 | A+ | Exceptional, may need minor polish |
| 90-94 | A | Production ready |
| 85-89 | B+ | Good, minor improvements suggested |
| 80-84 | B | Acceptable, improvements needed |
| 70-79 | C | Below standard, significant improvements needed |
| 60-69 | D | Poor, major rework required |
| 0-59 | F | Unacceptable, must not proceed |
Threshold configuration in .production-grade.yaml:
quality:
  block_score: 60
  minimum_score: 90
  excellent_score: 95
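Putting the four levels, the grade table, and the thresholds together, a score evaluation might look like the sketch below. `evaluate_quality` is a hypothetical helper; it assumes each level contributes 0-25 points and that a greenfield project passes 25 for the auto-satisfied Level 2.

```python
# Lower bound of each grade band, from the thresholds table above.
GRADES = [(95, "A+"), (90, "A"), (85, "B+"), (80, "B"),
          (70, "C"), (60, "D"), (0, "F")]


def evaluate_quality(level_points: list[int], block_score: int = 60,
                     minimum_score: int = 90) -> tuple[int, str, str]:
    """Return (score, grade, action) for one skill output (illustrative)."""
    score = sum(level_points)  # four levels, 0-25 points each
    grade = next(g for floor, g in GRADES if score >= floor)
    if score < block_score:
        action = "STOP"
    elif score < minimum_score:
        action = "WARN"   # surfaced at the next gate
    else:
        action = "PASS"
    return score, grade, action
```

For instance, level scores of 25, 25, 22, and 15 total 87, which grades as B+ and produces a warning rather than a block.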
Session Handoff Protocol
When context reaches 80% capacity or the session needs to be transferred:
┌─────────────────────────────────────────────────────────────────────┐
│ SESSION HANDOFF PROTOCOL │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. GENERATE handoff document at .forgewright/handover-[date].md │
│ │
│ 2. INCLUDE in handoff: │
│ - Goals accomplished │
│ - What was done │
│ - Key decisions made │
│ - Blockers / open questions │
│ - Next steps │
│ │
│ 3. START fresh session with only: │
│ - Handover document │
│ - Project brief │
│ - Current task context │
│ │
│ 4. VERIFY handoff completeness: │
│ - Can the new session resume without asking user to re-explain? │
│ - Are all decisions documented? │
│ - Are blockers clearly stated? │
│ │
└─────────────────────────────────────────────────────────────────────┘
When to trigger handoff:
- Context at ≥ 80% capacity
- Session exceeds 2 hours
- User takes a break and returns
- Multi-day project continuation
Token Budget Management
┌─────────────────────────────────────────────────────────────────────┐
│ TOKEN BUDGET MANAGEMENT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Threshold Monitoring: │
│ - 70% context → Begin aggressive compaction │
│ - 80% context → Trigger checkpoint + handoff preparation │
│ - 95% context → HALT and generate handoff │
│ │
│ Compaction Strategy: │
│ - Replace verbose logs with summaries │
│ - Remove redundant context │
│ - Keep only essential decisions │
│ - Archive intermediate artifacts │
│ │
│ Preservation Priority: │
│ 1. Current task state │
│ 2. Key architectural decisions │
│ 3. Unresolved blockers │
│ 4. Recent learnings │
│ │
└─────────────────────────────────────────────────────────────────────┘
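The threshold monitoring above amounts to a three-step ladder. A minimal sketch, assuming token counts are available to the orchestrator (`budget_action` and the action labels are hypothetical):

```python
def budget_action(used_tokens: int, context_limit: int) -> str:
    """Pick the token-budget action for the current context usage."""
    percent_used = 100 * used_tokens / context_limit
    if percent_used >= 95:
        return "halt_and_handoff"   # HALT and generate handoff
    if percent_used >= 80:
        return "checkpoint"         # checkpoint + handoff preparation
    if percent_used >= 70:
        return "compact"            # begin aggressive compaction
    return "continue"
```

Checking the highest threshold first keeps each branch to a single comparison.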
Memory Integration Best Practices
Persistent Memory (ChromaDB + sentence-transformers):
- Store architectural decisions: mem0-v2.py add "ARCH: [details]"
- Store project context: mem0-v2.py add "PROJECT: [name]"
- Store technical learnings: mem0-v2.py add "LESSON: [insight]"
Session Memory (localStorage):
- Current task progress
- Recently modified files
- User preferences
Cross-Session Continuity:
- Project profile loaded at session start
- Previous session learnings available
- Long-term context preserved
Error Recovery Patterns
| Error Type | Detection | Recovery |
|---|---|---|
| Compilation failure | Build step fails | Read error, fix syntax, retry |
| Test failure | QA step fails | Identify test, fix code, re-run |
| Missing dependency | npm install fails | Install dependency, retry |
| File conflict | Merge fails | Manual resolution, re-merge |
| API contract violation | Integration fails | Update contract, sync teams |
| Security vulnerability | Scan finds CVE | Apply patch or workaround |
Retry Limits:
- Compilation errors: 3 retries
- Test failures: 3 retries (with fixes)
- Missing deps: 2 retries
- Merge conflicts: escalate to user
- Security issues: 1 attempt, then escalate
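The retry limits can be enforced with a small wrapper. This is a sketch, not the plugin's implementation: `recover`, `RETRY_LIMITS`, and the error-type keys are hypothetical, and each limit is read as the maximum number of fix attempts before escalating.

```python
from typing import Callable

# Assumption: limits are interpreted as maximum fix attempts per error type.
RETRY_LIMITS = {
    "compilation": 3,
    "test_failure": 3,
    "missing_dependency": 2,
    "merge_conflict": 0,   # always escalated to the user
    "security": 1,
}


def recover(error_type: str, attempt_fix: Callable[[int], bool]) -> str:
    """Call attempt_fix(attempt_number) until it succeeds or the limit is hit."""
    limit = RETRY_LIMITS.get(error_type, 0)
    for attempt in range(1, limit + 1):
        if attempt_fix(attempt):
            return f"recovered after {attempt} attempt(s)"
    return "escalate to user"
```

Unknown error types fall back to a limit of 0 here, so they escalate immediately instead of looping.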
Logging Standards
Every skill execution should log:
## Skill Execution Log
**Skill:** [name]
**Started:** [timestamp]
**Ended:** [timestamp]
**Duration:** [X] minutes
**Actions Taken:**
- [List of major actions]
**Files Created:**
- [List]
**Files Modified:**
- [List]
**Decisions Made:**
- [List with rationale]
**Blockers Encountered:**
- [List]
**Quality Score:** [X]/100
**Passed Quality Gate:** [Yes/No]
**Handoff Notes:**
- [Any context needed for next session]
Metrics Collection
Track these metrics per pipeline execution:
{
"session_id": "uuid",
"timestamp": "ISO8601",
"mode": "full-build|feature|...",
"engagement": "express|standard|thorough|meticulous",
"execution": "sequential|parallel",
"duration_minutes": 0,
"skills_invoked": ["skill1", "skill2"],
"tasks_completed": 0,
"tasks_total": 0,
"quality_scores": {
"build": 0,
"harden": 0,
"overall": 0
},
"gates_approved": 0,
"gates_rejected": 0,
"errors_encountered": 0,
"retry_count": 0,
"user_approvals": 0
}
Performance Benchmarks
| Metric | Target | Warning | Critical |
|---|---|---|---|
| Context utilization | < 70% | 70-80% | > 80% |
| Task duration | < 30 min | 30-60 min | > 60 min |
| Error rate | < 5% | 5-15% | > 15% |
| Retry rate | < 10% | 10-20% | > 20% |
| Quality score | > 90 | 80-90 | < 80 |
Dependency Injection Pattern
For skills that need shared services:
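// Placeholder shapes (assumptions) so the example type-checks; real services expose more.
interface LoggerService { info(message: string): void; }
interface MemoryService {}
interface ConfigService {}
interface MetricsService {}
interface SkillContext { task: string; }
interface SkillResult { success: boolean; }

interface ServiceContainer {
  logger: LoggerService;
  memory: MemoryService;
  config: ConfigService;
  metrics: MetricsService;
}

class SoftwareEngineerSkill {
  // Services are injected so tests can substitute fakes.
  constructor(private services: ServiceContainer) {}

  execute(context: SkillContext): SkillResult {
    this.services.logger.info('Starting software engineer skill');
    return { success: true };
  }
}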
Configuration Schema
.production-grade.yaml full schema:
project:
  name: "My Project"
  version: "0.1.0"
  description: "Project description"
features:
  frontend: true
  mobile: false
  ai_ml: false
  skip_define_ba: false
paths:
  backend: "services"
  frontend: "frontend"
  tests: "tests"
  docs: "docs"
  infrastructure: "infrastructure"
quality:
  block_score: 60
  minimum_score: 90
  excellent_score: 95
  coverage_threshold: 80
pipeline:
  engagement: "standard"
  execution: "parallel"
  max_workers: 4
review:
  mode: "lean"
  auto_review: true
  codingLevel: 8
brownfield:
  protected_paths:
    - "config/production/*"
    - "scripts/deploy.sh"
  baseline_branch: "main"
game:
  engine: "unity"
  platform: "web"
  target_fps: 60
  mobile_fps: 30
ai:
  model: "gpt-4"
  temperature: 0.7
  max_tokens: 4000
Environment Variables
| Variable | Description | Default |
|---|---|---|
| FORGEWRIGHT_WORKSPACE | Project workspace path | Current directory |
| FORGEWRIGHT_SKIP_MEMORY | Skip memory initialization | 0 |
| FORGEWRIGHT_LOCAL_MEMORY | Use local memory | 1 |
| FORGEWRIGHT_DEBUG | Enable debug logging | 0 |
| FORGEWRIGHT_MAX_RETRIES | Max retry attempts | 3 |
| FORGEWRIGHT_TIMEOUT | Skill timeout (seconds) | 600 |
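A script that honors these variables might read them as follows. The sketch assumes the flags use the strings "1" and "0", as the defaults in the table suggest; `load_settings` and the returned key names are hypothetical.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read FORGEWRIGHT_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "workspace": env.get("FORGEWRIGHT_WORKSPACE", os.getcwd()),
        # Assumption: flags are enabled by the literal string "1".
        "skip_memory": env.get("FORGEWRIGHT_SKIP_MEMORY", "0") == "1",
        "local_memory": env.get("FORGEWRIGHT_LOCAL_MEMORY", "1") == "1",
        "debug": env.get("FORGEWRIGHT_DEBUG", "0") == "1",
        "max_retries": int(env.get("FORGEWRIGHT_MAX_RETRIES", "3")),
        "timeout_seconds": int(env.get("FORGEWRIGHT_TIMEOUT", "600")),
    }
```

Passing a plain dictionary instead of the real environment makes the defaults easy to check in isolation.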
Emergency Procedures
When the pipeline encounters a critical failure:
- Assess scope: Isolate the failure point
- Preserve state: Save all progress to handoff document
- Evaluate options:
- Retry with fixes
- Skip failed task
- Abort and escalate
- Communicate: Report to user with options
- Decide: User selects course of action
Escalation criteria:
- Security vulnerability discovered
- Data corruption risk
- Budget/time overrun > 50%
- Unresolvable blocker after 3 attempts
Cross-Skill Communication Protocol
Skills communicate through structured artifacts:
┌─────────────────────────────────────────────────────────────────────┐
│ ARTIFACT CONTRACT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Each skill writes artifacts to: │
│ .forgewright/<skill-name>/<artifact-name>.json │
│ │
│ Artifact structure: │
│ { │
│ "version": "1.0", │
│ "skill": "skill-name", │
│ "timestamp": "ISO8601", │
│ "data": { ... skill-specific data ... } │
│ } │
│ │
└─────────────────────────────────────────────────────────────────────┘
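Writing and reading an artifact under this contract could be sketched as below. The helper names `write_artifact` and `read_artifact` are hypothetical; the path layout and the four envelope fields follow the contract above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

REQUIRED_FIELDS = {"version", "skill", "timestamp", "data"}


def write_artifact(root: Path, skill: str, name: str, data: dict) -> Path:
    """Write <root>/.forgewright/<skill>/<name>.json wrapped in the envelope."""
    path = root / ".forgewright" / skill / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    envelope = {
        "version": "1.0",
        "skill": skill,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }
    path.write_text(json.dumps(envelope, indent=2), encoding="utf-8")
    return path


def read_artifact(path: Path) -> dict:
    """Load an artifact and reject it if an envelope field is missing."""
    envelope = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED_FIELDS - envelope.keys()
    if missing:
        raise ValueError(f"artifact missing fields: {sorted(missing)}")
    return envelope
```

Validating the envelope on read lets a downstream skill fail fast on a malformed artifact instead of consuming partial data.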
Standard artifacts:
| Artifact | From | To | Content |
|---|---|---|---|
| brd.json | PM | Architect, BE, FE | User stories, acceptance criteria |
| architecture.json | Architect | BE, FE, DevOps | Services, API contracts, data models |
| api-contracts.json | Architect | BE, FE | Endpoint definitions, request/response schemas |
| test-plan.json | QA | QA | Test cases, coverage targets |
| security-report.json | Security | Security | Vulnerabilities, severity, recommendations |
| quality-report.json | Review | Review | Code quality findings, patterns |
| delivery.json | Any skill | Orchestrator | Task completion status |
Skill Invocation Patterns
Sequential pattern (skills run one after another):
Skill A → Artifact A → Skill B → Artifact B → Skill C
Parallel pattern (skills run simultaneously):
┌─────────────┐
│ Artifact A │
└─────────────┘
│
┌───┴───┐
▼ ▼
┌───────┐ ┌───────┐
│Skill A│ │Skill B│
└───┬───┘ └───┬───┘
│ │
▼ ▼
┌───────┐ ┌───────┐
│Artifact│ │Artifact│
│ A │ │ B │
└───┬───┘ └───┬───┘
│ │
└────┬────┘
▼
┌────────┐
│Merge │
│Arbiter │
└────────┘
Sequential with feedback:
Skill A → Artifact A → Skill B → Test B → [Fail] → Skill B fix → Artifact B updated
↓
[Pass]
↓
Skill C
Skill Health Monitoring
Track skill performance over time:
{
"skill_health": {
"software-engineer": {
"invocations": 15,
"avg_duration_minutes": 25,
"success_rate": 0.93,
"avg_quality_score": 88,
"last_failure": {
"timestamp": "2026-05-20",
"reason": "Timeout on large service",
"resolution": "Increased timeout, split service"
}
}
}
}
Health thresholds:
- Success rate < 80%: Investigate skill
- Avg quality score < 70: Update skill guidance
- Avg duration > 60 min: Optimize skill
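Applied to one entry of the health record, the thresholds could be checked like this. `health_actions` is a hypothetical helper, and the field names are taken from the JSON example above.

```python
def health_actions(stats: dict) -> list[str]:
    """Return the follow-up actions triggered by one skill's health stats."""
    actions = []
    if stats["success_rate"] < 0.80:
        actions.append("Investigate skill")
    if stats["avg_quality_score"] < 70:
        actions.append("Update skill guidance")
    if stats["avg_duration_minutes"] > 60:
        actions.append("Optimize skill")
    return actions
```

The sample software-engineer record (success rate 0.93, average quality 88, average duration 25 minutes) triggers none of the three actions.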
Test Pyramid Implementation
▲
/█\ E2E: 5-10 tests
/ █ \ - Critical user flows
/ █ \ - Login, purchase, core loop
/────█────\
/ █ \ Integration: 15-20 tests
/ █ \ - Service interactions
/───────█────────\ - Database operations
/ █ \ Unit: 50-100 tests
/ █ \ - Pure functions
/──────────█───────────\ - Formula calculations
Unit test coverage targets:
- Business logic: 90%
- Utility functions: 95%
- State machines: 85%
- Formatters/validators: 100%
Integration test coverage:
- API endpoints: 80%
- Database operations: 70%
- Message queues: 60%
- External services (mocked): 90%
E2E test coverage:
- Critical paths: 100%
- Happy path: 100%
- Error recovery: 50%
- Edge cases: 30%
Continuous Integration Template
name: Forgewright Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run unit tests
        run: npm run test:unit
      - name: Run integration tests
        run: npm run test:integration
      - name: Run e2e tests
        run: npm run test:e2e
      - name: Check coverage
        run: npm run test:coverage
      - name: Security scan
        run: npm audit --audit-level=high
  build:
    needs: quality-gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Each job starts on a fresh runner, so dependencies must be installed again.
      - name: Install dependencies
        run: npm ci
      - name: Build
        run: npm run build
      - name: Docker build
        run: docker build -t app:${{ github.sha }} .
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      # Checkout is required so scripts/deploy.sh exists on the runner.
      - uses: actions/checkout@v4
      - name: Deploy to staging
        run: ./scripts/deploy.sh staging
  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: ./scripts/deploy.sh production
Deployment Checklist
Before any deployment:
Monitoring & Observability
Metrics to track:
| Category | Metrics |
|---|---|
| Business | DAU, MAU, retention, conversion rate, revenue |
| Performance | Response time, throughput, error rate |
| Reliability | Availability, MTTR, MTBF |
| Quality | Test coverage, bug count, tech debt |
Alert thresholds:
| Alert | Threshold | Severity |
|---|---|---|
| Error rate | > 1% | Warning |
| Error rate | > 5% | Critical |
| Response time | > 500ms p95 | Warning |
| Response time | > 2000ms p95 | Critical |
| Availability | < 99.9% | Critical |
| CPU | > 80% | Warning |
| Memory | > 90% | Critical |
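The alert table maps onto a straightforward classifier. This sketch uses hypothetical names (`classify_alerts` and the result keys) and takes percentages as plain numbers, for example 99.9 for 99.9%.

```python
def classify_alerts(error_rate_pct: float, p95_ms: float,
                    availability_pct: float, cpu_pct: float,
                    memory_pct: float) -> dict[str, str]:
    """Return {alert: severity} for every threshold that is breached."""
    alerts: dict[str, str] = {}
    if error_rate_pct > 5:
        alerts["error_rate"] = "critical"
    elif error_rate_pct > 1:
        alerts["error_rate"] = "warning"
    if p95_ms > 2000:
        alerts["response_time"] = "critical"
    elif p95_ms > 500:
        alerts["response_time"] = "warning"
    if availability_pct < 99.9:
        alerts["availability"] = "critical"
    if cpu_pct > 80:
        alerts["cpu"] = "warning"
    if memory_pct > 90:
        alerts["memory"] = "critical"
    return alerts
```

Metrics with two thresholds check the critical bound first so that a single reading never reports both severities.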
Knowledge Transfer Protocol
When transitioning between sessions:
1. EXECUTIVE SUMMARY (3 sentences)
- What was the goal?
- What was accomplished?
- What remains?
2. TECHNICAL STATE
- Architecture decisions (key ones)
- Current blockers
- Next actions
3. FILE INVENTORY
- Created/modified files
- Their purposes
4. TESTING STATUS
- Tests passing/failing
- Coverage percentage
5. OPEN QUESTIONS
- Decisions pending
- Ambiguities unresolved
6. CONTEXT FOR CONTINUATION
- Exact command to resume
- Files to examine first
Skill Catalog
Complete list of 57 skills organized by category:
Orchestration & Meta:
1. Orchestrator (production-grade)
2. Polymath
3. Parallel Dispatch
4. Memory Manager
5. Skill Maker
6. MCP Generator
Engineering:
7. Business Analyst
8. Product Manager
9. Solution Architect
10. Software Engineer
11. Frontend Engineer
12. QA Engineer
13. Security Engineer
14. Code Reviewer
15. DevOps
16. SRE
17. Data Scientist
18. Technical Writer
19. UI Designer
20. Interaction Designer
21. Art Director
22. Vision Review
23. Mobile Engineer
24. Mobile Tester
25. API Designer
26. Database Engineer
27. Debugger
28. Prompt Engineer
29. Prompt Optimizer
30. AI Engineer
31. Accessibility Engineer
32. Performance Engineer
33. UX Researcher
34. Data Engineer
35. XLSX Engineer
36. Project Manager
Game Development:
37. Game Designer
38. Unity Engineer
39. Unreal Engineer
40. Godot Engineer
41. Godot Multiplayer
42. Roblox Engineer
43. Phaser 3 Engineer
44. Three.js Engineer
45. Level Designer
46. Narrative Designer
47. Technical Artist
48. Game Audio Engineer
49. Game Asset & VFX
50. Unity Shader Artist
51. Unity Multiplayer
52. Unreal Technical Artist
53. Unreal Multiplayer
54. XR Engineer
Growth & Marketing:
55. Growth Marketer
56. Conversion Optimizer
Workflow:
57. Goal-Driven
Session Lifecycle Hooks
Call these hooks at the appropriate lifecycle points:
| Event | Hook | Action |
|---|
| Phase completes | PHASE_COMPLETE(name, summary) | Update session-log, save to memory, update quality metrics |
| Task completes | TASK_COMPLETE(id, name, status, summary) | Update session-log |
| Gate decided | GATE_DECISION(gate#, decision, feedback) | Update session-log, save decision to memory |
| Architecture approved | ARCH_DECISION(tech_stack, services, rationale) | Save architecture to memory — see Gate 2.5 |
| Error occurs | ERROR(task_id, type, details) | Update session-log, save blocker to memory |
| Pipeline ends | Session End | Summarize, save to memory, update project profile |
| User request answered | TURN_CLOSE | Mandatory memory add — see session-lifecycle §Per-request memory |
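One way to read the hook table is as a mapping from hook name to an ordered list of actions. The sketch below assumes exactly that; the action identifiers (`update_session_log`, `save_to_memory`, and so on), the `SESSION_END` key, and the `fire` function are hypothetical labels for illustration, not an API defined by the session-lifecycle protocol.

```python
from typing import Callable, Dict, List

# Hook name -> ordered action names, transcribed from the table above.
# The action identifiers are hypothetical labels invented for this sketch.
HOOK_ACTIONS: Dict[str, List[str]] = {
    "PHASE_COMPLETE": ["update_session_log", "save_to_memory", "update_quality_metrics"],
    "TASK_COMPLETE": ["update_session_log"],
    "GATE_DECISION": ["update_session_log", "save_to_memory"],
    "ARCH_DECISION": ["save_to_memory"],
    "ERROR": ["update_session_log", "save_to_memory"],
    "SESSION_END": ["summarize", "save_to_memory", "update_project_profile"],
    "TURN_CLOSE": ["save_to_memory"],
}

Handler = Callable[[str, dict], None]


def fire(hook: str, payload: dict, handlers: Dict[str, Handler]) -> List[str]:
    """Run every action registered for `hook` and return the action names run."""
    if hook not in HOOK_ACTIONS:
        raise KeyError(f"unknown lifecycle hook: {hook}")
    ran = []
    for action in HOOK_ACTIONS[hook]:
        handlers[action](hook, payload)  # each handler receives the hook name and payload
        ran.append(action)
    return ran
```

Keeping the table as data means a new hook or action is a one-line change, and an unknown hook name fails loudly instead of being silently skipped.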
User Experience Protocol
Follow the shared UX Protocol at skills/_shared/protocols/ux-protocol.md. Key rules:
- Don't ask open-ended questions — always use notify_user with predefined numbered options (open-ended questions stall the pipeline because the model can't proceed without parsing free-text responses)
- "Chat about this" always last option
- Recommended option first with (Recommended) suffix
- Continuous execution — work until next gate or completion
- Real-time progress — constant ⧖/✓ progress updates via task_boundary
- Autonomy — sensible defaults, self-resolve, report decisions
Gate Companion — Polymath Integration
When the user selects "Chat about this" at any gate, invoke the polymath in translate mode:
Read skills/polymath/SKILL.md and follow its instructions in translate mode.
The polymath reads the gate artifacts, explains in plain language,
answers the user's questions via structured options,
then re-presents the original gate options when the user is ready.
This ensures non-technical users can understand what they're approving without the orchestrator needing to be the translator.
Review Mode Integration
At each gate, adapt behavior based on production/review-mode.txt:
| Mode | Gate Behavior |
|---|
| Full | Run director reviews, show detailed findings, longer approval flow |
| Lean | Quick validation, abbreviated findings, streamlined approval |
| Solo | Skip gate pause, auto-proceed with quality gate score only |
# Read the review mode; fall back to "lean" when the file is missing
REVIEW_MODE=$(cat production/review-mode.txt 2>/dev/null || echo "lean")
if [ "$REVIEW_MODE" = "solo" ]; then
  # Skip gate pause, log quality score
  echo "Quality Gate Score: [X]/100 — Auto-proceeding (Solo mode)"
else
  # Show gate options as normal
  :
fi
Strategic Gates (4 total — 3 user-facing + 1 automated)
Gate 1 — BRD Approval (after T1):
Notify user via notify_user:
BRD complete: [X] user stories, [Y] acceptance criteria. Approve?
1. **Approve — start architecture (Recommended)** — BRD locked, proceed to Solution Architect
2. **Show BRD details** — Display the full BRD before deciding
3. **I have changes** — Request modifications to requirements
4. **Chat about this** — Free-form input about the BRD
Gate 2 — Architecture Approval (after T2):
Notify user via notify_user:
Architecture complete: [tech stack summary]. Approve to start building?
1. **Approve — start building (Recommended)** — Architecture locked, begin autonomous BUILD phase
2. **Show architecture details** — Walk through ADRs, diagrams, and API spec
3. **I have concerns** — Flag issues with architecture decisions
4. **Chat about this** — Free-form input about the architecture
Gate 2.5 — Architecture Memory Persistence (auto, no user interaction):
After Gate 2 is approved, automatically persist architecture decisions to memory:
1. Extract key architecture decisions:
- Tech stack (language, framework, key libraries)
- Service decomposition (services, modules)
- API style (REST, GraphQL, etc.)
- Database choices
- Key architectural patterns
2. Run memory persistence commands:
# Main architecture
python3 scripts/mem0-v2.py add "ARCH: [tech stack] | SERVICES: [service list] | REASON: [key rationale]" --category architecture
# Individual ADRs
python3 scripts/mem0-v2.py add "DECISION: [ADR title] | ALTERNATIVE: [rejected options] | REASON: [why chosen]" --category decisions
# Project scope
python3 scripts/mem0-v2.py add "PROJECT: [project name] | SCOPE: [feature list] | STATUS: active" --category project
3. Log: "✓ Architecture decisions persisted to memory — [N] decisions saved"
Why this matters: Future sessions can run python3 scripts/mem0-v2.py search "architecture" to retrieve the approved stack without re-reading all architecture files.
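The memory entries for Gate 2.5 follow a fixed `KEY: value | KEY: value` layout, so they can be assembled programmatically. The sketch below only builds the argument lists for the `add` commands shown above and does not execute anything; the helper names (`mem0_add`, `architecture_entry`, `decision_entry`) and the sample values are hypothetical.

```python
import shlex
from typing import List

MEM0_SCRIPT = "scripts/mem0-v2.py"  # script path as written in the commands above


def mem0_add(text: str, category: str) -> List[str]:
    """Build the argv for one `add` call; nothing is executed here."""
    return ["python3", MEM0_SCRIPT, "add", text, "--category", category]


def architecture_entry(tech_stack: str, services: List[str], rationale: str) -> List[str]:
    """Format the main architecture record in the ARCH | SERVICES | REASON layout."""
    text = f"ARCH: {tech_stack} | SERVICES: {', '.join(services)} | REASON: {rationale}"
    return mem0_add(text, "architecture")


def decision_entry(title: str, rejected: List[str], reason: str) -> List[str]:
    """Format one ADR record in the DECISION | ALTERNATIVE | REASON layout."""
    text = f"DECISION: {title} | ALTERNATIVE: {', '.join(rejected)} | REASON: {reason}"
    return mem0_add(text, "decisions")


if __name__ == "__main__":
    # Hypothetical sample values, shown only to illustrate the resulting command line.
    cmd = architecture_entry("Python + FastAPI", ["api", "worker"], "small team")
    print(shlex.join(cmd))
```

Passing the result to `subprocess.run` as a list (rather than a single shell string) avoids quoting problems when a rationale contains quotes or pipe characters.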
Gate 3 — Production Readiness (after T9):
Read review mode first:
REVIEW_MODE=$(cat production/review-mode.txt 2>/dev/null || echo "lean")
Solo mode: Auto-proceed with quality gate score:
if [ "$REVIEW_MODE" = "solo" ]; then
  echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
  echo "Phase 5 — SUSTAIN Complete [Review: Solo]"
  echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
  echo "Quality Gate Score: [X]/100"
  echo "All phases complete — auto-proceeding (Solo mode)"
  echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
  # Skip to final summary
fi
Step G3.1 — Run VERIFIER subagent (before showing Gate 3 to user):
Before presenting Gate 3 options to the user, run the Cursor verifier subagent to confirm all work is actually complete:
Invoke: /verifier Confirm all pipeline deliverables are complete and functional for [project-name]
The verifier subagent:
- Reads .forgewright/subagent-context/PIPELINE_SUMMARY.md for scope
- Reads all DELIVERY.json from completed tasks
- Runs compilation and tests for each deliverable
- Scans for TODOs, secrets, and obvious bugs
- Writes report to .forgewright/subagent-context/VERIFIER_REPORT.md
Step G3.2 — Present Gate 3 options (using verifier report):
Notify user via notify_user (with verifier report summary):
All phases complete. Ship it?
## Verifier Report Summary
[VERIFIER_REPORT.md summary — PASS/FAIL count]
1. **Ship it — production ready (Recommended)** — Verifier confirmed ✓
2. **Show full report** — Display complete pipeline summary + verifier details
3. **Fix issues first** — Address remaining findings before shipping
4. **Chat about this** — Free-form input about production readiness
If verifier returned FAIL or PARTIAL:
⚠️ Verifier found issues. Review before shipping.
## Verifier Report
[FAIL/PARTIAL findings from VERIFIER_REPORT.md]
1. **Fix and retry verifier** — Address issues, re-run /verifier
2. **Show full report** — See all findings in detail
3. **Override — ship anyway** — Proceed with known issues (not recommended)
4. **Chat about this** — Discuss the findings
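The two Gate 3 menus differ only in what the verifier reported. A minimal sketch of that branching follows, assuming the verifier status is available as one of the strings `PASS`, `FAIL`, or `PARTIAL`; the `gate3_options` function and the shortened option labels are hypothetical.

```python
from typing import List


def gate3_options(verifier_status: str) -> List[str]:
    """Return the numbered Gate 3 options for a given verifier outcome."""
    if verifier_status == "PASS":
        options = ["Ship it (Recommended)", "Show full report", "Fix issues first"]
    elif verifier_status in ("FAIL", "PARTIAL"):
        options = ["Fix and retry verifier", "Show full report", "Override: ship anyway"]
    else:
        raise ValueError(f"unexpected verifier status: {verifier_status!r}")
    # The UX protocol requires "Chat about this" to be the last option at every gate.
    options.append("Chat about this")
    return options
```

Appending the chat option in one place keeps the "always last" rule true for both menus without repeating it in each branch.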
Task Dependency Graph
Tasks execute with explicit dependency tracking. The orchestrator reads the architecture output (number of services, pages, modules) and generates tasks accordingly. Both sequential and parallel execution are supported; the mode is selected in settings.md.
Sequential Mode (default)
T1: product-manager (BRD)
↓ [GATE 1]
T2: solution-architect (Architecture)
↓ [GATE 2]
T3a: software-engineer — implement backend services (1 per service)
T3b: frontend-engineer — implement frontend pages (1 per page group)
T4a: devops — Dockerfiles + CI skeleton
↓ (code written)
T5: qa-engineer — implement tests (unit/integ/e2e/perf)
T6a: security-engineer — STRIDE + code audit + dep scan
T6b: code-reviewer — arch conformance + quality review
↓
T7: devops (IaC + CI/CD)
T8: remediation (HARDEN fixes)
T9: sre (SLOs + chaos + capacity)
T10: data-scientist (conditional on AI/ML)
↓ [GATE 3]
T11: technical-writer (API ref + dev guides)
T12: skill-maker
↓
T13: Compound Learning + Assembly
Parallel Mode
T1: product-manager (BRD)
↓ [GATE 1]
T2: solution-architect (Architecture)
↓ [GATE 2]
┌────────────────────── Parallel Group A (BUILD) ─────────────────┐
│ T3a: software-engineer ──── worktree: .worktrees/T3a │
│ T3b: frontend-engineer ──── worktree: .worktrees/T3b │
│ T3c: mobile-engineer ──── worktree: .worktrees/T3c [cond.] │
└────────────────── validate → merge → integration test ─────────┘
T4a: devops (depends on merged T3a output)
↓ (code written)
┌────────────────────── Parallel Group B (HARDEN) ────────────────┐
│ T5: qa-engineer ──── worktree: .worktrees/T5 │
│ T6a: security-engineer ──── worktree: .worktrees/T6a │
│ T6b: code-reviewer ──── worktree: .worktrees/T6b │
└────────────────── validate → merge → integration test ─────────┘
↓
T7: devops (IaC + CI/CD)
T8: remediation (HARDEN fixes)
T9: sre (SLOs + chaos + capacity)
T10: data-scientist (conditional on AI/ML)
↓ [GATE 3]
T11: technical-writer (API ref + dev guides)
T12: skill-maker
↓
T13: Compound Learning + Assembly
When parallel mode is active, the orchestrator reads skills/parallel-dispatch/SKILL.md for the dispatch flow.
Task Dependencies
| Task | Blocked By | Notes |
|---|
| T1 | — | First task, no blockers |
| T2 | T1 | Needs BRD |
| T3a | T2 | Backend — implement services from architecture |
| T3b | T2 | Frontend — implement pages from BRD |
| T4a | T2 | DevOps — Dockerfiles + CI skeleton; in parallel mode it also waits for the merged T3a output |
| T5 | T3a, T3b | QA — needs code + test plan |
| T6a | T3a, T3b | Security — needs code + threat model |
| T6b | T3a, T3b | Review — needs code + checklist |
| T7 | T5, T6a, T6b | IaC + CI/CD — needs HARDEN output |
| T8 | T5, T6a, T6b | Remediation — needs HARDEN findings |
| T9 | T7, T8 | SRE — needs infra + fixes |
| T10 | T7, T8 | Conditional on AI/ML usage |
| T11 | T9 | Docs — needs all prior output |
| T12 | T9 | Skills — needs all prior output |
| T13 | T11, T12 | Final step |
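The dependency table is a directed acyclic graph, so a valid execution order can be derived with a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the `BLOCKED_BY` dictionary is transcribed from the table, while `execution_order` and `ready_tasks` are hypothetical helper names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Task -> tasks it is blocked by, transcribed from the table above.
BLOCKED_BY: Dict[str, List[str]] = {
    "T1": [],
    "T2": ["T1"],
    "T3a": ["T2"],
    "T3b": ["T2"],
    "T4a": ["T2"],
    "T5": ["T3a", "T3b"],
    "T6a": ["T3a", "T3b"],
    "T6b": ["T3a", "T3b"],
    "T7": ["T5", "T6a", "T6b"],
    "T8": ["T5", "T6a", "T6b"],
    "T9": ["T7", "T8"],
    "T10": ["T7", "T8"],
    "T11": ["T9"],
    "T12": ["T9"],
    "T13": ["T11", "T12"],
}


def execution_order(blocked_by: Dict[str, List[str]]) -> List[str]:
    """Return one sequential order in which every task follows its blockers."""
    return list(TopologicalSorter(blocked_by).static_order())


def ready_tasks(blocked_by: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Return tasks not yet done whose blockers are all complete."""
    return sorted(
        task
        for task, deps in blocked_by.items()
        if task not in done and all(dep in done for dep in deps)
    )
```

`execution_order` corresponds to sequential mode, while `ready_tasks` shows which tasks could be dispatched together at a given point. Gates and conditional skips are not modeled here.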
Dynamic Task Generation
After Gate 2 (architecture approved), the orchestrator reads the architecture output to determine work units:
- Count services — Read docs/architecture/ service list or api/ specs. For each service, note it for sequential implementation in T3a.
- Count pages — Read BRD user stories. Group into page clusters (auth, dashboard, settings, etc.). Note for T3b.
- Execute sequentially — Each service and page group is implemented one at a time, reading the SKILL.md for the relevant skill.
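The steps above amount to expanding T3a and T3b into one work unit per service and per page group. A minimal sketch follows, assuming the service names and page groups have already been extracted from the architecture output and the BRD; the `build_work_units` function and the sub-identifiers such as `T3a.1` are hypothetical and not a naming scheme defined by the orchestrator.

```python
from typing import Dict, List


def build_work_units(services: List[str], page_groups: List[str]) -> List[Dict[str, str]]:
    """Expand T3a/T3b into ordered work units, one per service and per page group."""
    units: List[Dict[str, str]] = []
    for i, service in enumerate(services, start=1):
        units.append({"id": f"T3a.{i}", "skill": "software-engineer", "target": service})
    for i, group in enumerate(page_groups, start=1):
        units.append({"id": f"T3b.{i}", "skill": "frontend-engineer", "target": group})
    return units
```

The returned list preserves input order, which matches the one-at-a-time execution described above.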
Conditional Tasks
- T3b (Frontend): Skip if .production-grade.yaml has features.frontend: false
- T10 (Data Scientist): Auto-detect by scanning for openai, anthropic, langchain, transformers, torch, tensorflow imports. If not detected and features.ai_ml: false, mark as completed immediately.
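The T10 auto-detection can be approximated with a regular expression over source text. This sketch assumes Python-style `import x` / `from x import y` statements only; imports in other languages, and packages whose names merely start with one of the listed names (such as `torchvision`), are deliberately not matched. The function names are hypothetical.

```python
import re
from typing import Iterable

# Package names listed in the conditional-task rule above.
AI_ML_PACKAGES = ("openai", "anthropic", "langchain", "transformers", "torch", "tensorflow")

# Matches "import torch", "from openai import ...", "import torch.nn", etc.
_IMPORT_RE = re.compile(
    r"^\s*(?:import|from)\s+(?:" + "|".join(AI_ML_PACKAGES) + r")\b",
    re.MULTILINE,
)


def detects_ai_ml(source: str) -> bool:
    """Return True if the source text imports one of the listed packages."""
    return _IMPORT_RE.search(source) is not None


def should_run_t10(sources: Iterable[str], ai_ml_enabled: bool) -> bool:
    """T10 runs when the feature flag is on or any source file triggers detection."""
    return ai_ml_enabled or any(detects_ai_ml(s) for s in sources)
```

With `features.ai_ml: false` and no matching import, `should_run_t10` returns `False`, which corresponds to marking T10 as completed immediately.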
Phase Execution
Each phase loads its dispatcher file for task management. In parallel mode, BUILD and HARDEN phases additionally invoke the parallel-dispatch skill.
| Phase | File | Tasks | Parallel Support |
|---|
| DEFINE | phases/define.md | T1, T2 | No (gate-protected) |
| BUILD | phases/build.md | T3a, T3b, T3c, T4a | Yes (Group A) |
| HARDEN | phases/harden.md | T5, T6a, T6b | Yes (Group B) |
| SHIP | phases/ship.md | T7, T8, T9, T10 | No |
| SUSTAIN | phases/sustain.md | T11, T12, T13 | No |
Read the phase file BEFORE starting that phase. Never load all phase files at once.
Internal skill architecture — each skill's internal phase structure (executed sequentially in Antigravity):
| Skill | Internal Phases |
|---|
| software-engineer | Shared foundations first (Phase 2a), then per-service implementation (Phase 2b). Foundations ensure consistency. |
| frontend-engineer | UI Primitives first (Phase 3a), then Layout + Features (Phase 3b), then Pages (Phase 4). Primitives are foundational atoms. |
| qa-engineer | Unit, integration, e2e, performance tests — sequential by test type |
| security-engineer | Code audit, auth review, data security, supply chain — sequential by domain |
| code-reviewer | Architecture conformance, code quality, performance review — sequential by focus |
| devops | IaC, CI/CD, container orchestration — sequential by layer |
| sre | Chaos engineering, incident management, capacity planning — sequential |
| technical-writer | API reference, developer guides — sequential |
Skill Dispatch Method
Read the skill's SKILL.md file and follow its instructions directly:
Read skills/<skill-name>/SKILL.md and follow its instructions.
Provide context: architecture files, BRD, workspace paths, etc.
Conflict Resolution
Follow the shared protocol at skills/_shared/protocols/conflict-resolution.md.
| Artifact | Sole Authority | Others Must NOT |
|---|
| OWASP, STRIDE, PII, encryption | security-engineer | code-reviewer must NOT do security review |
| SLO, error budgets, runbooks | sre | devops must NOT define SLOs |
| Code quality, arch conformance | code-reviewer | — |
| Infrastructure, CI/CD, monitoring setup | devops | sre reviews but doesn't provision |
| Requirements (WHAT) | product-manager | architect flags gaps, doesn't change requirements |
| Architecture (HOW) | solution-architect | — |
Remediation Feedback Loop
When HARDEN skills find Critical/High issues:
- Orchestrator creates T8 (Remediation) task with findings
- Fix code in services/, frontend/
- Re-scan affected files after fixes
- If still failing after 2 cycles → escalate to user via notify_user
Context Bridging
| Task | Reads From | Writes To (Project Root) | Writes To (Workspace) |
|---|
| Polymath | User dialogue, web research | — | polymath/context/, polymath/handoff/ |
| T1: PM | User input, polymath context, web research | — | product-manager/BRD/ |
| T2: Architect | product-manager/BRD/ | api/, schemas/, docs/architecture/ | solution-architect/ |
| T3a: Backend | api/, schemas/, docs/architecture/ | services/, libs/shared/ | software-engineer/ |
| T3b: Frontend | api/, product-manager/BRD/ | frontend/ | frontend-engineer/ |
| T4a: DevOps | services/, docs/architecture/ | Dockerfiles at root | devops/containers/ |
| T5: QA | services/, frontend/, api/ | tests/ | qa-engineer/ |
| T6a: Security | All implementation code | — | security-engineer/ |
| T6b: Review | All implementation + architecture | — | code-reviewer/ |
| T7: DevOps IaC | Architecture, implementation | infrastructure/, .github/workflows/ | devops/ |
| T8: Remediation | HARDEN findings | Fixes in services/, frontend/ | — |
| T9: SRE | All prior outputs | docs/runbooks/ | sre/ |
| T10: Data Sci | Implementation (LLM usage) | — | data-scientist/ |
| T11: Tech Writer | ALL workspace + project | docs/ | technical-writer/ |
| T12: Skill Maker | ALL workspace | skills/ | skill-maker/ |
Deliverables go to project root (respecting .production-grade.yaml path overrides). Workspace artifacts go to .forgewright/<skill-name>/.
Workspace Architecture
.forgewright/
├── .protocols/ # Shared protocols (written at bootstrap)
├── .orchestrator/ # Pipeline state via task.md
├── product-manager/ # BRD, research
├── solution-architect/ # Architecture artifacts
├── software-engineer/ # Backend logs/artifacts
├── frontend-engineer/ # Frontend logs/artifacts
├── qa-engineer/ # Test artifacts
├── security-engineer/ # Security findings
├── code-reviewer/ # Quality findings
├── devops/ # Infrastructure artifacts
├── sre/ # Readiness artifacts
├── data-scientist/ # AI/ML artifacts (conditional)
├── technical-writer/ # Documentation artifacts
└── skill-maker/ # Custom skills
Adaptive Rules
| Situation | Action |
|---|
| No frontend needed | Skip T3b, simplify DevOps |
| Monolith architecture | Single Dockerfile, skip K8s/service mesh |
| LLM/ML APIs detected | Auto-enable T10 (Data Scientist) |
| Critical security finding | Create remediation task (T8) |
| QA failures > 20% | Flag to user |
| Architecture drift detected | Warn user (arch decisions are user-approved) |
| features.frontend: false | Skip T3b entirely |
| features.ai_ml: false | Skip T10 unless auto-detected |
Security Hooks (Continuous)
Security runs during ALL phases:
- Block rm -rf /, chmod 777, destructive operations
- Block .env, .key, .pem, credentials.json from git
- Scan staged files for API keys, tokens, passwords
- Engineers scan for hardcoded secrets as they write code
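The file-blocking and secret-scanning hooks can be sketched as two small checks. Everything here is illustrative: the glob patterns interpret `.key` and `.pem` as file extensions, which is an assumption, and the two regular expressions are simple examples (an AWS-style access key ID and a quoted `api_key`/`token`/`password` assignment), not the rule set Forgewright actually ships.

```python
import fnmatch
import re
from pathlib import PurePosixPath
from typing import List

# Assumed glob interpretation of ".env, .key, .pem, credentials.json".
BLOCKED_FILE_PATTERNS = (".env", "*.key", "*.pem", "credentials.json")

# Illustrative patterns only; a real scanner would use a much larger rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]


def is_blocked_file(path: str) -> bool:
    """Return True if the file name matches a pattern that must stay out of git."""
    name = PurePosixPath(path).name
    return any(fnmatch.fnmatchcase(name, pat) for pat in BLOCKED_FILE_PATTERNS)


def find_secret_lines(text: str) -> List[int]:
    """Return 1-based line numbers that look like they contain a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(lineno)
    return hits
```

A pre-commit style check could call `is_blocked_file` on each staged path and `find_secret_lines` on each staged file's contents, failing the commit when either reports a hit.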
Autonomous Behavior
Every skill execution follows:
- Build and verify — after writing code, run it. After writing tests, execute them.
- Quality gate — run skills/_shared/protocols/quality-gate.md after each skill output. Score must meet threshold.
- Validation loop — while not valid: fix(errors); validate()
- Self-debug — read errors, identify root cause. After 3 failures: stop and report.
- Quality bar — no TODOs, no stubs. All code compiles. All tests pass. Quality score ≥ 90.
- TDD enforced — write test first, watch fail, implement, watch pass, refactor.
- Convention compliance — read .forgewright/code-conventions.md (if brownfield) and match existing patterns.
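The validation loop and the self-debug rule combine into a bounded retry: validate, fix, validate again, and stop after three failed validations. The sketch below is one possible reading of those two bullets; `validation_loop`, its callback parameters, and the interpretation of "3 failures" as three failed validation runs are assumptions.

```python
from typing import Callable, List, Tuple


def validation_loop(
    validate: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_failures: int = 3,
) -> Tuple[bool, List[str]]:
    """Validate and fix until clean, or stop and report after `max_failures` failures.

    `validate` returns the current list of errors (empty means valid).
    `fix` attempts to resolve the errors it is given.
    """
    failures = 0
    while True:
        errors = validate()
        if not errors:
            return True, []
        failures += 1
        if failures >= max_failures:
            # Stop and report instead of looping forever.
            return False, errors
        fix(errors)
```

The cap turns the open-ended `while not valid` loop into one that always terminates, and the remaining errors are returned so they can be reported to the user.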
Partial Execution
| Command | Tasks Run |
|---|
| just define | T1, T2 only |
| just build | T3a, T3b, T4a (requires T2 output) |
| just harden | T5, T6a, T6b (requires BUILD output) |
| just ship | T7-T10 (requires HARDEN output) |
| just document | T11 only |
| skip frontend | Omit T3b |
| start from architecture | Skip T1, start at T2 |
| just onboard | Run project-onboarding only (no pipeline) |
Final Summary — Quality Dashboard
At pipeline completion, generate the Quality Dashboard from skills/_shared/protocols/quality-dashboard.md. It supersedes the legacy text banner as the primary output and provides a comprehensive, machine-readable quality report.
The dashboard includes:
- Overall quality score (0-100) with grade (A-F)
- Build health — compilation, Docker, dependencies, lint
- Test coverage — unit, integration, E2E, contract, performance, regression
- Security — OWASP, STRIDE, CVEs, secrets scan
- Code quality — architecture conformance, conventions, stubs, imports
- Acceptance — BRD criteria coverage, traceability
- Pipeline stats — mode, duration, skills run, files changed
Machine-readable output: .forgewright/quality-report-{session}.json
Quality trending: .forgewright/quality-history.json (appended each session)
Also display the legacy summary for backward compatibility:
╔══════════════════════════════════════════════════════════════╗
║ Forgewright v{local_version} — COMPLETE ║
╠══════════════════════════════════════════════════════════════╣
║ Project: <name> ║
║ Quality Score: [XX]/100 (Grade [A-F]) ║
║ ║
║ DEFINE: ✓ BRD (<X> stories) ✓ Architecture (<pattern>) ║
║ BUILD: ✓ Backend (<N> services) ✓ Tests (<N> passing) ║
║ HARDEN: ✓ Security (<N> fixed) ✓ Code Review (<N> fixed) ║
║ SHIP: ✓ Docker ✓ CI/CD ✓ Terraform ✓ SRE approved ║
║ SUSTAIN: ✓ Docs ✓ Skills (<N> created) ✓ Learnings captured ║
║ ║
║ Workspace: .forgewright/ ║
║ Config: .production-grade.yaml ║
║ Report: .forgewright/quality-report-{session}.json ║
╚══════════════════════════════════════════════════════════════╝
Brownfield Safety Net
For ALL brownfield projects (any mode, not just Full Build), activate the safety net from skills/_shared/protocols/brownfield-safety.md:
| Safety Layer | When | Action |
|---|
| Git branch | Pre-pipeline | Create forgewright/session-{timestamp} branch |
| Baseline snapshot | Pre-pipeline | Run existing tests, record pass count |
| Protected paths | Pre-pipeline | Register paths that must not be modified |
| Regression checks | After T3a, T3b, T5 | Verify existing tests still pass |
| Change manifest | During pipeline | Track every file create/modify/delete |
| Merge readiness | Pre-Gate 3 | Full regression + quality check |
| Rollback | On failure | Revert via session branch |
Common Mistakes
| Mistake | Fix |
|---|
| Running BUILD without DEFINE | Architecture decisions must exist first |
| Code reviewer doing OWASP review | security-engineer is sole OWASP authority |
| DevOps defining SLOs | sre is sole SLO authority |
| DevOps writing runbooks | sre writes runbooks to docs/runbooks/ |
| Skipping tests | Production grade means tested |
| Not running code after writing | Every skill verifies output compiles and runs |
| Skills working in isolation | Cross-reference via Context Bridging table |
| Over-asking the user | Respect engagement mode. Express: 3 gates only. Standard: 3 gates + moderate interview. Thorough/Meticulous: deeper interviews but always structured options. |
| Ignoring engagement mode | ALL skills must read settings.md and adapt depth. Express architect doesn't ask 15 questions. Meticulous PM doesn't skip to BRD after 2 questions. |
| One-size-fits-all architecture | Architecture is derived from constraints (scale, team, budget, compliance). A 100-user internal tool does NOT need microservices + K8s. |
| Writing stubs | No // TODO: implement in production code |
| Hardcoded paths | Read .production-grade.yaml for path overrides |
| Not leveraging skill architecture | Even though execution is sequential, each skill's internal phase structure ensures quality. Foundations before dependent work. |
| Duplicating security review | code-reviewer references security-engineer findings |
| Skipping quality gate | EVERY skill output must pass quality-gate.md — no exceptions, even in sequential mode |
| Ignoring code conventions in brownfield | Read .forgewright/code-conventions.md BEFORE writing code. Match existing patterns. |
| Modifying protected paths | Check brownfield-safety protected paths before ANY file write |
| No regression check in brownfield | After EACH build skill, verify existing tests still pass against baseline |
| Not saving session state | Call session lifecycle hooks at every phase/task/gate completion |
Execution Learnings
Auto-generated by ASIP. DO NOT DELETE.
2026-04-24 — Architectural: Self-Improving Agentic System Design
- Problem: Needed to design ASIP protocol for adaptive skill improvement
- Failed Attempts: N/A (initial design)
- Research Source: https://notebooklm.google.com/notebook/ca68602f-fcf2-4ab9-b8e9-9743868e18b6
- Solution: ASIP design combines ACE (incremental delta updates) + Multi-Agent Reflexion (diverse perspectives) + HyperAgents (self-modification)
- Key Insight: Self-improvement should be persistent (in code files), human-readable, and transferable. Avoid context collapse by using incremental updates.
- Apply When: Designing any self-improvement loop, skill adaptation, or knowledge retention system