Name: Evolve
Author: SethGammon

// Research-driven multi-cycle improvement director. Forms causal hypotheses about why scores are low, validates them with scout agents before attacking, dispatches axis-parallel fleet attacks, extracts transferable patterns, and runs indefinitely within a budget envelope. Accumulates a persistent belief model and pattern library across sessions.

stars:577

forks:54

updated: May 7, 2026 17:12

related-skills.json

Same repository

archon.md

from "SethGammon/Citadel"

Autonomous multi-session campaign agent. Decomposes large work into phases, delegates to sub-agents, reviews output, and maintains campaign state across context windows. Use for work that spans multiple sessions and needs persistent state, quality judgment, and strategic decomposition.

2026-05-07 · 577 stars

autopilot.md

from "SethGammon/Citadel"

Intake-to-delivery pipeline. Processes pending items from .planning/intake/: briefs new ideas, executes approved work through research → plan → build → verify. Drop a file in .planning/intake/ and invoke this skill.

2026-05-07 · 577 stars

design.md

from "SethGammon/Citadel"

Generates and maintains a design manifest for visual consistency. In existing projects, reads current styles and documents the design language. In new projects, asks a few questions and generates a starter manifest. The post-edit hook reads the manifest and flags deviations.

2026-05-07 · 577 stars

do.md

from "SethGammon/Citadel"

Unified router that auto-routes user intent to the right orchestrator or skill. Classifies input by scope, complexity, persistence needs, and parallelism, then dispatches to the cheapest path that can handle it: direct command, skill, marshal, archon, or fleet. Single entry point for all work.

2026-05-07 · 577 stars

experiment.md

from "SethGammon/Citadel"

Automated optimization loop with scalar fitness function. Proposes changes in isolated worktrees, measures with a metric command, keeps improvements, discards failures. Supports convergence detection and diminishing returns.

2026-05-07 · 577 stars

fleet.md

from "SethGammon/Citadel"

Parallel campaign orchestrator. Runs multiple campaigns in coordinated waves within a single session. Spawns 2-3 agents per wave in isolated worktrees, collects discoveries, shares context between waves. Use when work decomposes into 3+ independent streams that can run simultaneously.

2026-05-07 · 577 stars

package.json

"author": "SethGammon"

"repository": "SethGammon/Citadel"

Useful for (SOC): Software Developers (15-1252), Computer and Mathematical Occupations, L4

name	evolve
description	Research-driven multi-cycle improvement director. Forms causal hypotheses about why scores are low, validates them with scout agents before attacking, dispatches axis-parallel fleet attacks, extracts transferable patterns, and runs indefinitely within a budget envelope. Accumulates a persistent belief model and pattern library across sessions.
user-invocable	true
auto-trigger	false
last-updated	"2026-05-03T00:00:00.000Z"

/evolve — Improvement Director

Orientation

Use when: You want sustained autonomous quality advancement — the director forms hypotheses, scouts before attacking, and builds a belief model that compounds across cycles. Runs until a natural ceiling, budget exhaustion, or you say stop.

Don't use when: You want a single scored loop (/improve), a known axis attacked directly (/improve --axis), or a one-time audit (/improve --score-only).

Key difference from /improve: /improve follows the rubric mechanically. /evolve asks why scores are where they are, validates those theories before spending fleet budget, and extracts cross-skill patterns that propagate to skills never directly attacked.

Invocation

/evolve {target}                  # run until ceiling, velocity drop, or budget
/evolve {target} --n={N}          # exactly N director cycles then stop
/evolve {target} --budget=${X}    # run until cumulative spend reaches $X
/evolve {target} --continue       # resume from saved director state
/evolve {target} --status         # show belief model, velocity, spend — no attack
/evolve {target} --axis={name}    # focus director on one axis (scout + attack only)

target maps to .planning/rubrics/{target}.md. If no rubric exists, run /improve {target} Phase 0 first — /evolve requires an approved rubric and will not auto-generate one.
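A minimal sketch of how the invocation flags might resolve into a run mode (unlimited / fixed / budget) and a rubric path. The function `resolve_invocation`, its flag dictionary, and the tolerance for a leading `$` in `--budget` are assumptions for illustration, not the skill's actual parser.

```python
from __future__ import annotations

from pathlib import Path


def resolve_invocation(target: str, flags: dict) -> dict:
    """Hypothetical helper: map parsed /evolve flags to a run mode and rubric path."""
    rubric = Path(".planning/rubrics") / f"{target}.md"
    n = flags.get("n")
    budget = flags.get("budget")
    if isinstance(budget, str):
        # Assumption: accept "--budget=$20" as well as "--budget=20".
        budget = float(budget.lstrip("$"))
    if n is not None:
        mode = "fixed"        # --n={N}: exactly N director cycles
    elif budget is not None:
        mode = "budget"       # --budget=${X}: run until cumulative spend reaches X
    else:
        mode = "unlimited"    # neither flag given
    return {"rubric": rubric, "mode": mode, "n": n, "budget": budget}
```

The caller would still need to check that `rubric` exists and stop with an error otherwise, since /evolve never generates a rubric on its own.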

Campaign Artifacts

All findings are externalized incrementally — written after every phase, not only at cycle end. A crashed or compacted session resumes with full context.

Artifact	Path	Contents
Director state	`.planning/evolve/{target}/director-state.json`	cycle count, spend, velocity history, current phase, halt status
Belief model	`.planning/evolve/{target}/belief-model.jsonl`	one record per (axis, skill) per cycle: score, hypothesis, evidence, confidence
Experiment log	`.planning/evolve/{target}/experiment-log.jsonl`	every experiment: hypothesis → prediction → actual delta → mechanism confirmed
Pattern library	`.planning/evolve/{target}/pattern-library.md`	transferable patterns: what change to what axis class caused what delta in which skills
Cycle digest	`.planning/evolve/{target}/cycle-{n}-digest.md`	human-readable per-cycle summary for review
Global patterns	`.planning/research/patterns.md`	cross-target patterns written outside campaign scope; available to future sessions and other targets
Knowledge wiki	`.planning/wiki/`	compiled wiki pages from `/learn`; integrates evolve discoveries across sessions

Create .planning/evolve/{target}/ on first invocation. Create .planning/research/ if absent.
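As an illustration of incremental externalization, the sketch below appends one belief-model record as a JSON Lines entry, so earlier cycles are never rewritten. The helper name `append_belief` and any record fields beyond the table's (score, hypothesis, evidence, confidence) are hypothetical.

```python
import json
from pathlib import Path


def append_belief(target: str, record: dict, root: Path = Path(".planning/evolve")) -> Path:
    """Append one belief-model record (one per axis, skill, cycle) as a JSONL line."""
    campaign_dir = root / target
    campaign_dir.mkdir(parents=True, exist_ok=True)  # created on first invocation
    path = campaign_dir / "belief-model.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")  # one JSON object per line
    return path


# Illustrative record; "cycle", "axis", and "skill" are assumed key names.
example = {
    "cycle": 1, "axis": "clarity", "skill": "fleet",
    "score": 6.5, "hypothesis": "H1", "evidence": "no Orientation section", "confidence": 0.9,
}
```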

Cycle Digest Format

# Cycle {n} — {target} | {date}

## Scores
| Axis | Prior | This Cycle | Delta |

## Hypotheses
| ID | Axis | Hypothesis | Scout Result | Confidence |

## What Was Attacked
| Axis | Skill | Delta | Mechanism Confirmed |

## Patterns Discovered This Cycle
- {pattern}: {evidence}

## Belief Model Updates
- {hypothesis confirmed / rejected / revised}

## Spend: ${cycle} this cycle | ${cumulative} cumulative | Velocity: {v}

Director Cycle Protocol

Phase 1: Survey

Run /improve {target} --score-only. Record scores to belief model with delta from prior cycle (empty on cycle 1). Flag any axis that dropped since last cycle as regression-watch — these are checked first in Phase 2.
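A small sketch of the survey bookkeeping: compute per-axis deltas against the prior cycle and flag drops for regression-watch. It assumes scores arrive as an axis-to-score mapping; `survey_deltas` is a hypothetical name.

```python
from __future__ import annotations


def survey_deltas(prior: dict[str, float] | None, current: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Per-axis delta from the prior cycle, plus axes to put on regression-watch."""
    if not prior:
        return {}, []  # cycle 1: no prior scores, so deltas are empty
    deltas = {
        axis: round(score - prior[axis], 2)
        for axis, score in current.items()
        if axis in prior
    }
    # Any axis that dropped since last cycle is checked first in Phase 2.
    regression_watch = sorted(axis for axis, d in deltas.items() if d < 0)
    return deltas, regression_watch
```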

Phase 2: Hypothesize

For every axis below 8.0, generate one primary hypothesis in this form:

HYPOTHESIS: {axis} scores {n}/10 because {specific mechanism},
            not because {common misread}.
PREDICTION: Fixing {mechanism} will raise score ≥ {delta} across {N} skills.
FALSIFICATION: If we apply {change} and score does not rise > 0.5, hypothesis rejected.

Draw hypotheses from: evaluator justifications in Phase 1, prior evidence in the belief model, and programmatic check failures. Do not hypothesize from score alone — the number is the symptom.

Write each hypothesis to the experiment log as { id, status: "pending", ... }.

Skip hypothesis generation for an axis if the belief model already holds a confirmed hypothesis for it with confidence >= 0.8 that has not yet been attacked.
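One way to make the hypothesis template machine-checkable is sketched below: the falsification criterion becomes a method, and the 8.0 threshold plus the skip rule become a predicate. The `Hypothesis` dataclass and its field names are illustrative assumptions, not a schema the skill defines.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One Phase 2 hypothesis as it might be logged (field names are illustrative)."""
    hypothesis_id: str
    axis: str
    score: float             # current score out of 10
    mechanism: str           # "because {specific mechanism}"
    misread: str             # "not because {common misread}"
    predicted_delta: float   # PREDICTION: raise score >= this delta
    status: str = "pending"  # written to the experiment log as pending

    def is_falsified(self, observed_delta: float) -> bool:
        # FALSIFICATION: rejected if the score does not rise by more than 0.5.
        return observed_delta <= 0.5


def needs_hypothesis(axis_score: float, has_unattacked_confirmed: bool) -> bool:
    """Axes below 8.0 get a new hypothesis unless a confirmed (>= 0.8) one awaits attack."""
    return axis_score < 8.0 and not has_unattacked_confirmed
```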

Phase 3: Scout

For axes below 7.0, or axes with unconfirmed hypotheses: dispatch one scout agent per hypothesis. Scouts read — they do not modify files.

Each scout returns:

{ "hypothesis_id": "...", "confirmed": true, "evidence": "...", "confidence": 0.85 }

Scout confidence protocol: Scouts read relevant files only — no edits, no test runs. Assign confidence:

0.9+: mechanism is directly observable (explicit absence, missing section, wrong value in file)
0.7–0.89: strong indirect evidence from 2+ corroborating observations
0.4–0.69: single observation that supports the hypothesis; alternative explanations plausible
< 0.4: no direct evidence found; hypothesis is speculative from this file set

Run scouts in parallel. Update experiment log:

confidence >= 0.7 → confirmed
confidence 0.4–0.69 → needs-evidence (do not attack; add to next cycle)
confidence < 0.4 → rejected

Skip Phase 3 for any hypothesis already confirmed at confidence >= 0.8 in the belief model from a prior cycle.
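The confidence thresholds above reduce to a simple mapping. This sketch assumes scouts return exactly the JSON shape shown earlier; `classify_scout` and `apply_scout_results` are hypothetical names.

```python
from __future__ import annotations


def classify_scout(confidence: float) -> str:
    """Map a scout's confidence to its experiment-log status."""
    if confidence >= 0.7:
        return "confirmed"
    if confidence >= 0.4:
        return "needs-evidence"  # do not attack; carry into the next cycle
    return "rejected"


def apply_scout_results(results: list[dict]) -> dict[str, str]:
    """Build hypothesis_id -> status from the scouts' returned JSON objects."""
    return {r["hypothesis_id"]: classify_scout(r["confidence"]) for r in results}
```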

Phase 4: Prioritize

For each confirmed hypothesis compute:

EV = (delta_estimate × axis_weight × confidence) / (effort_tier × collision_multiplier)

effort_tier: low=1.0, medium=1.5, high=2.5
collision_multiplier: 2.0 if the axis shares primary files with another attack in this cycle, otherwise 1.0

Select top K axes where K = min(confirmed count, 4). Document selection rationale in cycle digest. If --axis was set, skip ranking — attack only that axis.
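A sketch of the EV ranking and top-K selection. It assumes each confirmed hypothesis carries a `delta_estimate` (for example, from its PREDICTION line), an `axis_weight` (for example, from the rubric), and a precomputed `shares_files` flag; these dictionary keys are hypothetical.

```python
from __future__ import annotations

EFFORT_TIER = {"low": 1.0, "medium": 1.5, "high": 2.5}


def expected_value(h: dict) -> float:
    """EV = (delta_estimate * axis_weight * confidence) / (effort_tier * collision_multiplier)."""
    collision = 2.0 if h["shares_files"] else 1.0
    numerator = h["delta_estimate"] * h["axis_weight"] * h["confidence"]
    return numerator / (EFFORT_TIER[h["effort"]] * collision)


def select_axes(confirmed: list[dict], max_k: int = 4) -> list[str]:
    """Return the top K axes by EV, where K = min(confirmed count, max_k)."""
    k = min(len(confirmed), max_k)
    ranked = sorted(confirmed, key=expected_value, reverse=True)
    return [h["axis"] for h in ranked[:k]]
```

For example, a hypothesis with delta 3.0, weight 1.0, confidence 0.9, low effort, and a file collision scores 2.7 / 2.0 = 1.35.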

Phase 5: Fleet Attack

Dispatch one agent per selected axis in an isolated worktree (Agent tool, isolation: "worktree"). Each agent receives:

The confirmed hypothesis and its falsification criterion
The specific files to modify
Verification oracle: node scripts/run-with-timeout.js 300 node scripts/test-all.js

Each agent returns a structured result:

{
  "axis": "...", "skill": "...",
  "delta": 1.2,
  "mechanism_confirmed": true,
  "files_changed": ["..."],
  "approach": "..."
}

Merge rules:

Non-conflicting worktrees: merge all
Conflicting worktrees (same file): keep higher delta, discard lower
Regression on any previously passing programmatic check: abort that worktree, do not merge
mechanism_confirmed: false (score improved but not via predicted mechanism): record as incidental_improvement, mark hypothesis as needs-revision

Commit each merged worktree with a message citing the hypothesis ID.
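The merge rules can be read as a small decision procedure, sketched below. It assumes each worktree result is the structured JSON above plus a hypothetical `regressed` flag from the verification oracle. Resolving multi-way conflicts greedily, highest delta first, is one interpretation of "keep higher delta".

```python
from __future__ import annotations


def plan_merges(results: list[dict]) -> dict[str, list[str]]:
    """Apply the Phase 5 merge rules to per-axis worktree results."""
    merged, discarded, aborted, incidental = [], [], [], []
    candidates = []
    for r in results:
        # A regression on any previously passing check aborts that worktree.
        if r["regressed"]:
            aborted.append(r["axis"])
        else:
            candidates.append(r)
    # On a file conflict, keep the higher delta (greedy, highest first).
    claimed = set()
    for r in sorted(candidates, key=lambda res: res["delta"], reverse=True):
        files = set(r["files_changed"])
        if files & claimed:
            discarded.append(r["axis"])
            continue
        claimed |= files
        merged.append(r["axis"])
        if not r["mechanism_confirmed"]:
            # Score improved, but not via the predicted mechanism.
            incidental.append(r["axis"])
    return {"merged": merged, "discarded": discarded, "aborted": aborted, "incidental": incidental}
```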

Phase 6: Synthesize

For each result:

Update belief model — append evidence record for (axis, skill)
Update experiment log — mark verified / refuted / incidental
Identify transferable patterns:

PATTERN: {axis_class} | Mechanism: {what caused improvement} | Delta: {avg} across {N} instances | Applies to: {skill list} | Confidence: high/medium/low

Write patterns to .planning/evolve/{target}/pattern-library.md.

Compile into wiki: After writing to the pattern library, call /learn --from-evolve {target} --cycle {n}. This compiles cycle discoveries into .planning/wiki/ — integrating with findings from prior cycles and campaigns rather than siloing them in the evolve directory. Skip if /learn is not available in this session (log the skip, do not block the cycle).

Phase 7: Cross-Pollinate

For each confidence: high pattern, or any pattern confirmed in 2+ skills: apply to all other applicable skills as targeted single-file edits — without running a full attack cycle.

Run verification oracle per cross-pollinated skill. Commit only if all programmatic checks pass and no axis drops > 0.3. Revert on regression; mark pattern as context-dependent.

Write patterns that apply beyond this target to .planning/research/patterns.md.
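The two Phase 7 checks, eligibility and commit acceptance, are sketched below. It assumes a pattern records its confidence label and the list of skills it was confirmed in, and that per-axis scores are available before and after the edit; the function and key names are hypothetical.

```python
from __future__ import annotations


def eligible_for_cross_pollination(pattern: dict) -> bool:
    """A pattern propagates if it is high-confidence or confirmed in 2+ skills."""
    return pattern["confidence"] == "high" or len(pattern["confirmed_in"]) >= 2


def accept_cross_pollination(checks_passed: bool, before: dict[str, float], after: dict[str, float]) -> bool:
    """Commit a cross-pollinated edit only if checks pass and no axis drops by more than 0.3."""
    if not checks_passed:
        return False
    return all(before[axis] - after[axis] <= 0.3 for axis in before)
```

A `False` from `accept_cross_pollination` corresponds to reverting the edit and marking the pattern context-dependent.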

Phase 8: Loop or Halt

Compute learning velocity:

velocity = Σ(delta across all attacked axes this cycle) / axes_attacked

Append to director-state.json velocity history.
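A minimal sketch of the velocity step: average the deltas of the attacked axes and append the result to the director state. The `velocity_history` key and the choice of 0.0 when no axis was attacked are assumptions.

```python
import json
from pathlib import Path


def record_velocity(state_path: Path, deltas: list) -> float:
    """velocity = sum(deltas of attacked axes) / axes_attacked, appended to history."""
    velocity = sum(deltas) / len(deltas) if deltas else 0.0  # assumed 0.0 if nothing attacked
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state.setdefault("velocity_history", []).append(velocity)
    state_path.write_text(json.dumps(state, indent=2))
    return velocity
```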

Halt conditions (check in order):

--n cycles completed
--budget reached (cumulative cost ≥ limit)
All axes ≥ 9.0 across all scored skills
velocity < 0.2 for 3 consecutive cycles AND no needs-evidence hypotheses remain
Level-up triggered (see below)
User says stop

On velocity drop, before halting: attempt one axis-class switch — attack the highest-EV axis from a category not touched in the last 2 cycles. If velocity is still < 0.2 after that cycle, halt.

On level-up trigger (no axis improved > 0.5 for 2 cycles, ≥ 3 cycles run, no programmatic failures): write level-up proposals to .planning/rubrics/{target}-proposals.md, set status: level-up-pending in director state, and halt. The campaign resumes only after the human approves the proposals and edits the live rubric.
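The halt conditions, checked in order, are sketched below, with the level-up trigger as a helper. The state dictionary and its keys (`max_gain_per_cycle`, `needs_evidence`, and so on) are hypothetical. The sketch omits the one-time axis-class switch that precedes a velocity halt. Return values follow the halt reasons in the exit handoff.

```python
from __future__ import annotations


def level_up_triggered(max_gain_per_cycle: list[float], had_programmatic_failures: bool) -> bool:
    """No axis improved > 0.5 for the last 2 cycles, >= 3 cycles run, no programmatic failures."""
    return (
        len(max_gain_per_cycle) >= 3
        and all(g <= 0.5 for g in max_gain_per_cycle[-2:])
        and not had_programmatic_failures
    )


def halt_reason(s: dict) -> str | None:
    """Return the first matching halt reason, or None to keep looping."""
    if s.get("n") is not None and s["cycle"] >= s["n"]:
        return "n-complete"
    if s.get("budget") is not None and s["spend"] >= s["budget"]:
        return "budget"
    if min(s["scores"]) >= 9.0:  # every axis across every scored skill
        return "ceiling"
    v = s["velocity_history"]
    if len(v) >= 3 and all(x < 0.2 for x in v[-3:]) and not s["needs_evidence"]:
        return "velocity"
    if level_up_triggered(s["max_gain_per_cycle"], s["programmatic_failures"]):
        return "level-up-pending"
    if s.get("user_stop"):
        return "user-stop"
    return None
```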

On normal loop: increment cycle, compress prior cycle findings to continuation context, return to Phase 1.

Unlimited Mode

No --n and no --budget = unlimited. Declare before starting:

/evolve running in unlimited mode.
Target: {target} | Exit: all axes ≥ 9.0 OR velocity < 0.2 for 3 cycles
Estimated cost: $12–18/cycle | Spend so far: $0
To halt after current cycle: type /stop or press Escape.

Every cycle, report:

Cycle {n} complete. Spend: ${cycle} | Cumulative: ${total} | Velocity: {v}

When context approaches compression territory (session duration > 30 min or /compact recommended): write continuation checkpoint to director state, surface the --continue command. The next session picks up exactly where this one stopped.
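One way to make the continuation checkpoint crash-safe is to write to a temporary file and rename it over the director state. The sketch below assumes that approach and uses hypothetical helper names. The missing-state error mirrors the "--continue with no director state" fringe case.

```python
import json
import os
from pathlib import Path


def write_checkpoint(state_path: Path, **fields) -> None:
    """Merge fields into director-state.json without leaving a half-written file."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state.update(fields)
    tmp = state_path.with_name(state_path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, state_path)  # rename is atomic on POSIX filesystems


def load_for_continue(state_path: Path) -> dict:
    """--continue path: resume from saved state, or fail if there is none."""
    if not state_path.exists():
        raise FileNotFoundError("No campaign to resume; start fresh with /evolve {target}.")
    return json.loads(state_path.read_text())
```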

For overnight / unattended runs: combine with /daemon. The director is daemon-compatible — daemon calls /evolve {target} --continue each session. Set --budget to cap total spend.

Fringe Cases

.planning/ does not exist: error — run /do setup first to initialize the harness state directory, then retry.
No rubric: error — run /improve {target} Phase 0 first. List available targets in .planning/rubrics/ as hint.
No prior scores in belief model: proceed from cycle 1; all deltas empty on first survey. Expected.
All scouts return needs-evidence: attack the top-EV axis anyway under a low-confidence flag and record it as exploratory. Log the result whether or not the score improves.
Scout agent hangs or times out (dispatched scout never returns): After 10 minutes without a response, log the scout as status: timed-out in the experiment log with confidence: 0. Proceed with the remaining returned scouts. Never let a hung scout block the cycle — if all scouts time out, treat as "all scouts return needs-evidence" and attack the top-EV axis under low-confidence flag.
All axes collide (every axis shares files): serialize top 2 axes; parallelize remainder. Log collision.
Cross-pollination causes regression: revert that skill, mark pattern context-dependent, do not propagate further.
Level-up mid-campaign: pause, write proposals, set level-up-pending. /evolve --continue after human approval resumes cycle numbering from where it stopped.
Budget overrun risk: if projected spend for current cycle would exceed --budget by > 20%, warn and confirm before dispatching fleet.
--continue with no director state: error — no campaign to resume. Suggest /evolve {target} to start fresh.
Pattern library > 50 entries: consolidate — group by axis class, merge similar patterns, keep highest-confidence instance of each class. Log consolidation.
Zero skills match target rubric: error with message listing all .planning/rubrics/*.md targets.
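The budget-overrun check can be sketched as follows. It reads "exceed --budget by > 20%" as projected cumulative spend (spend so far plus this cycle's estimate) above 1.2 × the budget. That reading and the name `budget_gate` are assumptions.

```python
from __future__ import annotations


def budget_gate(cumulative: float, projected_cycle: float, budget: float | None) -> str:
    """Decide whether the next fleet dispatch needs explicit confirmation."""
    if budget is None:
        return "proceed"  # no --budget cap to overrun
    projected_total = cumulative + projected_cycle
    if projected_total > budget * 1.2:
        return "confirm"  # would overshoot --budget by more than 20%: warn first
    return "proceed"
```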

Quality Gates

Every hypothesis must have an explicit falsification criterion before Phase 3
Scouts must run before fleet dispatch on any unconfirmed hypothesis
Belief model written after every phase, not only end of cycle
Cross-pollination requires passing verification oracle before commit
Regression on any previously-passing axis aborts that worktree commit
Pattern library and global patterns updated at every cycle end, even zero-improvement cycles
Cycle digest written even on abort or no-change cycles

Contextual Gates

Disclosure:

State mode (unlimited / fixed / budget) and exit conditions before first cycle
Estimate $12–18 per full cycle before starting
Report spend and velocity at end of every cycle
Confirm before continuing if cumulative spend exceeds $50

Reversibility: Red. Cross-pollination modifies many files across the repo; level-up rewrites rubric anchors permanently. Each commit is individually revertable; high volume. Range: git revert {first}^..{last}.

Trust gates:

Novice (0-4 sessions): --status and --n=1 only; unlimited blocked
Familiar (5-19): up to --n=5; unlimited requires explicit --budget cap
Trusted (20+): no cap; confirm if projected total > $100
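The trust tiers can be expressed as a gate function. In this sketch, letting every tier run `--status` and treating a `--budget` run as the capped form of unlimited for familiar users are interpretations. The function name and return labels are hypothetical.

```python
from __future__ import annotations


def trust_gate(sessions: int, *, status: bool = False, n: int | None = None,
               budget: float | None = None, projected_total: float = 0.0) -> str:
    """Return "allow", "block", or "confirm" for a requested /evolve run."""
    if status:
        return "allow"  # --status never attacks
    if sessions <= 4:  # Novice (0-4): --n=1 only
        return "allow" if n == 1 else "block"
    if sessions <= 19:  # Familiar (5-19)
        if n is not None:
            return "allow" if n <= 5 else "block"
        # Without --n, the run must carry an explicit --budget cap.
        return "allow" if budget is not None else "block"
    # Trusted (20+): no cap, but confirm large projected totals.
    return "confirm" if projected_total > 100 else "allow"
```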

Exit Protocol

---HANDOFF---
- Target: {target} | Cycles: {n} | Spend: ${total} | Mode: {unlimited/n/budget}
- Axes improved: {list with deltas}
- Belief model: .planning/evolve/{target}/belief-model.jsonl ({N} confirmed, {M} rejected)
- Pattern library: .planning/evolve/{target}/pattern-library.md ({N} patterns)
- Global patterns: .planning/research/patterns.md
- Knowledge wiki: .planning/wiki/index.md (compiled via /learn --from-evolve after each cycle)
- Cycle digests: .planning/evolve/{target}/cycle-*-digest.md
- Halt reason: {ceiling/velocity/budget/n-complete/user-stop/level-up-pending}
- Level-up proposals: {path or N/A}
- Reversibility: red — {N} commits across {M} files; revert range: git revert {range}
- Recommended next: {level-up and re-run / new target / done}
---
