crystallize

Use when the user says "crystallize", "compact", "compile", "distill", or asks to reduce a skill's reliance on LLM prose reasoning. Phase-transitions deterministic operations out of skill docs into real scripts with real test data. Rewrites the SKILL.md to be shorter and call the scripts instead of describing the logic. Aggressive about the boundary: LLM does judgment, prose, ambiguity resolution. Scripts do math, time comparison, parsing, counting, aggregation, schema validation.

Stars: 1

Forks: 0

Updated: April 10, 2026 at 00:18

Source

flavordrake

flavordrake/mobissh

Crystallize

A skill is a recipe the LLM follows. Parts of that recipe involve real judgment — UX tradeoffs, novel problem-solving, code quality, writing clear prose for humans. Those parts must stay in LLM hands.

Other parts don't. Counting things, comparing timestamps, parsing structured data, computing pass rates, sorting by number, aggregating across files, validating schemas — these are deterministic operations. An LLM asked to do them will be wrong sometimes, and the error mode is usually silent.

crystallize finds those operations in a SKILL.md and phase-transitions them out: the prose instruction becomes a real script with a real test suite driven by real TRACE data, and the SKILL.md gets rewritten to invoke the script. Net result: the skill is shorter, faster, and correct-by-construction on the deterministic parts.

The boundary

╭────────────────────────────────────────╮
│  LLM job (probabilistic, judgment)     │
├────────────────────────────────────────┤
│  • Intent disambiguation                │
│  • Code quality review                  │
│  • UX tradeoff decisions                │
│  • Novel problem-solving                │
│  • Writing human-facing prose           │
│  • Deciding WHICH script to call        │
│  • Interpreting script output           │
├────────────────────────────────────────┤
│  Script job (deterministic, scripted)  │
├────────────────────────────────────────┤
│  • Arithmetic / math / statistics       │
│  • Timestamp comparison, duration calc  │
│  • Counting files, lines, commits       │
│  • Sorting, ordering, ranking           │
│  • Parsing JSON/YAML/TOML/INI           │
│  • Regex on structured formats          │
│  • Aggregation and grouping             │
│  • Schema validation                    │
│  • Version/semver/hash comparison       │
│  • Structural diffs                     │
│  • Pattern matching (grep/glob)         │
│  • Cascading layered probes             │
╰────────────────────────────────────────╯

Every sentence in a SKILL.md that asks the LLM to do something in the bottom box is a crystallization candidate.

When to use

A skill is getting long (>200 lines) and you suspect much of it is prose re-explaining computation the agent could just execute.
A skill is producing inconsistent results across runs when given the same input — a sign the LLM is winging the deterministic parts.
You just wrote a new skill and want a compaction pass before shipping.
TRACE data from the skill shows the agent spending tokens narrating counts and comparisons instead of producing outputs.
You explicitly ask: "crystallize", "compact", "compile", "distill".

Inputs

Required:

target: a skill name, a glob, or a comma-separated list of skill names (for example integrate, *, or delegate,develop)

Optional:

trace-source: directory of TRACE files to mine for real-world test data (default: .traces/). The skill pulls actual inputs the target skill saw in past runs and uses them as fixtures for the generated script tests.
dry-run: don't write any files, just print the audit report.

Phases

Phase 1: Audit

Walk the target SKILL.md file(s) line by line. Classify each instruction into one of three buckets:

A — already scripted: the instruction invokes a shell script, wrapper, or tool (e.g. scripts/gh-ops.sh, scripts/trace-init.sh). Cite the file:line. These are the reference examples and don't need changes unless the script itself is broken.
B — should be scripted: the instruction describes a deterministic operation in prose. Capture the operation's intent and inputs. Example signal phrases: "count the", "sort by", "within the last N", "parse the", "extract all", "compute the average", "compare".
C — must stay probabilistic: the instruction asks for judgment, prose writing, or novel reasoning. One-line "why" each so future passes don't re-try to crystallize them.

Output this classification as a table per skill. This IS the audit report — if dry-run, the skill stops here.
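The signal phrases listed for bucket B lend themselves to a mechanical first pass. The following is a minimal sketch, not the skill's actual implementation: `flag_candidates` and the `scripts/` path heuristic for bucket A are assumptions made for illustration. It only flags lines for a closer look; the final A/B/C classification is still a judgment call.

```python
import re

# Signal phrases taken from the B-bucket description above.
SIGNAL_PHRASES = [
    "count the", "sort by", "within the last", "parse the",
    "extract all", "compute the average", "compare",
]

# Assumed heuristic for bucket A: the line mentions a path under scripts/.
SCRIPT_REF = re.compile(r"\bscripts/[\w./-]+")


def flag_candidates(skill_text):
    """Return (line_number, bucket_hint, line) for lines worth a closer look."""
    flagged = []
    for number, line in enumerate(skill_text.splitlines(), start=1):
        lowered = line.lower()
        if SCRIPT_REF.search(line):
            flagged.append((number, "A?", line.strip()))
        elif any(phrase in lowered for phrase in SIGNAL_PHRASES):
            flagged.append((number, "B?", line.strip()))
    return flagged


# Hypothetical three-line skill excerpt.
sample = """Run scripts/gh-ops.sh to list open issues.
Count the entries with status: fail.
Decide whether the PR reads cleanly."""

for number, hint, line in flag_candidates(sample):
    print(f"{number}\t{hint}\t{line}")
```

Lines that match neither pattern are left unflagged on purpose: the third sample line asks for an opinion, so it falls to the reviewer to place it in bucket C.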

Phase 2: Discover existing deterministic tools

Runs BEFORE fixture mining and BEFORE drafting any new script. The highest-leverage finds are tools that already exist in the repo but that no skill knows about: work that was done once, used once, and never advertised. Crystallizing those is free: zero new code, just wiring.

Walk every directory that could hold a usable tool:

scripts/ — top-level intent-named scripts. Note the CLI shape from each header comment. Cross-reference: which scripts are invoked by some SKILL.md? Which are not?
scripts/lib/ — sourced helpers. Note any function with a clear single-purpose contract that could be wrapped in a CLI script.
tools/ — non-scripts/ utility programs (review server, JSON parsers, frame extractors). Same audit.
.traces/*/artifacts/ — one-off scripts written inside a TRACE arc while solving a problem. These are the highest-value finds: someone built a deterministic tool to solve a real problem, used it once, and never promoted it. Promote it to scripts/ if it has reuse.
TRACE logs/ and strategy/pivot_*.md: search for shell snippets the agent ran during the arc that should have been scripts. If a pivot says "we discovered we needed to compute X, so we ran <one-liner>", that one-liner is a candidate to formalize.

For every tool found, classify:

invoked — at least one current SKILL.md or script calls it. Note which one(s).
orphaned — nothing currently calls it, but the contract is clear and it solves a real problem. High-value crystallization candidate.
half-built — started in a TRACE arc, abandoned or scoped to that arc only. Either promote or delete (don't leave undead).
dead — contract unclear, no real use case, or duplicates a better tool. Mark for removal.

The output of this phase is a tool registry at .claude/tool-registry.md (or appended to it on subsequent runs). The registry is durable across crystallize passes and discoverable by every other skill — it's the catalog of "deterministic tools available in this repo, what they do, what they take, what they return".

Then cross-reference the tool registry with the B-bucket from phase 1: for each operation that should be scripted, is there ALREADY a tool that does it? If yes, skip the fixture-mining and draft steps entirely and go to phase 6 (Rewrite the SKILL.md) to wire it up. This is the cheap-win path.

Rationale: rebuilding a tool that already exists is the worst possible crystallize outcome. Worse than rewriting a SKILL.md to be longer, worse than missing a B-bucket entry. The user's instruction was explicit: prioritize finding tools that already emerged during problem-solving, even if no instruction asks for them. Those are the proof-of-need finds and they're already paid for.
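The invoked/orphaned split can be approximated by checking which tool paths appear in any SKILL.md. This is a rough sketch under stated assumptions: `classify_tools` is a hypothetical helper, the skill-doc paths and `scripts/frame-extract.sh` are made up for the example, and a plain substring match stands in for real invocation detection. The half-built and dead categories need a human or LLM read of the tool's contract, so they are not attempted here.

```python
def classify_tools(tool_paths, skill_docs):
    """Split tools into 'invoked' and 'orphaned' by textual reference.

    tool_paths: iterable of repo-relative script paths.
    skill_docs: mapping of SKILL.md path -> file contents.
    """
    registry = {}
    for tool in sorted(tool_paths):
        # A tool counts as invoked if its path appears verbatim in a skill doc.
        callers = sorted(doc for doc, text in skill_docs.items() if tool in text)
        status = "invoked" if callers else "orphaned"
        registry[tool] = {"status": status, "callers": callers}
    return registry


# Hypothetical inputs; in a real pass these would be read from the repo.
tools = ["scripts/gh-ops.sh", "scripts/trace-init.sh", "scripts/frame-extract.sh"]
docs = {
    "skills/integrate/SKILL.md": "List PRs with scripts/gh-ops.sh before review.",
    "skills/agent-trace/SKILL.md": "Start the arc with scripts/trace-init.sh.",
}

registry = classify_tools(tools, docs)
for tool, entry in registry.items():
    print(tool, entry["status"], ",".join(entry["callers"]) or "-")
```

Substring matching can over-report (a path that is a prefix of another path also matches), so entries produced this way are best treated as a starting point for the tool registry rather than its final content.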

Phase 3: Mine TRACE data for fixtures

Only runs for B-bucket operations that phase 2 did not find an existing tool for. This is the new-script path.

For each remaining B-bucket operation, walk trace-source and find TRACEs where the target skill was actually invoked. Extract the inputs the skill received (from specs/, logs/, artifacts/, TRACE.md decisions section). These become the fixtures for the script's test suite.

If no TRACE data exists for the operation, generate a minimal synthetic fixture that covers the known edge cases (empty input, single item, ordering, duplicates). Always mark synthetic fixtures clearly so a later pass can replace them with real data.

Rationale: the whole point is to reduce LLM winging. Handcrafted fixtures that happen to cover the happy path don't prove correctness — real-world TRACE data does.
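When no TRACE data exists, the synthetic fallback can be generated rather than handwritten. A minimal sketch, assuming a hypothetical `synthetic_fixtures` helper and a fixture shape (`name`, `synthetic`, `input`) invented for this example; it covers the four edge cases named above and marks every fixture as synthetic so a later pass can find and replace them.

```python
import json


def synthetic_fixtures(sample_items):
    """Build minimal fixtures for the edge cases: empty, single, ordering, duplicates."""
    items = list(sample_items)
    cases = {
        "empty": [],
        "single": items[:1],
        "reversed": list(reversed(items)),   # exercises ordering assumptions
        "duplicates": items + items[:1],     # repeats the first item once
    }
    return [
        # The explicit flag is what lets a later pass swap in real TRACE data.
        {"name": name, "synthetic": True, "input": value}
        for name, value in cases.items()
    ]


fixtures = synthetic_fixtures([3, 1, 2])
print(json.dumps(fixtures, indent=2))
```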

Phase 4: Extract / draft script

For each B-bucket operation NOT covered by an existing tool from phase 2:

Draft scripts/<verb>-<noun>.sh (or .py / .js if the operation needs structured-data libraries bash lacks). Use the existing scripts in this repo as the style reference — intent-named, set -euo pipefail, MOBISSH_TMPDIR convention, explicit flags.
The script's CLI must be stable and documented in its header. The SKILL.md will call it by name, so breaking the CLI breaks every dependent skill.
Add a test file adjacent to the script. Unit-test style: each known fixture from phase 3 → expected output. The test should fail loudly if the script's behavior drifts.
Register the new script in .claude/tool-registry.md so the next crystallize pass on a different skill finds it in phase 2.

Keep scripts single-purpose. Resist "while I'm here, let me also handle X". One script per operation. Composition happens in the SKILL.md.
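The "fixture in, expected output out" test shape described above can be sketched as follows. Everything here is illustrative: `sort_issue_numbers` is a toy operation, the fixtures are synthetic, and `run_fixtures` is a hypothetical harness, not a convention this repo is known to use.

```python
def sort_issue_numbers(raw):
    """Toy deterministic operation: parse '#12, #3' style text and sort numerically."""
    tokens = [token.strip() for token in raw.split(",")]
    return sorted(int(token.lstrip("#")) for token in tokens if token)


# Synthetic fixtures; real ones would come from TRACE data.
FIXTURES = [
    {"name": "empty", "input": "", "expected": []},
    {"name": "single", "input": "#7", "expected": [7]},
    {"name": "numeric-not-lexical", "input": "#10, #9, #100", "expected": [9, 10, 100]},
    {"name": "duplicates", "input": "#3, #3, #1", "expected": [1, 3, 3]},
]


def run_fixtures(operation, fixtures):
    """Return a list of (name, expected, actual) for every fixture that drifted."""
    failures = []
    for fixture in fixtures:
        actual = operation(fixture["input"])
        if actual != fixture["expected"]:
            failures.append((fixture["name"], fixture["expected"], actual))
    return failures


failures = run_fixtures(sort_issue_numbers, FIXTURES)
# Fail loudly: any drift aborts with the full list of mismatches.
assert not failures, failures
print(f"{len(FIXTURES)} fixtures passed")
```

The `numeric-not-lexical` case is the kind of silent error the skill is worried about: a lexical sort would order the same input as 10, 100, 9.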

Phase 5: Verify

Run the drafted script against every fixture. It must produce the same output every run. If the output depends on wall-clock time, the script must accept a --now flag or SOURCE_DATE_EPOCH env var so tests can pin time.
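Pinning time can look like the sketch below. The `--now` flag and `SOURCE_DATE_EPOCH` come from the text above; the ISO 8601 format for `--now`, the precedence order, and the helper names `resolve_now` and `is_within_last_days` are assumptions made for this example.

```python
import os
from datetime import datetime, timezone


def resolve_now(cli_now=None, environ=None):
    """Pick the reference time: --now value, then SOURCE_DATE_EPOCH, then the wall clock."""
    environ = os.environ if environ is None else environ
    if cli_now is not None:
        parsed = datetime.fromisoformat(cli_now)
        # Treat naive timestamps as UTC so comparisons never mix naive and aware values.
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return datetime.now(timezone.utc)


def is_within_last_days(timestamp, days, now):
    """True if timestamp falls in the window [now - days, now]."""
    age_seconds = (now - timestamp).total_seconds()
    return 0 <= age_seconds <= days * 86400


# With a pinned clock, the same fixture yields the same answer on every run.
pinned = resolve_now("2026-04-10T00:00:00+00:00")
event = datetime(2026, 4, 8, tzinfo=timezone.utc)
print(is_within_last_days(event, 3, pinned))
```

Passing `now` into the comparison explicitly, rather than reading the clock inside it, is what keeps the "within the last N" logic testable.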

If the fixture represents an operation that was previously done by an LLM in a TRACE, compare the script's output to what the LLM actually produced in that TRACE. They should match. If they don't:

The script is wrong → fix it
The LLM was wrong in the TRACE → this is the value proof; note it in the audit report under "LLM was inconsistent: replaced"

Phase 6: Rewrite the SKILL.md

This is the compacting step. For every B-bucket operation now covered by a script, replace the prose with a script invocation.

Before:

### Step 3: Count how many bot attempts the issue has had

Look in `memory/bot-attempts.md` for entries matching the issue
number. Each entry is a heading like `## #N: <title>`. Count entries
under that heading that have a `status: fail` field. If the count
is 3 or more, do not re-delegate — file a comment on the issue
explaining the pattern of failures instead.

After:

### Step 3: Count failed attempts

scripts/count-bot-attempts.sh


Output: `<count>` (integer). If >= 3, do not re-delegate — file a
comment on the issue explaining the failure pattern instead.
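For illustration, the counting logic that moves out of the prose could look like the sketch below. It is written in Python for readability, whereas the example above names a shell script, and it is not the contents of that script. The `## #N: <title>` heading and the `status: fail` field come from the "Before" text; the exact layout of entries under a heading (one `status:` line per attempt, optionally bulleted) is an assumption, as is the sample data.

```python
import re

# Heading shape taken from the "Before" text: "## #N: <title>".
HEADING = re.compile(r"^## #(\d+):")
# Assumed field layout: "status: fail", optionally prefixed by a list bullet.
FAIL_FIELD = re.compile(r"[-*\s]*status:\s*fail")


def count_failed_attempts(markdown, issue_number):
    """Count 'status: fail' fields under the heading for the given issue."""
    count = 0
    in_section = False
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Compare as integers so issue 12 never matches issue 120.
            in_section = int(match.group(1)) == issue_number
        elif in_section and FAIL_FIELD.fullmatch(line.strip()):
            count += 1
    return count


# Hypothetical contents of memory/bot-attempts.md.
sample = """## #12: Fix reconnect loop
- status: fail
- status: fail
- status: pass
## #7: Keyboard overlap
- status: fail
## #120: Unrelated
- status: fail
"""

print(count_failed_attempts(sample, 12))
```

The threshold check (3 or more means do not re-delegate) deliberately stays out of the function: the script returns the count, and the decision remains in the SKILL.md.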

Rules for the rewrite:

Be ruthless. If the LLM doesn't need the explanation to decide what to do next, delete it. Keep only what the LLM uses.
Keep the script invocation on its own fenced block. The LLM needs to spot it visually and copy it into a tool call.
Document the output shape inline. One sentence: what shape does the script return, and what does the LLM do with it?
Preserve the decision the LLM makes from the script output. That IS the probabilistic part that belongs in the skill doc. The computation leaves, the decision based on the computation stays.
Remove section headings that no longer have content. Compaction is literal: the final SKILL.md should be shorter than the input.

Phase 7: Write the report

Append a block to the rewritten SKILL.md (or a sibling crystallize-report.md) listing:

Scripts extracted (name, fixtures used, line count)
Scripts reused (name, file:line in SKILL.md that now calls it)
LLM-inconsistency findings from phase 5, if any
Operations that stayed probabilistic (with one-line why)

This report is the durable artifact. Future passes start from it.

Invariants

Never delete prose that describes a probabilistic decision. The boundary is sacred. If you're unsure whether something is judgment or computation, leave it in the SKILL.md and flag it in the C-bucket.
Never ship an extracted script without a test file. The whole point is determinism — an untested script is just more prose.
Never break the CLI of an existing script you're invoking. The SKILL.md compaction is a consumer; scripts are the contract. If a script needs a new flag, add it backward-compatibly.
Every script must be runnable from the project root with no setup. No special env vars beyond MOBISSH_TMPDIR/MOBISSH_LOGDIR. If the script needs state, it creates it.
Don't crystallize a skill that's already short and obviously judgment-driven. Compaction has a floor; don't invent deterministic operations to extract. decompose is a good example of a skill that should mostly stay probabilistic.

What success looks like

After a crystallize pass on a target skill:

The SKILL.md is shorter than before (often 30–60% reduction).
Every arithmetic, counting, comparison, or parsing operation has moved out into a real script with a real test.
The LLM's job in the skill is now exclusively judgment, prose, and deciding which script to call next based on structured output from the previous script.
Re-running the skill on the same input produces the same tool calls for the deterministic parts — variance in the output is visible only in the probabilistic parts, where it belongs.

Signals you should NOT crystallize a step

If a step has any of these properties, it probably belongs in the C bucket (must stay probabilistic):

It involves reading code and forming a quality opinion.
It involves writing prose intended for a human to read.
The correct output depends on prior conversation context the script wouldn't have access to.
The operation requires natural language understanding of intent.
The step is asking "should we do X?" not "what are the numbers?"
Multiple valid outputs exist and the choice among them requires taste.

If you find yourself writing a 200-line script full of heuristics and special cases to replace a one-paragraph LLM instruction, stop — that's a sign the operation is actually probabilistic and the script is a fragile attempt to simulate judgment. Revert and move the step to the C bucket.

Relationship to other skills

agent-trace: provides the TRACE data that phase 3 mines for fixtures. A mature repo with many TRACEs can crystallize faster because fixtures are plentiful.
write-tests: the test files phase 4 produces are unit tests handcrafted against fixtures, which is not the same as the behavioral tests write-tests produces for app code. Don't confuse the two.
simplify: simplify reviews CHANGED code for cleanup; crystallize reviews SKILL DOCS for computational content that should move into scripts. Different targets, non-overlapping scope.
decompose: decomposition is fundamentally judgment-driven — crystallize will leave it alone except for one small scripted piece (counting touched files, which is already scripted).

Outcome goal

Push the LLM toward a workflow where every single tool call is either (a) a deterministic script that takes structured input and produces structured output, or (b) a prose response that makes a judgment based on the structured output of prior scripts. The LLM never does math. The LLM never counts. The LLM never compares timestamps. The LLM never parses a format it could invoke a parser for. Those are script jobs, and crystallize is the pass that finds them and makes them so.