Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

$pwd:

self-healing

Name: Self Healing
Author: pskoett

// Active runtime recovery for coding agents: when something breaks mid-task, diagnose the root cause, write a fix, VERIFY by re-running the broken thing, then file a `HEAL-` entry to `.learnings/HEALS.md` with proof and any generated artifacts. Use whenever a command/test/build/lint exits non-zero or breaks, the agent hits a `ModuleNotFoundError` / `command not found` / `EACCES` / `permission denied` / port conflict / dependency or lockfile mismatch / wrong runtime version / venv not activated / stale `node_modules` / dirty git state / missing `.env`, the agent needs a helper / glue script / bulk-rename / one-off tool that doesn't exist yet, an external API/tool/MCP returns an unexpected error or hits a rate limit, a snapshot test flakes or any test fails intermittently, or the agent catches itself about to retry the same broken approach. Search `HEALS.md` by `Pattern-Key` first — most heals are recurrences (increment `Recurrence-Count`; don't duplicate). Verify is mandatory and unforgeable: mark `pending-verif

Ejecutar en Manus

$ git log --oneline --stat

stars:193

forks:38

updated:26 de mayo de 2026, 06:29

Explorador de archivos

19 archivos

SKILL.md

readonly

related-skills.json

mismo repositorio

self-healing-ci.md

from "pskoett/pskoett-ai-skills"

CI-only self-healing workflow using gh-aw (GitHub Agentic Workflows) for active runtime recovery on pull requests and scheduled runs. When a CI check fails (test, build, lint, deploy, scan), this skill diagnoses the failure from CI logs, proposes a verified patch as a PR comment or follow-up commit, and commits a HEAL entry to `.learnings/HEALS.md`. Verify-before-persist discipline preserved: a HEAL is only `verified` if a re-run check passes in the same workflow; otherwise it ships as `pending-verify` for human follow-up. Recurrent heal patterns across PRs accumulate `Recurrence-Count` and append a `Handoff` block at ≥3 to flag promotion via self-improvement-ci. Use this skill when: you want headless heal-loop execution in CI/scheduled pipelines, you want recurring failure patterns captured automatically, or you want PRs that surface non-obvious environmental / tooling fixes without human triage. For interactive/local sessions, use `self-healing` instead.

2026-05-26193

self-healing.md

from "pskoett/pskoett-ai-skills"

2026-05-26193

learning-aggregator.md

from "pskoett/pskoett-ai-skills"

[Beta] Cross-session analysis of accumulated .learnings/ files. Reads all entries, groups by pattern_key, computes recurrence across sessions, and outputs ranked promotion candidates. This is the outer loop's inspect step — it turns raw learning data into actionable gap reports. Use on a regular cadence (weekly, before major tasks, or at session start for critical projects). Can be invoked manually or scheduled.

2026-05-26193

pre-flight-check.md

from "pskoett/pskoett-ai-skills"

[Beta] Session-start scan that surfaces relevant learnings, recent errors, and eval status before work begins. Bridges the outer loop back into the inner loop by making accumulated knowledge visible at task start. Activated via SessionStart hook or manually before major tasks.

2026-05-26193

self-improvement.md

from "pskoett/pskoett-ai-skills"

Captures learnings, errors, corrections, and feature requests to enable continuous improvement. Use when: (1) User corrects Claude ('No, that's wrong...', 'Actually...'), (2) User requests a capability that doesn't exist, (3) Claude realizes its knowledge is outdated or incorrect, (4) A better approach is discovered for a recurring task, (5) Receiving a Handoff block from self-healing (a recurring verified heal at Recurrence-Count >= 3) to distill into a memory file or new skill. For ACTIVE runtime failures where the agent needs to apply and verify a fix mid-task, use `self-healing` instead (it files HEAL- entries with proof; self-improvement promotes accumulated patterns). Also review learnings before major tasks. For CI-only/headless learning capture, use self-improvement-ci.

2026-05-26193

skill-pipeline.md

from "pskoett/pskoett-ai-skills"

Pipeline orchestrator that classifies incoming coding tasks and routes them through the correct combination of skills at the right depth. Implements two feedback loops: the inner loop (detect, verify, recover) runs within a session via plan-interview, intent-framed-agent, context-surfing, verify-gate, self-healing (active recovery on failure), simplify-and-harden, and self-improvement. The outer loop (inspect, encode, regress-test) runs across sessions via learning-aggregator, harness-updater, and eval-creator. pre-flight-check bridges the two by surfacing accumulated knowledge — past heals and learnings — at session start. Handles standard, team-based, CI, and outer-loop pipeline variants. Does not replace individual skills; dispatches to them.

2026-05-26193

package.json

"author": "pskoett"

"repository": "pskoett/pskoett-ai-skills"

Abrir repositorio de GitHub Ver repositorios del creador

$ install --global

$ download --local

Ejecutar en Manus

$ useful --forSOC

Desarrolladores de softwareOcupaciones informáticas y matemáticas15-1252L4

name

self-healing

description

Active runtime recovery for coding agents: when something breaks mid-task, diagnose the root cause, write a fix, VERIFY by re-running the broken thing, then file a `HEAL-` entry to `.learnings/HEALS.md` with proof and any generated artifacts. Use whenever a command/test/build/lint exits non-zero or breaks, the agent hits a `ModuleNotFoundError` / `command not found` / `EACCES` / `permission denied` / port conflict / dependency or lockfile mismatch / wrong runtime version / venv not activated / stale `node_modules` / dirty git state / missing `.env`, the agent needs a helper / glue script / bulk-rename / one-off tool that doesn't exist yet, an external API/tool/MCP returns an unexpected error or hits a rate limit, a snapshot test flakes or any test fails intermittently, or the agent catches itself about to retry the same broken approach. Search `HEALS.md` by `Pattern-Key` first — most heals are recurrences (increment `Recurrence-Count`; don't duplicate). Verify is mandatory and unforgeable: mark `pending-verify` honestly if sandboxed/offline, `abandoned` if the fix can't be made to work. Pairs with `self-improvement` (which promotes recurring heals at `Recurrence-Count >= 3` to durable memory like CLAUDE.md / AGENTS.md / new skills) but owns the verify-before-persist discipline that self-improvement doesn't. Trigger eagerly on any failure shape above — an agent that fixes things ad-hoc without filing a HEAL is leaving silent recurrences in the codebase, which is exactly the failure mode this skill exists to prevent.

Self-Healing

Active runtime recovery for coding agents. When something breaks, run the loop: diagnose → patch → verify → file. Leave behind a reusable, verified artifact instead of a swept-under-the-rug failure.

The premise mirrors browser-use/browser-harness: the harness improves itself every run. An agent that hits a gap doesn't fail — it writes the fix during execution, verifies it works, and files the durable artifact for future runs. Coding tasks deserve the same loop.

What this skill is for

When a coding agent hits a wall mid-task, the default failure modes are:

Paper over it — "let me try a different approach" — and lose the recovery
Pretend the fix worked — without re-running the broken thing
Symptom-fix — skip the test, swallow the error, retry until green

All three turn a one-time failure into a recurrence. The next agent on the same project hits the same wall.

This skill enforces one discipline: verify before persist. A patch isn't real until you've re-run the failing operation and watched it succeed. When it does, file the verified fix so the next run benefits.

Relationship to self-improvement

These two skills are deliberately split. Run both — they feed each other but don't overlap.

Aspect	`self-healing` (this skill)	`self-improvement`
When	During execution, failure is live	After the fact, at natural breakpoints
Verb	Heal now — restore working state	Remember for later — accumulate knowledge
Outcome	Verified patch + (optional) reusable artifact	Logged learning, correction, request
Verify	Mandatory — no persist without proof	Not required
Files	`.learnings/HEALS.md` + `.learnings/heals/<HEAL-ID>/` (lazy)	`.learnings/ERRORS.md`, `LEARNINGS.md`, `FEATURE_REQUESTS.md`
Trigger	Failure observed mid-task	Correction, knowledge gap, feature request, recurrence

Boundary rule: if you're capturing a fact, a correction, or a wish — that's self-improvement. If you're applying and verifying a fix to a live failure — that's self-healing.

The Heal Loop

  ● failure observed
  │
  ● 1. DIAGNOSE  capture context — command, error, env, what was attempted
  │              search HEALS.md for the same Pattern-Key first
  │              (most heals are recurrences; don't reinvent)
  │
  ● 2. PATCH     write the fix — script, helper, env tweak, alt command
  │              artifacts → .learnings/heals/<HEAL-ID>/  (only if needed)
  │
  ● 3. VERIFY    re-run the failing op — must succeed
  │              ↻ if still failing: refine and retry, cap at 3 attempts
  │              ✗ if uncrackable: file Status: abandoned with notes
  │
  ● 4. FILE      write HEAL-YYYYMMDD-XXX to .learnings/HEALS.md
  │              with Pattern-Key, status, verification proof
  │
  ✓ working state restored, heal persisted

  (conditional) PROMOTE  if Pattern-Key recurrence ≥ 3 across distinct tasks,
                          append a Handoff block → self-improvement promotes to memory

If you abandon a heal mid-loop, don't pretend it succeeded. File a HEAL- entry with Status: abandoned and notes on what didn't work. The next agent learns from the dead end too.

When to trigger

Self-healing fires on active failures during execution — the agent has just observed something not working and needs to make it work to continue. Five shapes:

1. Tool failure (command / test / build / lint)

Any invocation exits non-zero or produces wrong output. Don't acknowledge and retry verbatim — diagnose, patch, verify.

Examples: npm install errors when a pnpm-lock.yaml is present (switch tool); pytest fails with ModuleNotFoundError (activate the venv); tsc flags a stale type (regenerate the client); eslint reports a config error (install the missing parser).

2. Missing capability / tool gap

The agent needs something that doesn't exist yet — a script, a helper, a wrapper, a glue function. Write it in the moment. This is the closest analog to browser-harness's agent_helpers.py.

Examples: dedupe a CSV by custom key (write a small Python helper); bootstrap 12 microservices the same way (write scripts/bootstrap-all.sh); bulk-rename branches matching a pattern (write a gh-based shell helper).

3. Environment issue

The local environment isn't what the project expects. Detect, patch, verify.

Examples: runtime version mismatch (nvm use, pyenv local, rustup override); stale dependency cache after a branch switch; dirty git state blocking a checkout; missing .env (copy from .env.example and surface gaps).

4. External service / API change

A service the agent depends on returns something unexpected. Find a workaround and capture it.

Examples: an MCP tool returns InputValidationError because the schema changed (patch the call shape); a public API hits a rate limit (back off, switch endpoint, batch); an upstream lib bumped a default and broke a script (pin the version).

5. About-to-retry-the-same-broken-approach

The agent catches itself about to redo the failing step. That self-recognition is a heal forming — capture the alternate approach as the patch.

Detection signals to watch for

Non-zero exit codes
Stack traces in tool output
The same operation failing twice with the same error
"I'll try a different approach" — capture it as a heal
command not found / module not found / permission denied
Stale assertions, snapshot mismatches, type errors that weren't there before
"Weird" output that suggests environmental rather than logical bugs

HEAL Entry Format

Append to .learnings/HEALS.md (create if missing):

## [HEAL-YYYYMMDD-XXX] short_kebab_name

**Logged**: ISO-8601 timestamp
**Status**: verified | pending-verify | abandoned
**Trigger**: tool-failure | missing-capability | env-issue | external-change | <free-form>
**Active-Context**: (optional) — current skill, task phase, or workflow stage; omit if not applicable
**Area**: free-form tag — what part of the system (`build`, `tests`, `ci`, `auth`, `data-pipeline`, `mobile`, ...)
**Priority**: low | medium | high | critical

### Failure
What broke — concrete: the command, the error message, the action that was blocked. Include exit codes and verbatim error lines.

### Diagnosis
The root cause as understood after investigation. Why the obvious approach didn't work. Not a guess — what was actually verified during the heal.

### Fix
The patch that was applied. Verbatim commands, code snippets, or pointers to files under `.learnings/heals/<HEAL-ID>/`. Keep it minimal — just enough to reproduce.

### Verification
What was run after the fix and what it returned. Exit code, output snippet, test pass count. **This is the proof.** Without it, the entry is `pending-verify` or `abandoned`.

### Artifacts
(omit this section if no files were generated; otherwise list relative paths under `.learnings/heals/<HEAL-ID>/`)

### Metadata
- Related Files: path/to/file.ext
- See Also: HEAL-... | LRN-... | ERR-... (related entries)
- Pattern-Key: lower.snake.case key for recurrence detection (e.g. `env.lockfile_mismatch`)
- Recurrence-Count: 1
- First-Seen / Last-Seen: YYYY-MM-DD

---

Field guidance

Status — verified = the verify step passed. pending-verify = patch applied but couldn't be fully proven (sandboxed/offline/CI-only) — surface to the user. abandoned = patch didn't work or diagnosis was wrong — document what was tried.
Trigger — free-form is fine. The listed values are common shapes; what matters is that the failure shape is described enough for future agents to match against.
Active-Context — optional. Use it if your environment has a meaningful "what was I doing" tag (an active skill, a current task phase, a build stage, an agent role). Skip if not applicable. The browser-harness analog is the per-domain scoping of domain-skills/<site>/.
Area — free-form. Pick whatever helps future agents find this. frontend, data-pipeline, ci, auth, terraform, mobile, embedded — anything that fits your project shape.
Pattern-Key — lower.snake.case, stable, reusable across projects. Two heals with the same key are recurrences. env.lockfile_mismatch is good; fixed_thing_tuesday isn't.

ID generation

Format: HEAL-YYYYMMDD-XXX. XXX is sequential 3-digit or 3-char random alphanumeric. Examples: HEAL-20260524-001, HEAL-20260524-A7B.

Artifacts directory (lazy)

Only create .learnings/heals/<HEAL-ID>/ when the heal generated something worth preserving. One-line fixes don't need a folder; the HEAL entry text is enough. Abandoned heals with no applied patch also skip the folder.

.learnings/
├── HEALS.md
├── ERRORS.md / LEARNINGS.md / FEATURE_REQUESTS.md  (self-improvement)
└── heals/
    └── HEAL-20260524-001/
        ├── helper.sh
        ├── patch.diff
        └── notes.md

Put here: generated scripts/helpers, patch files, supplementary notes, output captures that document the diagnosis. Don't put here: project source changes (those go in the project tree, referenced via Related Files); secrets; output already captured in the HEAL text.

Verification rules

Verify is the load-bearing wall. The whole point of self-healing over self-improvement is that the fix is proven, not theorized.

What counts as proof

Failure shape	Verification
Tool / command / test / build / lint	Re-run the original invocation; expect exit 0 / pass
Missing capability	Invoke the helper end-to-end on a real input; expect the intent
Environment drift	Re-run the operation that triggered the diagnosis
External service workaround	Re-run the failed call with the patch; expect a usable response

Sandboxed / offline / CI-only failures

When you genuinely can't run the verify step (no network, no real remote, sandboxed shell, CI-only reproduction), file Status: pending-verify with:

The exact command the user / CI should run
The acceptance criteria — what counts as proof
A simulated proof if you can construct one (e.g. a dry-run mode, a stub of the failing call, a sandbox script)

pending-verify is honest. Faking verified is the failure mode this skill exists to prevent.

When to invest in a proof script

Most heals don't need a separate proof script — the verify step is just re-running the failing thing. Build a proper proof script when:

The heal generates a reusable helper that needs to be exercised across cases
The failure can't be reproduced live but can be reproduced in a sandbox (clean git repo, mock service, fake input)
You expect the heal to be re-applied across projects — the proof script then doubles as a regression check

If verification fails

Once — refine the patch and retry. First diagnosis is often wrong.
Twice — step back and reconsider the diagnosis. Maybe the root cause is elsewhere.
Three times — stop. File Status: abandoned with notes on what you tried. Surface to the user. Don't flail.

What does NOT count as verification

"It looks right" / "I think this should work"
Re-running a different command than the one that originally failed
Suppressing the failure (|| true, --ignore-errors) — that's hiding
Skipping or deleting the failing test — that's regression
Passing because the cache was warm from before the fix

Reversibility

Prefer reversible patches. If your heal modifies project files, capture the diff in patch.diff. If the heal is destructive (deletes generated files, rewrites locks), note it explicitly — a future agent reading the HEAL needs to know what was destroyed.

Recurrence and promotion

Most heals are recurrences. Before filing a new HEAL, search:

grep -n "Pattern-Key: <your-pattern-key>" .learnings/HEALS.md

If found:

Increment Recurrence-Count
Update Last-Seen
Add the current occurrence as a See Also link
Do not create a duplicate entry

Promotion threshold

Add a Handoff block to an existing entry when all are true:

Recurrence-Count >= 3
Seen across at least 2 distinct tasks
The fix is generalizable (not project-specific in a way that's already in a memory file)

### Handoff
- **Promoted To**: self-improvement at YYYY-MM-DD
- **Promotion Target**: CLAUDE.md | AGENTS.md | .github/copilot-instructions.md | new-skill
- **Distilled Rule**: One-line prevention guidance derived from the heal

Then self-improvement (or a learning aggregator) takes over: distills the rule, writes it into the right context file, or extracts a reusable skill. The HEAL stays for traceability.

Anti-patterns

Logging without verifying. A HEAL filed before the fix is proven turns this into noisier self-improvement. If verify hasn't passed, the entry is pending-verify or abandoned.
Healing the symptom, not the cause. A failing test isn't healed by skipping it (pytest.skip, it.skip, xit). A flaky CI isn't healed by --retry. Find the root cause; if you can't, abandon honestly.
Generating a new fix without trying existing ones first. Search HEALS.md by Pattern-Key. Most heals are recurrences.
Inventing helpers when the project already has them. Look in scripts/, Makefile, justfile, package.json, pyproject.toml first. Heal = write what's missing, not what's there.
Scope creep. A heal is scoped to one failure. Cleanup belongs in a quality pass; refactors are features. Scope creep makes heals unreviewable.
Empty artifact folders. Don't create .learnings/heals/<HEAL-ID>/ if nothing goes in it.

Best practices

Heal eagerly, file always. Even abandoned heals teach the next agent what doesn't work.
Verify before persist. The non-negotiable rule.
Minimal and reversible patches. A 3-line fix is a heal; a 300-line refactor is a feature.
Stable Pattern-Keys. env.node_version_mismatch is reusable; fixed_the_thing_on_tuesday isn't.
Reference, don't duplicate. Cross-link related HEAL/LRN/ERR via See Also.
Hand off recurrences. A heal seen 3 times deserves to be in the project's permanent memory.
Don't gate the main tree on heal artifacts. Files under .learnings/heals/ are reference material; if a script becomes load-bearing, promote it to scripts/.

Setup

mkdir -p .learnings        # heals/ is lazy — created only when artifacts exist
touch .learnings/HEALS.md

Gitignore choices match self-improvement. Keep heals local (.learnings/ in .gitignore) or share them as team knowledge (don't gitignore — they become reviewable durable context).

Hook integration

Automatic triggering on command failures is optional and agent-specific. See references/hooks.md for Claude Code / Codex configuration.

Multi-agent use

The skill is agent-agnostic. The .learnings/HEALS.md format is plain markdown — any agent (Claude Code, Codex CLI, Copilot, Cursor, Aider, ...) can read and write it. Agents without hook support can be reminded via their instruction file (e.g. .github/copilot-instructions.md). See references/hooks.md for examples.

Pipeline integration

How self-healing slots into a larger skill pipeline (with upstream surfacing of past heals, downstream promotion of recurrences, and machine-verification gates) is documented in references/pipeline-integration.md. Not required to use this skill — it stands alone.

self-healing

Más de este repositorio

Más de este repositorio

Self-Healing

What this skill is for

Relationship to self-improvement

The Heal Loop

When to trigger

1. Tool failure (command / test / build / lint)

2. Missing capability / tool gap

3. Environment issue

4. External service / API change

5. About-to-retry-the-same-broken-approach

Detection signals to watch for

HEAL Entry Format

Field guidance

ID generation

Artifacts directory (lazy)

Verification rules

What counts as proof

Sandboxed / offline / CI-only failures

When to invest in a proof script

If verification fails

What does NOT count as verification

Reversibility

Recurrence and promotion

Promotion threshold

Anti-patterns

Best practices

Setup

Hook integration

Multi-agent use

Pipeline integration

See also

Self-Healing

What this skill is for

Relationship to self-improvement

The Heal Loop

When to trigger

1. Tool failure (command / test / build / lint)

2. Missing capability / tool gap

3. Environment issue

4. External service / API change

5. About-to-retry-the-same-broken-approach

Detection signals to watch for

HEAL Entry Format

Field guidance

ID generation

Artifacts directory (lazy)

Verification rules

What counts as proof

Sandboxed / offline / CI-only failures

When to invest in a proof script

If verification fails

What does NOT count as verification

Reversibility

Recurrence and promotion

Promotion threshold

Anti-patterns

Best practices

Setup

Hook integration

Multi-agent use

Pipeline integration

See also