harness-audit

Score a project's agent harness across 5 subsystems (Instructions / State / Verification / Scope / Lifecycle), identify the bottleneck, and produce a prioritized improvement plan. Use when assessing if a project is ready to graduate to [LONG-RUN] status, when an agent keeps failing despite good models, or when adopting our stack on a new codebase.

Install command

npx skills add https://github.com/AnastasiyaW/claude-code-config --skill harness-audit

Copy and paste this command into Claude Code to install the skill

Source: AnastasiyaW/claude-code-config

Stars: 124 · Forks: 19 · Updated: May 12, 2026 at 08:45

More from this repository

feature-new

AnastasiyaW/claude-code-config

Scaffold a new feature narrative document in an existing layer following the ULTRAPACK-style template extended for feature-layer architecture (principle 28). Creates docs/layers/<layer>/features/feat-NNN-<slug>.md with Design / Plan / Verify / Conclusion sections, populates layer README features table, and adds entry to feature_list.json if present. Use when: "create a new feature", "start work on feature", "scaffold feature doc", "/feature-new", "new feature in <layer>", "begin feature narrative". Auto-allocates next F-NNN ID.

2026-05-13 · 124 stars

layer-new

AnastasiyaW/claude-code-config

Scaffold a new layer in a project's docs/layers/ tree following the feature-layer architecture (principle 28). A layer is a bounded concern (security, data, ui, infrastructure, domain) with its own invariants, decisions, gotchas, patterns, and feature narratives. Use when: "create a new layer", "add security layer", "scaffold layer", "start tracking <concern> separately", "/layer-new", "add bounded concern". Operates on the kb-skeleton structure; idempotent -- will not overwrite existing layers.

2026-05-13 · 124 stars

pixel-art-storyboard

AnastasiyaW/claude-code-config

Convert short scene descriptions, book/album cover briefs, or 2-paragraph synopses into seamless-loop animated pixel-art covers rendered as self-contained HTML+canvas. Companion skill to pixel-art-studio. Use when the user asks to "make a cover for", "animated book cover", "looped pixel scene", "convert this story description to pixel art", "create ambient pixel animation", "обложка для книги в пиксель-арте", "анимированная обложка", "封面像素画", "픽셀 아트 표지", or provides a short narrative/synopsis and wants a visual result. Covers: 5-element scene framework (Subject + Setting + Lighting + Palette + Motion), iconographic shorthand for symbolic accents, seamless loop techniques (phase-based parametric, sub-pixel breathing, LCM-clean parallax, deterministic particle systems), loop period selection by mood, three prompt registers (LLM agent / human artist / SDXL LoRA), single-HTML-file deliverable in dark-atmospheric style with parametrized canvas rendering. Generates the same engine pattern as the user's `Grass Fiel

2026-05-10 · 124 stars

pixel-art-studio

AnastasiyaW/claude-code-config

Create production-quality pixel art and animations programmatically. Use when the user asks to "create pixel art", "draw a sprite", "make pixel animation", "generate sprite sheet", "convert image to pixel art", "pixelate this image", "make a pixel character", "пиксель арт", "пиксельная графика", "спрайт", "像素画", "像素艺术", "도트 그래픽", "픽셀 아트", "8-bit/16-bit/hi-bit style", "retro game art", "Aseprite-like output", "indie game sprite". Covers single-frame sprites, frame-by-frame animations, walk cycles, idle/attack/death animations, sprite sheets, GIF/APNG export, image-to-pixel-art preprocessing (downsample + quantize + dither), 30+ bundled palettes (NES, GameBoy, PICO-8, Endesga 32/64, DawnBringer 16/32, Sweetie 16, Resurrect 64, Korean 오방색/단청, Chinese 故宫/青花/五行, Russian Stoneshard-inspired), 5 dithering algorithms (Bayer 2/4/8, Floyd-Steinberg, Atkinson, Ordered, Blue Noise), automated quality scoring (orphan pixels, doublies, banding, pillow-shading, AI-slop detection), and Generator-Evaluator review via the pixe

2026-05-10 · 124 stars

desktop-sessions-discovery

AnastasiyaW/claude-code-config

Discover, search, and selectively restore Claude desktop app sessions hidden across multiple accountIds. Use when user mentions "missing sessions after account switch", "lost desktop sessions", "where do my old sessions live", or runs multiple Claude accounts on the same machine.

2026-04-29 · 124 stars

article-structure-review

AnastasiyaW/claude-code-config

Structural self-review of a technical article before publication. Covers three gaps that narrowly targeted skills such as humanize/infostyle do not catch: thesis/proof balance, genre purity, and a mandatory limitations section. Apply AFTER the first draft is written, BEFORE humanize + infostyle. Based on feedback from real readers on published articles: the classic "many theses / little proof" pattern and the lack of an honest section on what remains unsolved. Use AFTER first draft is done, BEFORE word-level audits.

2026-04-23 · 124 stars


Useful for (SOC): Software Developers · Computer and Mathematical Occupations · 15-1252 · L4

name	harness-audit
description	Score a project's agent harness across 5 subsystems (Instructions / State / Verification / Scope / Lifecycle), identify the bottleneck, and produce a prioritized improvement plan. Use when assessing if a project is ready to graduate to [LONG-RUN] status, when an agent keeps failing despite good models, or when adopting our stack on a new codebase.
when_to_use	Trigger on phrases like: "audit my harness", "evaluate my agent setup", "score my CLAUDE.md", "is my project ready for long-run", "5-subsystem assessment", "what's missing from my project setup", "/harness-audit". Run proactively when joining an unfamiliar codebase that has agent artifacts (CLAUDE.md, .claude/, AGENTS.md) but obvious gaps. Skip for single-file scripts and pure exploration.
license	MIT

Harness Audit

Score a project's agent harness across five subsystems and tell the user which one to fix first.

Source: the five-subsystem framework is adapted from Learn Harness Engineering (walkinglabs, MIT) and mapped onto our concrete stack: CLAUDE.md, .claude/rules/, PROBLEMS.md, feature_list.json, init.sh, hooks, handoffs, chronicles.

What This Skill Does

Given a project directory, the skill produces a scorecard like this:

=== Harness Audit: project-xyz ===

Instructions  4/5  ✓ CLAUDE.md present, modular rules in .claude/rules/
                   ✗ No project-level REVIEW.md for PR review guidance
State         2/5  ✓ .claude/handoffs/ exists (3 files)
                   ✗ No PROBLEMS.md - issues scattered in handoffs
                   ✗ No feature_list.json - scope state not machine-readable
Verification  3/5  ✓ Tests run, pytest configured
                   ✗ No init.sh - new sessions take 15+ min to bootstrap
                   ✗ 3-layer gate not documented in CLAUDE.md
Scope         3/5  ✓ no-pre-existing-evasion principle in CLAUDE.md
                   ✗ No WIP=1 (no feature_list.json to enforce it)
                   ✗ Definition of Done not explicit
Lifecycle     2/5  ✗ No SessionStart hook (no .claude/settings.json)
                   ✗ No Stop hook for clean-state check
                   ~ Manual cleanup convention exists but not enforced

Bottleneck: State (2/5) — lack of structured progress tracking

Top 3 improvements (in order):
1. Create PROBLEMS.md (1h)   ↗ State 2→4
   Template: claude-code-skills/templates/long-run-project/ has examples
2. Create feature_list.json + init.sh (30min)   ↗ State 2→5, Verification 3→4
   Drop-in: claude-code-skills/templates/long-run-project/
3. Add Stop hook stop-test-gate.py (15min)   ↗ Lifecycle 2→4
   Source: claude-code-skills/hooks/stop-test-gate.py

After top 3: Instructions 4 + State 5 + Verification 4 + Scope 3 + Lifecycle 4 = 20/25 (was 14/25)

The skill does not make changes. It produces the scorecard. The user decides whether to apply recommendations.

The Five Subsystems (Our Adaptation)

Subsystem	Concrete files/conventions in our stack
Instructions	`CLAUDE.md` (root + `~/.claude/`), `.claude/rules/*.md` (project), `~/.claude/rules/*.md` (global), optional `REVIEW.md`
State	`PROBLEMS.md`, `feature_list.json`, `.claude/handoffs/`, `.claude/chronicles/`
Verification	`init.sh`, tests configured, 3-Layer Validation Gate referenced in CLAUDE.md, Proof Loop usage
Scope	`no-pre-existing-evasion.md` rule applied, WIP=1 enforced (one `in-progress` in feature_list.json), explicit Definition of Done
Lifecycle	SessionStart hooks, Stop hooks (stop-test-gate, check-problems-md), cleanup convention

See references/checklist-per-subsystem.md for per-subsystem concrete checks. See references/scoring-rubric.md for how to interpret 1-5 scores.
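The Scope row's WIP=1 check is one of the few that can be made fully mechanical. The sketch below assumes, hypothetically, that feature_list.json is either a JSON array of feature objects or an object with a `features` array, and that each feature carries a `status` string where `in-progress` marks active work; the schema actually used by templates/long-run-project/ may differ. The helper names `features_from`, `wip_ok`, and `check_wip` are illustrative only.

```python
import json
from pathlib import Path
from typing import Optional


def features_from(data) -> list:
    """Return the feature list from parsed feature_list.json (assumed schema)."""
    if isinstance(data, dict):
        return data.get("features", [])
    return data if isinstance(data, list) else []


def wip_ok(data) -> bool:
    """WIP=1: at most one feature may be marked in-progress at a time."""
    in_progress = [
        f for f in features_from(data)
        if isinstance(f, dict) and f.get("status") == "in-progress"
    ]
    return len(in_progress) <= 1


def check_wip(root: Path) -> Optional[bool]:
    """Return None when feature_list.json is missing, so WIP=1 cannot be enforced."""
    path = root / "feature_list.json"
    if not path.is_file():
        return None
    return wip_ok(json.loads(path.read_text(encoding="utf-8")))
```

Returning None rather than False keeps "no file to check" distinct from "file violates WIP=1", which matches how the example scorecard reports "No WIP=1 (no feature_list.json to enforce it)".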

How to Run an Audit

Phase 1 — Gather

Read these files in order (skip silently if missing):

CLAUDE.md in project root
AGENTS.md in project root (some projects use this name)
.claude/rules/*.md (project-level rules)
.claude/settings.json and .claude/settings.local.json (hooks config)
PROBLEMS.md in root
feature_list.json in root
init.sh in root (and Makefile / package.json scripts as fallback)
.claude/handoffs/ (count files, check INDEX.md existence)
.claude/chronicles/ (count files)
Sample test config: pytest.ini / package.json test script / Cargo.toml

Use Glob + Read. Don't grep across the entire codebase: this is a metadata audit, not a code review.
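The Gather phase only records what exists. As a rough stand-in for the agent's Glob + Read calls, the Python sketch below collects the same facts with `pathlib`; missing files become False or 0, which models "skip silently". The function names and the shape of the returned dict are illustrative assumptions, and `package.json` is only checked for presence, not for an actual test script.

```python
from pathlib import Path

# Root-level files from the Phase 1 list; only existence is recorded.
ROOT_FILES = ["CLAUDE.md", "AGENTS.md", "PROBLEMS.md", "feature_list.json", "init.sh"]
SETTINGS_FILES = [".claude/settings.json", ".claude/settings.local.json"]
# Fallbacks for init.sh and sample test configs (presence only, not contents).
FALLBACKS = ["Makefile", "package.json"]
TEST_CONFIGS = ["pytest.ini", "package.json", "Cargo.toml"]


def count_files(directory: Path, exclude: tuple = ()) -> int:
    """Count regular files directly inside directory; 0 if it does not exist."""
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.iterdir() if p.is_file() and p.name not in exclude)


def gather(root: Path) -> dict:
    """Phase 1: collect harness metadata without reading any source code."""
    facts = {name: (root / name).is_file() for name in ROOT_FILES + SETTINGS_FILES}
    rules_dir = root / ".claude" / "rules"
    facts["rules"] = sorted(p.name for p in rules_dir.glob("*.md")) if rules_dir.is_dir() else []
    handoffs = root / ".claude" / "handoffs"
    # INDEX.md is tracked separately, so it is not counted as a handoff.
    facts["handoff_count"] = count_files(handoffs, exclude=("INDEX.md",))
    facts["handoff_index"] = (handoffs / "INDEX.md").is_file()
    facts["chronicle_count"] = count_files(root / ".claude" / "chronicles")
    facts["init_fallbacks"] = [n for n in FALLBACKS if (root / n).is_file()]
    facts["test_configs"] = [n for n in TEST_CONFIGS if (root / n).is_file()]
    return facts
```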

Phase 2 — Score

For each subsystem, run the checks in references/checklist-per-subsystem.md. Each check is a binary pass/fail. Then assign the subsystem a score:

5 = all checks pass + documented + consistently followed
4 = most checks pass, 1-2 gaps
3 = covers basics, missing polish
2 = weak, several checks fail
1 = missing or actively harmful

For each subsystem, list:

✓ what's present and working
✗ what's missing or broken
~ partial / unclear
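The rubric above is qualitative, so any numeric mapping is a judgment call. The sketch below is one hypothetical way to turn pass/fail results into the 1-5 scale and into ✓ / ✗ / ~ lines; the thresholds are assumptions, not taken from references/scoring-rubric.md. A script also cannot confirm "documented + consistently followed" or spot "actively harmful" setups, so a human should still review a 5 or a 1.

```python
from typing import Dict, List, Optional

# Marker per result: pass, fail, partial/unclear.
MARKS = {True: "✓", False: "✗", None: "~"}


def score_subsystem(results: Dict[str, Optional[bool]]) -> int:
    """Map check results to the 1-5 scale (hypothetical thresholds).

    results maps a check description to True (pass), False (fail)
    or None (partial / unclear). Partial results count as gaps.
    """
    passed = sum(1 for r in results.values() if r is True)
    gaps = len(results) - passed
    if passed == 0:
        return 1  # missing entirely
    if gaps == 0:
        return 5  # all checks pass; "consistently followed" still needs a human
    if gaps <= 2 and passed > gaps:
        return 4  # most checks pass, 1-2 gaps
    if passed >= gaps:
        return 3  # basics covered, polish missing
    return 2      # several checks fail


def findings(results: Dict[str, Optional[bool]]) -> List[str]:
    """Render each check as a ✓ / ✗ / ~ line, in the order given."""
    return [f"{MARKS[r]} {desc}" for desc, r in results.items()]
```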

Phase 3 — Identify Bottleneck

The lowest-scoring subsystem is the bottleneck. Fix it first even if another subsystem fails more checks in absolute terms: the lowest score limits how much value the rest of the harness can deliver.

Tie-breaker (several subsystems share the lowest score): pick the one whose improvement unlocks progress in the others. State usually wins ties because feature_list.json and PROBLEMS.md unlock Verification and Scope checks.
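The bottleneck rule and its tie-breaker reduce to a minimum with a priority order. In this sketch only "State first" comes from the text; the order of the remaining subsystems in the hypothetical `TIE_PRIORITY` list is an assumption, and in a real audit the tie-break stays a judgment about which fix unlocks the others.

```python
from typing import Dict

# Hypothetical tie-break order. The skill only says State usually wins ties;
# the order of the remaining subsystems here is an assumption.
TIE_PRIORITY = ["State", "Verification", "Scope", "Instructions", "Lifecycle"]


def bottleneck(scores: Dict[str, int]) -> str:
    """Return the lowest-scoring subsystem, breaking ties by TIE_PRIORITY."""
    lowest = min(scores.values())
    tied = [name for name, score in scores.items() if score == lowest]
    # Raises ValueError for a subsystem name not listed in TIE_PRIORITY.
    return min(tied, key=TIE_PRIORITY.index)
```

With the example scorecard, State and Lifecycle tie at 2/5 and State is chosen, which matches the "Bottleneck: State (2/5)" line.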

Phase 4 — Prioritized Improvement Plan

Output exactly 3 next steps in order, each with:

Effort estimate (15min / 30min / 1h / 1d)
Subsystem(s) it improves and by how much (2→4, etc.)
Pointer to a template or example in claude-code-skills/ if available

The 3 steps must:

Address the bottleneck first
Be independently shippable (no step depends on a later one)
Together raise the total score by at least 4 points (out of 25)

Do not give more than 3. Three is enough scope for one focused session.
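Most of the Phase 4 rules can be checked mechanically before the plan is shown. The sketch below uses a hypothetical `Step` record and verifies the step count, the effort vocabulary, that step 1 touches the bottleneck, and the minimum total gain of 4. "Independently shippable" cannot be derived from this data and stays a manual check.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

EFFORTS = {"15min", "30min", "1h", "1d"}


@dataclass
class Step:
    title: str
    effort: str  # one of EFFORTS
    gains: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # subsystem -> (before, after)


def plan_problems(steps: List[Step], bottleneck: str, scores: Dict[str, int]) -> List[str]:
    """Check the mechanical Phase 4 rules; return a list of violations (empty = OK)."""
    problems = []
    if len(steps) != 3:
        problems.append(f"expected exactly 3 steps, got {len(steps)}")
    for s in steps:
        if s.effort not in EFFORTS:
            problems.append(f"unknown effort estimate {s.effort!r} in {s.title!r}")
    if steps and bottleneck not in steps[0].gains:
        problems.append("step 1 does not address the bottleneck")
    # Gains in the example are stated from the starting score (State 2->4, then 2->5),
    # so take the highest claimed 'after' per subsystem instead of summing deltas.
    final = dict(scores)
    for s in steps:
        for name, (_, after) in s.gains.items():
            final[name] = max(final.get(name, 0), after)
    gain = sum(final.values()) - sum(scores.values())
    if gain < 4:
        problems.append(f"total gain is {gain}, need at least 4")
    return problems
```

Applied to the example plan, the final scores are 4 + 5 + 4 + 3 + 4 = 20 against a starting 14, a gain of 6, so no violations are reported. Summing deltas instead would count State's improvement twice and push it past the 5-point ceiling.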

Output Format

Use the visual scorecard format shown at the top of this skill. Sections:

Header: === Harness Audit: <project-name> === (one line)
Scorecard: 5 lines, one per subsystem, with score + ✓/✗ findings
Bottleneck: one line naming the subsystem and score
Top 3 improvements: numbered list with effort + impact + pointer
Projected total: optional, only if user asked for "after" state

Keep the entire output under 50 lines. The user is scanning for next steps, not reading an essay. Detail goes into the per-subsystem checklist file, not the audit output.
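The output layout and the 50-line budget can be enforced when assembling the text. This sketch reproduces the column alignment of the example scorecard (labels padded to 14 characters, continuation findings indented 19 spaces); the `render` function and its parameters are illustrative assumptions, not part of the skill.

```python
from typing import List, Optional, Tuple

MAX_LINES = 50     # "under 50 lines": 50 or more is rejected
LABEL_WIDTH = 14   # matches "Instructions  4/5" in the example scorecard
INDENT = " " * (LABEL_WIDTH + len("4/5  "))  # aligns continuation findings


def render(project: str,
           rows: List[Tuple[str, int, List[str]]],
           bottleneck_line: str,
           improvements: List[str],
           projected: Optional[str] = None) -> str:
    """Assemble header, subsystem rows, bottleneck, top 3 and optional projection."""
    lines = [f"=== Harness Audit: {project} ===", ""]
    for name, score, found in rows:
        first, *rest = found or [""]
        lines.append(f"{name:<{LABEL_WIDTH}}{score}/5  {first}".rstrip())
        lines.extend(INDENT + item for item in rest)
    lines += ["", bottleneck_line, "", "Top 3 improvements (in order):", *improvements]
    if projected:  # only when the user asked for the "after" state
        lines += ["", projected]
    if len(lines) >= MAX_LINES:
        raise ValueError(f"scorecard is {len(lines)} lines; keep it under {MAX_LINES}")
    return "\n".join(lines)
```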

What This Skill Is NOT

Not a code review — does not look at source code quality
Not a security audit — does not check for vulnerabilities (use /security-review instead)
Not a test runner — does not execute init.sh or tests, just checks existence
Not a fix tool — produces recommendations only, user applies them
Not for short-term projects: if the project has fewer than 5 features or fewer than 5 sessions, the harness overhead is not yet warranted; say so and skip the audit

Honest Tradeoffs

The 5-subsystem framework is opinionated. A project can be perfectly functional with 3 of 5 strong and 2 weak (e.g., a research repo with no lifecycle needs).
Scoring is subjective at the margins. A 3 vs 4 for "covers basics" is a judgment call. Use the checklist to keep it consistent across audits, not to claim numeric precision.
The skill assumes our stack conventions. For projects using completely different tooling (e.g., AGENTS.md without .claude/), translate concepts before scoring — don't fail the project on naming.

Related

Principle 27 (feature-tracking): full framework explanation
Principle 01 (harness-design): Generator-Evaluator pattern, source of "subsystems" thinking
Templates templates/long-run-project/: drop-in files for fixing State + Verification gaps
Rule rules/long-run-harness.md: convention this audit checks against

Quick Self-Audit (for skill development)

This skill is itself a [LONG-RUN]-style artifact. To audit the audit:

Instructions: SKILL.md is this file (✓)
State: Scoring decisions are reproducible from references/scoring-rubric.md (✓)
Verification: 5 example evals in references/example-audits.md (TODO: file not yet added)
Scope: Clear "what this skill is NOT" section (✓)
Lifecycle: No hooks needed — this is a query skill, not a continuous one (N/A)
