Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

$pwd:

live-exercise

Name: Live Exercise
Author: gitkraken

// Use whenever any UI-bearing work touches a running instance — building or fixing a feature, ship-gating, auditing, OR debugging visible bugs (flaky behavior, intermittent rendering, "sometimes does X" reports, hover/focus/animation glitches, layout overflow). Keep a running instance in the loop and exercise it as a user would. Adaptive depth from tactical fix-loop to ship-gate audit. Not for pure-logic diff review.

Ejecutar en Manus

$ git log --oneline --stat

stars:9827

forks:1725

updated:11 de mayo de 2026, 02:29

SKILL.md

readonly

related-skills.json

mismo repositorio

worktree.md

from "gitkraken/vscode-gitlens"

Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees following GitLens conventions

2026-05-279.8k

live-inspect.md

from "gitkraken/vscode-gitlens"

Use for one-off / single-question inspection of the running GitLens extension — examining UI state, reading logs, checking feature flags, dispatching a command, or asking "what does the live DOM look like right now". Reference for `vscode-inspector` MCP primitives. For iterative debug-and-fix loops on UI bugs (sweep → fix → re-verify), use `/live-exercise` instead.

2026-05-119.8k

investigate.md

from "gitkraken/vscode-gitlens"

Investigate bugs — single-issue deep root cause analysis or batch parallel investigation from a report or issue list

2026-05-029.8k

live-pair.md

from "gitkraken/vscode-gitlens"

Use when you want to iterate on a feature interactively with the user watching a running instance — pair-programming rhythm for UI-heavy work, redesigns, copy tightening, layout exploration, or any "let me just show you what I want" session. Not for systematic audit (/live-exercise) or perf-tuning (/live-perf).

2026-04-289.8k

live-perf.md

from "gitkraken/vscode-gitlens"

Use when you want to measure and improve the performance of a feature in the running extension, hunt regressions, or perf-tune before shipping. Standalone for perf-tuning an existing feature; also invoked by /live-exercise Phase 7. Not for static code review without measurement.

2026-04-289.8k

a11y-audit.md

from "gitkraken/vscode-gitlens"

Use to audit a component, file, or directory for WCAG 2.1 AA accessibility compliance. Detects ARIA anti-patterns, missing semantics, keyboard gaps, and color-only information. Safety-first — refuses to emit fixes that would create a new accessibility bug. Scope is always explicit; do not use for page-level flow or cross-program planning.

2026-04-249.8k

package.json

"author": "gitkraken"

"repository": "gitkraken/vscode-gitlens"

Abrir repositorio de GitHub Ver repositorios del creador

$ install --global

$ download --local

Ejecutar en Manus

$ useful --forSOC

Analistas de garantía de calidad de software y probadoresOcupaciones informáticas y matemáticas15-1253L4

Ejecuta cualquier Skill con un clic

name

live-exercise

description

Use whenever any UI-bearing work touches a running instance — building or fixing a feature, ship-gating, auditing, OR debugging visible bugs (flaky behavior, intermittent rendering, "sometimes does X" reports, hover/focus/animation glitches, layout overflow). Keep a running instance in the loop and exercise it as a user would. Adaptive depth from tactical fix-loop to ship-gate audit. Not for pure-logic diff review.

/live-exercise — Live functionality, intent, and quality loop

Operate the feature as a user would. Measure live state, not diffs. Apply the four lenses (functional, intent, polish, improvement) every sweep. Fix in parallel, verify after rebuild, re-sweep. At end-of-loop, optionally simplify code drift and optionally sweep performance. Stop only when sweep + simplify + perf all produce nothing new.

This is the normal working rhythm for UI-bearing work — from tactical bug-fix iterations through ship-gate audits. Adaptive depth; no upfront prompt.

When to use vs other skills

Skill	Purpose	Mode
`/review`	Standards + completeness checklist	Static
`/deep-review`	Correctness tracing end-to-end	Static
`/ux-review`	User flows against a goals doc	Static
`/live-inspect`	Reference for `vscode-inspector` MCP tools	Primitive
`/live-exercise`	Live operation + audit + fix loop	Live + iterative

Use /live-exercise any time you touch UI — adding a panel, fixing a mode, refactoring a webview, ship-gating a feature. Not as a one-off audit; as the default way to work on UI.

Prerequisites

vscode-inspector MCP connected (auto-discovered via .mcp.json)
Build currently passes (pnpm run build:quick)

Lenses — applied every sweep

Lens	When applied	Blocking?
L1 — Functional walk-through	Always. Layout, console, pixel invariants + handler verification (exercise the action, verify the intended outcome actually happened).	Yes
L2 — Intent compliance (cold-read)	Always. Intent source order: `goals.md` → user-solicited → inferred cold-read.	Yes
L3 — UX heuristics polish	Always observe. Surface findings. Blocks exit only under ship-gate signal.	Conditional
L4 — Opportunistic improvement hunting	On signal (user says "polish/audit/ship-gate/elevate") OR when L2 surfaces ambiguous-intent / can't-form-intent findings.	When triggered

L1 — Functional walk-through

Question: does every feature actually do what it says it does, end-to-end, as a user would use it?

Enumerate every entry point (command, menu item, button, keybinding, automatic trigger)
Walk every primary path to completion
Exercise every state transition (empty → populated, loading → error → recovered, collapsed → expanded, mode A → mode B)
Verify each handler actually fires and produces the intended outcome — not just that the click registered

Evidence: evaluate_in_webview to confirm the action ran (command executed, state updated, event fired). Not "button clicked" — "button clicked AND the result is X." Plus the structural invariants — computed layout, console clean, pixel geometry.

L2 — Intent compliance via cold-read

Question: does the live behavior match what a user expects?

Intent sources, in priority order:

goals.md — under .work/dev/<id>/ or similar. Authoritative when present.
User-solicited intent — if the user has given 1–2 sentences of intent inline this session. Authoritative when provided.
Inferred cold-read — always applied, even when 1/2 are available. Approach the feature as a brand-new user with no context: read labels, icons, tooltips, position, naming, surrounding UI, convention matches. Form an expectation of what this thing should do.

If it doesn't make sense to the model, it's unlikely to make sense to a user. Cold-read failures are real discoverability findings.

Cold-read discipline

Every L2 finding powered by inference must state the inference chain explicitly, so the user can correct it:

Inferred intent: "Share branch via Patch link" (from label "Share" + link icon + position in branch context menu + convention of adjacent Commit/Shelve actions).

Observed: click opens a raw clipboard prompt with no patch formatting.

Divergence: the inferred action is a curated share flow; the actual action is a generic copy.

Classification — every L2 finding is one of:

Type	Severity
Inferred ≠ observed	🟡 Gap. Feature looks like it should do X, actually does Y.
Inferred is ambiguous	🟡 Gap. Surface doesn't telegraph what it does — a new user would be guessing.
Can't form intent at all	🔴 Critical. The model couldn't infer purpose from labels/icons/context. No user will either. Discoverability failure.

L3 — UX heuristics polish

Question: does it feel right?

Apply these heuristics on the running UI (the same set /ux-review uses, observed live):

Feedback & responsiveness — loading/progress, success confirmation, failure visible where the user is looking, transitions smooth not jarring
Discoverability — right surface, findable names, affordances look interactive, disabled elements explain why
Consistency — patterns, terminology, icons, visual language match the rest of GitLens
Workflow integration — flow preservation, reversibility, focus management, no unnecessary interruption
Information design — hierarchy, density, progressive disclosure, helpful empty states
Accessibility — keyboard parity, ARIA labels/roles/states, focus trapping for modals, theme compliance

Measure, don't eyeball — "feels cramped" is not a finding; ".card padding 4px vs repo convention 8–12px" is.

L4 — Opportunistic improvement hunting

Question: could this be meaningfully better?

Triggered by:

User says "audit," "polish," "ship-gate," "elevate" in the invocation
L2 surfaced ambiguous-intent or can't-form-intent findings (fixing those requires proposing changes, which IS L4)
A goals.md is present under .work/dev/<id>/ (suggests scoped feature worth elevating)
Multiple consecutive /live-exercise invocations on the same feature (user is circling — polish time)

Discipline:

Bias toward coherent upgrades that match existing patterns. Small changes that fit intent → dispatch liberty-fix with decisions.md entry. Larger scope-expanding proposals → open-questions.md.
Not drive-by churn. Every improvement must map to a UX heuristic gap, an ambiguous-intent fix, or the feature's stated intent — not just "shuffle things around."

Adaptive depth — no upfront prompt

The skill never asks "what mode?" at start. Tactical by default (L1 + L2 active, L3 observed, L4 off). The lens table above lists L4 triggers; when any fire, comprehensive mode activates automatically. Phase 6 and 7 gate at end-of-loop (see "Convergence" below).

Trivial changes — skip the loop. A single-line copy edit, color-token swap, icon rename, comment-only change, or any change with no state surface earns build + diff + one-click eyeball verification. No findings doc, no Phase 4 dispatch, no convergence check. The skill exists for state-bearing UI work; don't manufacture ceremony for changes that have nothing to converge on. If you can describe the change in one sentence and it touches no handlers, no state, no layout, and no async paths, you're trivial — verify quick and ship.

The loop

1. Scope & spec

Reach starting state without clicking the SUT. Use launch({ settings }), command palette, direct method calls on the component instance, or feature simulators. UI-driven setup conflates the flow you're testing with the state you need — every accidental toggle costs an iteration, and the same icon often does double duty as both setup and SUT. Save real clicks for what you're actually exercising.
Read goals.md under the user's pointed .work/dev/<id>/ (or wherever). If absent, ask the user:

"No goals.md found. In 1–2 sentences: what's the intended user experience for this feature? What problem does it solve, and what does success look like from the user's POV?"

If the user declines to supply intent, proceed on cold-read inference alone (lens 2 still runs).
Launch one Explore subagent to map the feature's code surface. Don't self-read everything up front — read specific files only when you dig into an issue.
Note [FUTURE]/TODO markers — they're known-open, not new findings.

2. Live sweep (lens application)

launch VS Code, then for every distinct mode/state/context the feature exposes:

Maximize real estate via execute_command to close sidebar/aux bar/bottom panel. Do NOT use resize_window to grow the window for general auditing — it resizes the actual Electron window and is clamped by the host display. Only resize when a lens explicitly requires testing a specific responsive breakpoint, and prefer sizes ≤ the launch window. For larger headless render surfaces, configure launch({ screen_resolution }) at startup instead.
Navigate via execute_command or programmatic clicks. Shadow-DOM traversal is the hard part — use evaluate_in_webview to dispatch synthetic MouseEvents with modifier keys when needed. Wrap multi-line scripts in (() => { ... })() and JSON.stringify non-primitive returns. new Error().stack is often empty in eval context — patch the source method to get useful stack frames.
Apply all active lenses per state:
- L1: exercise interactions, verify handlers fired (evaluate_in_webview → command executed / state updated / event dispatched). Capture layout invariants.
- L2: compare live behavior to goals/intent (or cold-read inference). State inference chains explicitly.
- L3: aria_snapshot, computed styles for consistency, check empty/loading/error states explicitly.
- L4 (when active): actively look for coherent improvements.
- Always: read_console { level: "error" } and read_logs per state — errors can be load-order / state-dependent.
Measure, don't eyeball. "Looks off" is not an issue. "AI input at y=1306 in a 1308px viewport" is. "The button does nothing" → verify via evaluate_in_webview that the handler fired and state updated.

3. Compile findings

Write under .tasks/<feature>-exercise/:

findings.md — status table + severity sections.

# <Feature> — Exercise Findings

## Status

| ID     | Lens | Title                                 | Status   | Notes                        |
| ------ | ---- | ------------------------------------- | -------- | ---------------------------- |
| I1-C1  | L1   | Compose panel overflows viewport      | ✅ fixed | `.compose-panel` set to flex |
| I1-G2  | L2   | Compose view missing drafts flow      | open     | goals §3.2                   |
| I1-P1  | L3   | No loading feedback on push           | open     | >1s silent                   |
| I1-E1  | L4   | Add "retry" on transient fetch errors | open     | see Q3                       |
| I1-PR1 | Perf | +180ms hydration on Home view         | open     | baseline 220ms → 400ms       |

## 🔴 Critical (L1 broken / L2 can't-form-intent)

## 🟡 Gap (L2 divergence/ambiguity)

## 🔵 Polish (L3)

## 🟣 Opportunity (L4)

## Perf (see Phase 7)

### I1-C1 — <Title>

- **Lens**: L1 — functional walk-through
- **File**: `path/to/file.ts:L42`
- **Repro**: one-line user action + measurable observation
- **Root cause**: what's wrong in code
- **Fix**: concrete approach, OR "needs design decision (see Q<n>)"

ID scheme: I<iter>-<sev><n> where sev ∈ {C (L1 critical), G (L2 gap), P (L3 polish), E (L4 enhancement), PR (perf regression), PC (perf convention), PS (perf speculation)}. Iteration number comes first so history stays legible.

open-questions.md — decisions the user must make:

## Q1. <Topic>

Brief context (1–2 sentences).

**Option A** — … | **Option B** — … | **Option C** — …

**Recommendation**: A, because …

decisions.md — liberties taken (only created if any):

## D1. Added loading indicator to push action

- **Finding**: I1-P1
- **Rationale**: GitLens convention is `gl-progress-spinner` on any action >500ms. Empty state had none.
- **Change**: `src/webviews/apps/plus/home/push.ts:L88` — added spinner, matches commit-view pattern.
- **Reversible?**: yes, one-line.

Rules:

Every finding cites a file and a repro someone else can reproduce.
Unambiguous bugs become IDs. Design questions become Q-numbers. Liberties become D-numbers.
If in doubt, use AskUserQuestion before filing as a fix.
node_modules issues get documented under "Third-party (won't fix)", not in the critical/warning pile.

4. Dispatch parallel fix agents

For the unambiguous findings AND liberties you've decided to take, dispatch a swarm of background agents — one per logical concern group. Do not hand-fix sequentially.

Grouping:

Related files that must change together → one agent
Independent concerns → separate agents
Cap at 5–6 concurrent; more and coordination breaks down

Each agent prompt MUST include:

Exact file paths to change
Which lens the finding is from (different lenses need different verification)
Root cause (not just symptom)
Measurable user-flow success criteria — e.g. "clicking Share from empty state shows toast AND dispatches gl:share event" (not just "button has handler attached")
Verification command (usually pnpm run build:quick)
Project conventions they'd otherwise miss (.js import extensions, no barrel files, no drive-by refactors, Title Case, no --no-verify)
Explicit out-of-scope ("do NOT touch rebase-entry.ts")
Source-of-truth when relevant (codicons from @vscode/codicons, shared components from src/webviews/apps/shared/components/)

Run agents with run_in_background: true and keep working. The harness notifies on completion.

Subagents will NOT follow `/live-exercise` on their own

Dispatched subagents may not have this skill in their discovery list, and even if they do they'll often skip it under time pressure. The discipline is the main agent's responsibility — not the subagent's. When a subagent's task involves UI verification, the main agent must spell out the live-verification steps explicitly in the prompt, not just refer to this skill.

Bad subagent prompt (will produce static-only verification):

Verify the compare mode isn't broken. We're shipping soon.

Good subagent prompt:

Verify compare mode by: (1) launch mcp__vscode-inspector__launch; (2) show the graph, click one commit, Ctrl+click another; (3) read_console { level: "error" }; (4) inspect that gl-details-multicommit-panel renders with non-empty commitFrom/commitTo; (5) evaluate handler fired via window.__lastCompareEvent; (6) report console errors and any rendering failures. Do NOT ship based on tsc alone.

5. Verify & loop

When agents complete:

pnpm run build:quick. Fix any breakage before proceeding.
Always git diff after agents finish. Don't trust summaries — agents have "completed successfully" while silently removing working code.
Teardown + relaunch VS Code. Extension-host and webview state is sticky across reloads (Shoelace icon cache, Lit registrations, extension singletons).
Re-measure each fix. Evidence, not screenshots.
Re-sweep all active lenses. Watch for:
- Regressions (agent B undid agent A's fix)
- L1 items now pass but L3 items the fix introduced
- Second-order effects (a handler fix exposed a missing loading state)
Update the findings doc — ✅ the resolved items, add new I2-* entries.
Loop to step 4 if there are new unambiguous fixes.

6 & 7 — End-of-loop gates: Simplify, then Performance

Phases 6 and 7 share the same gating: explicit signal ("done/ship-gate/polish/audit") → run automatically; no signal → prompt once, default N. Once opted in for the session, the phase remains part of the convergence loop — no re-prompts on subsequent passes. Either phase producing changes loops back to Phase 5 (re-sweep all active lenses); simplification can introduce regressions (removed "redundant" state that was actually load-bearing, collapsed templates that rendered differently in different contexts), so the re-sweep is non-negotiable.

6. Simplify — invokes /simplify (3 parallel review agents: reuse, quality, efficiency). After it runs: pnpm run build:quick + git diff + loop back to Phase 5.

7. Performance — invokes /live-perf after Phase 6 converges. /live-perf owns scope, measurement discipline, and the three-tier classification (see /live-perf). Perf findings use P-prefix IDs (PR regression, PC convention, PS speculation) and can live in the same findings.md or delegated out.

Convergence — full exit criterion

Exit only when all three of the following hold, in sequence:

A fresh Phase 5 sweep produces nothing new
A fresh Phase 6 simplify (if opted in) produces nothing to simplify
A fresh Phase 7 perf sweep (if opted in) produces nothing measured / no convention violations

If any phase produces changes, loop back to Phase 5. Don't exit because the current iteration produced a few fixes.

Additional exit conditions:

All non-third-party 🔴/🟡/🔵/🟣/Perf items are ✅ or deferred
User has reviewed open-questions.md and decisions.md (if present)
Remaining items are open questions awaiting user input, or user explicitly accepts outstanding items

Pitfalls

Pitfall	What happens	Mitigation
Eyeballing the flow	"Feels fine" becomes evidence; real handler bugs slip through	`evaluate_in_webview` to verify state/command/event fired, not just clicked
Skipping L2 when no goals.md	Completeness findings silently missed	Cold-read always applies; solicit intent if absent; state inference chains explicitly
Polish dominating bugs	Iteration focuses on spacing while L1 handlers are broken	L1 is gating. Don't move deeper until L1 is clean
Drive-by improvement churn (L4)	Agent proposes changes that don't serve intent	Every improvement must map to a heuristic gap, an L2 ambiguity, or stated intent
Liberty overreach	Agent changes a cross-feature convention (icon, term, pattern) without asking	Cross-feature changes → `open-questions.md` always
Missing `decisions.md`	User has no audit trail of liberties; trust erodes	Every dispatched liberty goes in `decisions.md`
Speculative perf optimization	Agent "optimizes" something that was never slow	Measured-tier requires baseline + post-measurement. Speculation → open-questions only
Perf rewrite when caching fits	Agent rewrites an inner loop when a missing `@memoize` would solve it	Prefer caching / memoization / debouncing over rewrites; convention-tier handles most
Over-eager fix agent	Prompt says "remove empty div" → agent deletes functional buttons inside	Prompt MUST state template location and expected rendering outcome, not only CSS selector
Load-order bug	Side-effect module runs too late; the thing it registers fires after consumers already used the default	Register at every webview entry, or in the most-transitively-imported shared wrapper
Per-realm registration	One webview ≠ all webviews	Plan for N registrations, not 1
Stale instance state	Iteration N sees cached state from N−1	Always teardown + relaunch between iterations
Silent agent regression	Agent "succeeds" but reverted working code	Always `git diff` after agents; don't trust their summaries
Mimic vs reuse	Agent "fixes" icon-fetch by inlining third-party icons instead of using ours	Specify source-of-truth in the prompt
Shadow DOM aria clipping	`aria_snapshot` reports empty buttons that actually have labels	Use `evaluate_in_webview` to read `aria-label` on the inner button directly
Guess-fixing	"Probably the padding — let me bump it"	Measure first. No measurement → no fix
Setup-by-clicking	The icon you click to reach a state is often also the SUT — accidental toggles cost iterations and mask root causes	Drive setup via `launch({ settings })`, command palette, direct method calls, or simulators. Save UI clicks for what you're actually testing
Snapshot at the consumer	"Capture state on the click handler" / "in the bubble listener" fails when state mutates upstream — different task, microtask drained, stopPropagation, Lit re-render between	Instrument inside the function that mutates the state, not in a downstream listener. The producer always runs first by definition
Stacked instrumentation	Re-running a tracing patch wraps the previous wrapper; one call now logs N times, stacks fragment, traces stop making sense	Guard each patch with a marker (`if (a.__patched) return; a.__patched = true`) or teardown+relaunch. Webview reload keeps document listeners — only teardown is a clean reset
Spec-grade mental model	"Microtasks drain at task end" / "capture before bubble" assumed without verification — Playwright dispatches input events as separate tasks, microtasks drain between them	For any timing/ordering-sensitive design, prove the mechanism with a 5-line `evaluate_in_webview` trace before writing the plan

Red flags — pause the loop

You're about to answer a "does this actually work for a user?" question without having exercised a handler end-to-end
You're about to apply L3/L4 before L1 is fully clean
You haven't teardown + relaunched between iterations
You're about to fix something without a measurable repro
An agent's summary says "done" but you haven't diffed the actual changes
You're on iteration 4+ and the same issue keeps resurfacing — stop; a design decision is likely required, ask the user
You feel the urge to hand-fix "just this one" instead of dispatching an agent
You're taking a liberty on a cross-feature convention without asking
No goals.md exists AND you haven't run the cold-read discipline
You're about to dispatch a perf fix without a baseline measurement

Tripwires for "skipping live lens application"

Any of these phrases appearing in your reasoning = you're about to ship without live verification. Stop and launch the instance.

"I have enough confidence"
"Code paths trace cleanly"
"Types check / tsc passed / no TypeScript errors"
"git diff looks fine"
"It should work based on the code"
"No obvious issues in the code"
"L1 passed, skip L2–L4"
"Polish is subjective, move on"
"There's no goals.md so I'll guess at intent"
"This could probably be faster" (dispatch bait)
"The fix should work because [microtask timing / event ordering / shadow DOM behavior]" — that's a hypothesis, not a verification. Trace the actual ordering live before committing to a design
"Ship it" / "LGTM" / "looks good from here"

These are correct outputs AFTER applying lenses live, not substitutes for it.

Rationalizations to resist

Excuse	Reality
"I have enough confidence from reading the code"	Reading code catches logic bugs. It misses layout overflow, mode-switching state bugs, console errors, runtime z-index, every pixel-accurate issue. Launch the instance
"tsc passed, types trace cleanly, `git diff` is small — ship it"	Types and diffs are blind to UI. They're necessary, not sufficient
"L3/L4 are subjective — skip"	Subjective lenses are exactly what this skill exists to apply deliberately. Skipping them defeats the point
"No goals.md, so heuristics alone are fine"	Cold-read still applies and catches discoverability failures heuristics don't. State the inference chain
"We're shipping in 30 min, skip the live check this once"	30 seconds of live inspection catches what 30 min of static analysis cannot
"It's one small issue, no need for an agent"	Single fixes still benefit from the dispatch pattern — you keep doing live inspection while the agent works, and the diff surfaces regressions
"Improvement opportunity obviously right, no need for decisions.md"	Clear wins are exactly what `decisions.md` is for — the audit trail is the whole point of "taking liberties"
"We just did a sweep, skip iteration 2"	Iteration 2 catches regressions iteration 1 couldn't see
"Spec says microtasks drain at task end, so the snapshot works"	Native events dispatched by real input (Playwright) split into separate tasks; microtasks drain between them. Trace the actual ordering before designing around it
"I'll capture state on the click handler"	Click runs in a later task. Anything mutated in pointerdown's microtask is gone by then. Snapshot in the producer (where the mutation is scheduled), not the consumer
"Setup by clicking the icon I want to test is fine"	The same icon often does double duty as setup and SUT — accidental toggles are silent and burn iterations. Drive setup via config/command/method calls, not the SUT
"Teardown takes 10s each time — skip it"	Cached state in the old instance is the single most common source of false-positive/false-negative findings
"The build passed, ship it"	Build passing ≠ fix verified. Measure
"This could probably be faster"	Speculation. No measurement → open-questions, never dispatch
"Missed `@memoize` — I'll file it and move on"	Convention-tier dispatches on audit alone. Fix it; log in `decisions.md`
"Sweep + simplify converged, good enough"	Three-way convergence: sweep + simplify + perf all stable. If perf wasn't run, prompt for it before declaring done

Before declaring "live-exercise complete"

You MUST have:

Launched the extension live (not just pnpm run build:quick)
Exercised each changed mode/state at least once — click, multi-select, toggle, whatever produces the surface
Verified L1 handlers actually fired — not just that buttons registered clicks
Applied L2 cold-read — stated inference chains explicitly for any intent-sourced finding
Read the console for errors — read_console { level: "error" } once per mode
Written decisions.md for every liberty taken
Surfaced open-questions.md to the user
Completed three-way convergence — sweep + simplify (if opted in) + perf (if opted in) all produce nothing new in sequence

Missing any of those = UI readiness is unverified. "Code paths look clean" is not evidence.

Exercising Pro-gated features

Features gated by subscription (Commit Graph beyond local repos, Launchpad, Worktrees, Cloud Patches, Composer, all AI features, Drafts, Workspaces, etc.) won't unlock without a Paid/Trial session. Use the subscription simulator documented in /live-inspect (section: "Exercising Pro-gated features"). Pass dismissOnboarding: true on the start call to pre-dismiss every GitLens onboarding overlay — they intercept clicks during automation. State (subscription + onboarding flags) is restored when you call with state: null.

execute_command { command: "gitlens.plus.simulate.subscription", args: [{ "state": "Paid", "planId": "pro", "dismissOnboarding": true }] }

Other states for boundary-case sweeps: "Community" (paywall UX), "TrialExpired" / "TrialReactivationEligible" (trial-end UX), "VerificationRequired" (email-verify gate), or "Paid" with planId: "advanced" | "teams" | "enterprise" for plan-tier differences. See /live-inspect for the full reference.

Exercising AI features

Real AI provider calls are non-deterministic and can't be asserted against. Use the AI simulator documented in /live-inspect (section: "Exercising AI features") — it's Pro-gated, so the subscription simulator above is a prerequisite.

Loop-specific notes:

Per finding: inject content with a unique assertion sentinel before triggering the feature; assert against it in the rendered surface. Clear between findings ({ op: "clear" }) — leftover injects leak between scenarios.
Negative-path findings (error/cancel/slow/invalid UX): switch the global mode instead of injecting per call.
Phase 4 dispatch: subagents don't auto-load /live-inspect. When dispatching a fix agent that touches an AI surface, embed the inject command, the content to inject, and the assertion target directly in the agent's prompt.

Output artifacts

.tasks/<feature>-exercise/findings.md — severity + status table + detailed entries across lenses + perf
.tasks/<feature>-exercise/open-questions.md — design decisions needing user input
.tasks/<feature>-exercise/decisions.md — liberties taken, with rationale (only if any)
Clean working tree with targeted commits (or staged changes) across many files
Clean live console (only third-party noise)

Related skills

REQUIRED BACKGROUND:

/live-inspect — primitive MCP tool reference, used throughout
/simplify — 3-agent parallel code cleanup; invoked from Phase 6
/live-perf — perf measurement + improvement; invoked from Phase 7

Interactive counterpart:

/live-pair — user-driven pair-programming. Use when iteration is exploratory/creative; delegates back here on structural bugs.

Static counterparts:

/ux-review — static L2/L3 counterpart
/deep-review — static L1 correctness counterpart
/review — standards + completeness diff review

name

live-exercise

description

/live-exercise — Live functionality, intent, and quality loop

This is the normal working rhythm for UI-bearing work — from tactical bug-fix iterations through ship-gate audits. Adaptive depth; no upfront prompt.

When to use vs other skills

Skill	Purpose	Mode
`/review`	Standards + completeness checklist	Static
`/deep-review`	Correctness tracing end-to-end	Static
`/ux-review`	User flows against a goals doc	Static
`/live-inspect`	Reference for `vscode-inspector` MCP tools	Primitive
`/live-exercise`	Live operation + audit + fix loop	Live + iterative

Use /live-exercise any time you touch UI — adding a panel, fixing a mode, refactoring a webview, ship-gating a feature. Not as a one-off audit; as the default way to work on UI.

Prerequisites

vscode-inspector MCP connected (auto-discovered via .mcp.json)
Build currently passes (pnpm run build:quick)

Lenses — applied every sweep

Lens	When applied	Blocking?
L1 — Functional walk-through	Always. Layout, console, pixel invariants + handler verification (exercise the action, verify the intended outcome actually happened).	Yes
L2 — Intent compliance (cold-read)	Always. Intent source order: `goals.md` → user-solicited → inferred cold-read.	Yes
L3 — UX heuristics polish	Always observe. Surface findings. Blocks exit only under ship-gate signal.	Conditional
L4 — Opportunistic improvement hunting	On signal (user says "polish/audit/ship-gate/elevate") OR when L2 surfaces ambiguous-intent / can't-form-intent findings.	When triggered

L1 — Functional walk-through

Question: does every feature actually do what it says it does, end-to-end, as a user would use it?

Enumerate every entry point (command, menu item, button, keybinding, automatic trigger)
Walk every primary path to completion
Exercise every state transition (empty → populated, loading → error → recovered, collapsed → expanded, mode A → mode B)
Verify each handler actually fires and produces the intended outcome — not just that the click registered

L2 — Intent compliance via cold-read

Question: does the live behavior match what a user expects?

Intent sources, in priority order:

goals.md — under .work/dev/<id>/ or similar. Authoritative when present.
User-solicited intent — if the user has given 1–2 sentences of intent inline this session. Authoritative when provided.
Inferred cold-read — always applied, even when 1/2 are available. Approach the feature as a brand-new user with no context: read labels, icons, tooltips, position, naming, surrounding UI, convention matches. Form an expectation of what this thing should do.

If it doesn't make sense to the model, it's unlikely to make sense to a user. Cold-read failures are real discoverability findings.

Cold-read discipline

Every L2 finding powered by inference must state the inference chain explicitly, so the user can correct it:

Inferred intent: "Share branch via Patch link" (from label "Share" + link icon + position in branch context menu + convention of adjacent Commit/Shelve actions).

Observed: click opens a raw clipboard prompt with no patch formatting.

Divergence: the inferred action is a curated share flow; the actual action is a generic copy.

Classification — every L2 finding is one of:

Type	Severity
Inferred ≠ observed	🟡 Gap. Feature looks like it should do X, actually does Y.
Inferred is ambiguous	🟡 Gap. Surface doesn't telegraph what it does — a new user would be guessing.
Can't form intent at all	🔴 Critical. The model couldn't infer purpose from labels/icons/context. No user will either. Discoverability failure.

L3 — UX heuristics polish

Question: does it feel right?

Apply these heuristics on the running UI (the same set /ux-review uses, observed live):

Feedback & responsiveness — loading/progress, success confirmation, failure visible where the user is looking, transitions smooth not jarring
Discoverability — right surface, findable names, affordances look interactive, disabled elements explain why
Consistency — patterns, terminology, icons, visual language match the rest of GitLens
Workflow integration — flow preservation, reversibility, focus management, no unnecessary interruption
Information design — hierarchy, density, progressive disclosure, helpful empty states
Accessibility — keyboard parity, ARIA labels/roles/states, focus trapping for modals, theme compliance

Measure, don't eyeball — "feels cramped" is not a finding; ".card padding 4px vs repo convention 8–12px" is.

L4 — Opportunistic improvement hunting

Question: could this be meaningfully better?

Triggered by:

User says "audit," "polish," "ship-gate," "elevate" in the invocation
L2 surfaced ambiguous-intent or can't-form-intent findings (fixing those requires proposing changes, which IS L4)
A goals.md is present under .work/dev/<id>/ (suggests scoped feature worth elevating)
Multiple consecutive /live-exercise invocations on the same feature (user is circling — polish time)

Discipline:

Bias toward coherent upgrades that match existing patterns. Small changes that fit intent → dispatch liberty-fix with decisions.md entry. Larger scope-expanding proposals → open-questions.md.
Not drive-by churn. Every improvement must map to a UX heuristic gap, an ambiguous-intent fix, or the feature's stated intent — not just "shuffle things around."

Adaptive depth — no upfront prompt

The loop

1. Scope & spec

Reach starting state without clicking the SUT. Use launch({ settings }), command palette, direct method calls on the component instance, or feature simulators. UI-driven setup conflates the flow you're testing with the state you need — every accidental toggle costs an iteration, and the same icon often does double duty as both setup and SUT. Save real clicks for what you're actually exercising.
Read goals.md under the user's pointed .work/dev/<id>/ (or wherever). If absent, ask the user:

"No goals.md found. In 1–2 sentences: what's the intended user experience for this feature? What problem does it solve, and what does success look like from the user's POV?"

If the user declines to supply intent, proceed on cold-read inference alone (lens 2 still runs).
Launch one Explore subagent to map the feature's code surface. Don't self-read everything up front — read specific files only when you dig into an issue.
Note [FUTURE]/TODO markers — they're known-open, not new findings.

2. Live sweep (lens application)

launch VS Code, then for every distinct mode/state/context the feature exposes:

Maximize real estate via execute_command to close sidebar/aux bar/bottom panel. Do NOT use resize_window to grow the window for general auditing — it resizes the actual Electron window and is clamped by the host display. Only resize when a lens explicitly requires testing a specific responsive breakpoint, and prefer sizes ≤ the launch window. For larger headless render surfaces, configure launch({ screen_resolution }) at startup instead.
Navigate via execute_command or programmatic clicks. Shadow-DOM traversal is the hard part — use evaluate_in_webview to dispatch synthetic MouseEvents with modifier keys when needed. Wrap multi-line scripts in (() => { ... })() and JSON.stringify non-primitive returns. new Error().stack is often empty in eval context — patch the source method to get useful stack frames.
Apply all active lenses per state:
- L1: exercise interactions, verify handlers fired (evaluate_in_webview → command executed / state updated / event dispatched). Capture layout invariants.
- L2: compare live behavior to goals/intent (or cold-read inference). State inference chains explicitly.
- L3: aria_snapshot, computed styles for consistency, check empty/loading/error states explicitly.
- L4 (when active): actively look for coherent improvements.
- Always: read_console { level: "error" } and read_logs per state — errors can be load-order / state-dependent.
Measure, don't eyeball. "Looks off" is not an issue. "AI input at y=1306 in a 1308px viewport" is. "The button does nothing" → verify via evaluate_in_webview that the handler fired and state updated.

3. Compile findings

Write under .tasks/<feature>-exercise/:

findings.md — status table + severity sections.

# <Feature> — Exercise Findings

## Status

| ID     | Lens | Title                                 | Status   | Notes                        |
| ------ | ---- | ------------------------------------- | -------- | ---------------------------- |
| I1-C1  | L1   | Compose panel overflows viewport      | ✅ fixed | `.compose-panel` set to flex |
| I1-G2  | L2   | Compose view missing drafts flow      | open     | goals §3.2                   |
| I1-P1  | L3   | No loading feedback on push           | open     | >1s silent                   |
| I1-E1  | L4   | Add "retry" on transient fetch errors | open     | see Q3                       |
| I1-PR1 | Perf | +180ms hydration on Home view         | open     | baseline 220ms → 400ms       |

## 🔴 Critical (L1 broken / L2 can't-form-intent)

## 🟡 Gap (L2 divergence/ambiguity)

## 🔵 Polish (L3)

## 🟣 Opportunity (L4)

## Perf (see Phase 7)

### I1-C1 — <Title>

- **Lens**: L1 — functional walk-through
- **File**: `path/to/file.ts:L42`
- **Repro**: one-line user action + measurable observation
- **Root cause**: what's wrong in code
- **Fix**: concrete approach, OR "needs design decision (see Q<n>)"

open-questions.md — decisions the user must make:

## Q1. <Topic>

Brief context (1–2 sentences).

**Option A** — … | **Option B** — … | **Option C** — …

**Recommendation**: A, because …

decisions.md — liberties taken (only created if any):

## D1. Added loading indicator to push action

- **Finding**: I1-P1
- **Rationale**: GitLens convention is `gl-progress-spinner` on any action >500ms. Empty state had none.
- **Change**: `src/webviews/apps/plus/home/push.ts:L88` — added spinner, matches commit-view pattern.
- **Reversible?**: yes, one-line.

Rules:

Every finding cites a file and a repro someone else can reproduce.
Unambiguous bugs become IDs. Design questions become Q-numbers. Liberties become D-numbers.
If in doubt, use AskUserQuestion before filing as a fix.
node_modules issues get documented under "Third-party (won't fix)", not in the critical/warning pile.

4. Dispatch parallel fix agents

For the unambiguous findings AND liberties you've decided to take, dispatch a swarm of background agents — one per logical concern group. Do not hand-fix sequentially.

Grouping:

Related files that must change together → one agent
Independent concerns → separate agents
Cap at 5–6 concurrent; more and coordination breaks down

Each agent prompt MUST include:

Exact file paths to change
Which lens the finding is from (different lenses need different verification)
Root cause (not just symptom)
Measurable user-flow success criteria — e.g. "clicking Share from empty state shows toast AND dispatches gl:share event" (not just "button has handler attached")
Verification command (usually pnpm run build:quick)
Project conventions they'd otherwise miss (.js import extensions, no barrel files, no drive-by refactors, Title Case, no --no-verify)
Explicit out-of-scope ("do NOT touch rebase-entry.ts")
Source-of-truth when relevant (codicons from @vscode/codicons, shared components from src/webviews/apps/shared/components/)

Run agents with run_in_background: true and keep working. The harness notifies on completion.

Subagents will NOT follow `/live-exercise` on their own

Bad subagent prompt (will produce static-only verification):

Verify the compare mode isn't broken. We're shipping soon.

Good subagent prompt:

Verify compare mode by: (1) launch mcp__vscode-inspector__launch; (2) show the graph, click one commit, Ctrl+click another; (3) read_console { level: "error" }; (4) inspect that gl-details-multicommit-panel renders with non-empty commitFrom/commitTo; (5) evaluate handler fired via window.__lastCompareEvent; (6) report console errors and any rendering failures. Do NOT ship based on tsc alone.

5. Verify & loop

When agents complete:

pnpm run build:quick. Fix any breakage before proceeding.
Always git diff after agents finish. Don't trust summaries — agents have "completed successfully" while silently removing working code.
Teardown + relaunch VS Code. Extension-host and webview state is sticky across reloads (Shoelace icon cache, Lit registrations, extension singletons).
Re-measure each fix. Evidence, not screenshots.
Re-sweep all active lenses. Watch for:
- Regressions (agent B undid agent A's fix)
- L1 items now pass but L3 items the fix introduced
- Second-order effects (a handler fix exposed a missing loading state)
Update the findings doc — ✅ the resolved items, add new I2-* entries.
Loop to step 4 if there are new unambiguous fixes.

6 & 7 — End-of-loop gates: Simplify, then Performance

6. Simplify — invokes /simplify (3 parallel review agents: reuse, quality, efficiency). After it runs: pnpm run build:quick + git diff + loop back to Phase 5.

Convergence — full exit criterion

Exit only when all three of the following hold, in sequence:

A fresh Phase 5 sweep produces nothing new
A fresh Phase 6 simplify (if opted in) produces nothing to simplify
A fresh Phase 7 perf sweep (if opted in) produces nothing measured / no convention violations

If any phase produces changes, loop back to Phase 5. Don't exit because the current iteration produced a few fixes.

Additional exit conditions:

All non-third-party 🔴/🟡/🔵/🟣/Perf items are ✅ or deferred
User has reviewed open-questions.md and decisions.md (if present)
Remaining items are open questions awaiting user input, or user explicitly accepts outstanding items

Pitfalls

Pitfall	What happens	Mitigation
Eyeballing the flow	"Feels fine" becomes evidence; real handler bugs slip through	`evaluate_in_webview` to verify state/command/event fired, not just clicked
Skipping L2 when no goals.md	Completeness findings silently missed	Cold-read always applies; solicit intent if absent; state inference chains explicitly
Polish dominating bugs	Iteration focuses on spacing while L1 handlers are broken	L1 is gating. Don't move deeper until L1 is clean
Drive-by improvement churn (L4)	Agent proposes changes that don't serve intent	Every improvement must map to a heuristic gap, an L2 ambiguity, or stated intent
Liberty overreach	Agent changes a cross-feature convention (icon, term, pattern) without asking	Cross-feature changes → `open-questions.md` always
Missing `decisions.md`	User has no audit trail of liberties; trust erodes	Every dispatched liberty goes in `decisions.md`
Speculative perf optimization	Agent "optimizes" something that was never slow	Measured-tier requires baseline + post-measurement. Speculation → open-questions only
Perf rewrite when caching fits	Agent rewrites an inner loop when a missing `@memoize` would solve it	Prefer caching / memoization / debouncing over rewrites; convention-tier handles most
Over-eager fix agent	Prompt says "remove empty div" → agent deletes functional buttons inside	Prompt MUST state template location and expected rendering outcome, not only CSS selector
Load-order bug	Side-effect module runs too late; the thing it registers fires after consumers already used the default	Register at every webview entry, or in the most-transitively-imported shared wrapper
Per-realm registration	One webview ≠ all webviews	Plan for N registrations, not 1
Stale instance state	Iteration N sees cached state from N−1	Always teardown + relaunch between iterations
Silent agent regression	Agent "succeeds" but reverted working code	Always `git diff` after agents; don't trust their summaries
Mimic vs reuse	Agent "fixes" icon-fetch by inlining third-party icons instead of using ours	Specify source-of-truth in the prompt
Shadow DOM aria clipping	`aria_snapshot` reports empty buttons that actually have labels	Use `evaluate_in_webview` to read `aria-label` on the inner button directly
Guess-fixing	"Probably the padding — let me bump it"	Measure first. No measurement → no fix
Setup-by-clicking	The icon you click to reach a state is often also the SUT — accidental toggles cost iterations and mask root causes	Drive setup via `launch({ settings })`, command palette, direct method calls, or simulators. Save UI clicks for what you're actually testing
Snapshot at the consumer	"Capture state on the click handler" / "in the bubble listener" fails when state mutates upstream — different task, microtask drained, stopPropagation, Lit re-render between	Instrument inside the function that mutates the state, not in a downstream listener. The producer always runs first by definition
Stacked instrumentation	Re-running a tracing patch wraps the previous wrapper; one call now logs N times, stacks fragment, traces stop making sense	Guard each patch with a marker (`if (a.__patched) return; a.__patched = true`) or teardown+relaunch. Webview reload keeps document listeners — only teardown is a clean reset
Spec-grade mental model	"Microtasks drain at task end" / "capture before bubble" assumed without verification — Playwright dispatches input events as separate tasks, microtasks drain between them	For any timing/ordering-sensitive design, prove the mechanism with a 5-line `evaluate_in_webview` trace before writing the plan

Red flags — pause the loop

You're about to answer a "does this actually work for a user?" question without having exercised a handler end-to-end
You're about to apply L3/L4 before L1 is fully clean
You haven't teardown + relaunched between iterations
You're about to fix something without a measurable repro
An agent's summary says "done" but you haven't diffed the actual changes
You're on iteration 4+ and the same issue keeps resurfacing — stop; a design decision is likely required, ask the user
You feel the urge to hand-fix "just this one" instead of dispatching an agent
You're taking a liberty on a cross-feature convention without asking
No goals.md exists AND you haven't run the cold-read discipline
You're about to dispatch a perf fix without a baseline measurement

Tripwires for "skipping live lens application"

Any of these phrases appearing in your reasoning = you're about to ship without live verification. Stop and launch the instance.

"I have enough confidence"
"Code paths trace cleanly"
"Types check / tsc passed / no TypeScript errors"
"git diff looks fine"
"It should work based on the code"
"No obvious issues in the code"
"L1 passed, skip L2–L4"
"Polish is subjective, move on"
"There's no goals.md so I'll guess at intent"
"This could probably be faster" (dispatch bait)
"The fix should work because [microtask timing / event ordering / shadow DOM behavior]" — that's a hypothesis, not a verification. Trace the actual ordering live before committing to a design
"Ship it" / "LGTM" / "looks good from here"

These are correct outputs AFTER applying lenses live, not substitutes for it.

Rationalizations to resist

Excuse	Reality
"I have enough confidence from reading the code"	Reading code catches logic bugs. It misses layout overflow, mode-switching state bugs, console errors, runtime z-index, every pixel-accurate issue. Launch the instance
"tsc passed, types trace cleanly, `git diff` is small — ship it"	Types and diffs are blind to UI. They're necessary, not sufficient
"L3/L4 are subjective — skip"	Subjective lenses are exactly what this skill exists to apply deliberately. Skipping them defeats the point
"No goals.md, so heuristics alone are fine"	Cold-read still applies and catches discoverability failures heuristics don't. State the inference chain
"We're shipping in 30 min, skip the live check this once"	30 seconds of live inspection catches what 30 min of static analysis cannot
"It's one small issue, no need for an agent"	Single fixes still benefit from the dispatch pattern — you keep doing live inspection while the agent works, and the diff surfaces regressions
"Improvement opportunity obviously right, no need for decisions.md"	Clear wins are exactly what `decisions.md` is for — the audit trail is the whole point of "taking liberties"
"We just did a sweep, skip iteration 2"	Iteration 2 catches regressions iteration 1 couldn't see
"Spec says microtasks drain at task end, so the snapshot works"	Native events dispatched by real input (Playwright) split into separate tasks; microtasks drain between them. Trace the actual ordering before designing around it
"I'll capture state on the click handler"	Click runs in a later task. Anything mutated in pointerdown's microtask is gone by then. Snapshot in the producer (where the mutation is scheduled), not the consumer
"Setup by clicking the icon I want to test is fine"	The same icon often does double duty as setup and SUT — accidental toggles are silent and burn iterations. Drive setup via config/command/method calls, not the SUT
"Teardown takes 10s each time — skip it"	Cached state in the old instance is the single most common source of false-positive/false-negative findings
"The build passed, ship it"	Build passing ≠ fix verified. Measure
"This could probably be faster"	Speculation. No measurement → open-questions, never dispatch
"Missed `@memoize` — I'll file it and move on"	Convention-tier dispatches on audit alone. Fix it; log in `decisions.md`
"Sweep + simplify converged, good enough"	Three-way convergence: sweep + simplify + perf all stable. If perf wasn't run, prompt for it before declaring done

Before declaring "live-exercise complete"

You MUST have:

Launched the extension live (not just pnpm run build:quick)
Exercised each changed mode/state at least once — click, multi-select, toggle, whatever produces the surface
Verified L1 handlers actually fired — not just that buttons registered clicks
Applied L2 cold-read — stated inference chains explicitly for any intent-sourced finding
Read the console for errors — read_console { level: "error" } once per mode
Written decisions.md for every liberty taken
Surfaced open-questions.md to the user
Completed three-way convergence — sweep + simplify (if opted in) + perf (if opted in) all produce nothing new in sequence

Missing any of those = UI readiness is unverified. "Code paths look clean" is not evidence.

Exercising Pro-gated features

execute_command { command: "gitlens.plus.simulate.subscription", args: [{ "state": "Paid", "planId": "pro", "dismissOnboarding": true }] }

Exercising AI features

Loop-specific notes:

Per finding: inject content with a unique assertion sentinel before triggering the feature; assert against it in the rendered surface. Clear between findings ({ op: "clear" }) — leftover injects leak between scenarios.
Negative-path findings (error/cancel/slow/invalid UX): switch the global mode instead of injecting per call.
Phase 4 dispatch: subagents don't auto-load /live-inspect. When dispatching a fix agent that touches an AI surface, embed the inject command, the content to inject, and the assertion target directly in the agent's prompt.

Output artifacts

.tasks/<feature>-exercise/findings.md — severity + status table + detailed entries across lenses + perf
.tasks/<feature>-exercise/open-questions.md — design decisions needing user input
.tasks/<feature>-exercise/decisions.md — liberties taken, with rationale (only if any)
Clean working tree with targeted commits (or staged changes) across many files
Clean live console (only third-party noise)

Related skills

REQUIRED BACKGROUND:

/live-inspect — primitive MCP tool reference, used throughout
/simplify — 3-agent parallel code cleanup; invoked from Phase 6
/live-perf — perf measurement + improvement; invoked from Phase 7

Interactive counterpart:

/live-pair — user-driven pair-programming. Use when iteration is exploratory/creative; delegates back here on structural bugs.

Static counterparts:

/ux-review — static L2/L3 counterpart
/deep-review — static L1 correctness counterpart
/review — standards + completeness diff review

live-exercise

Más de este repositorio

Más de este repositorio

/live-exercise — Live functionality, intent, and quality loop

When to use vs other skills

Prerequisites

Lenses — applied every sweep

L1 — Functional walk-through

L2 — Intent compliance via cold-read

Cold-read discipline

L3 — UX heuristics polish

L4 — Opportunistic improvement hunting

Adaptive depth — no upfront prompt

The loop

1. Scope & spec

2. Live sweep (lens application)

3. Compile findings

4. Dispatch parallel fix agents

Subagents will NOT follow /live-exercise on their own

5. Verify & loop

6 & 7 — End-of-loop gates: Simplify, then Performance

Convergence — full exit criterion

Pitfalls

Red flags — pause the loop

Tripwires for "skipping live lens application"

Rationalizations to resist

Before declaring "live-exercise complete"

Exercising Pro-gated features

Exercising AI features

Output artifacts

Related skills

/live-exercise — Live functionality, intent, and quality loop

When to use vs other skills

Prerequisites

Lenses — applied every sweep

L1 — Functional walk-through

L2 — Intent compliance via cold-read

Cold-read discipline

L3 — UX heuristics polish

L4 — Opportunistic improvement hunting

Adaptive depth — no upfront prompt

The loop

1. Scope & spec

2. Live sweep (lens application)

3. Compile findings

4. Dispatch parallel fix agents

Subagents will NOT follow /live-exercise on their own

5. Verify & loop

6 & 7 — End-of-loop gates: Simplify, then Performance

Convergence — full exit criterion

Pitfalls

Red flags — pause the loop

Tripwires for "skipping live lens application"

Rationalizations to resist

Before declaring "live-exercise complete"

Exercising Pro-gated features

Exercising AI features

Output artifacts

Related skills

Subagents will NOT follow `/live-exercise` on their own

Subagents will NOT follow `/live-exercise` on their own