원클릭으로 Manus에서 모든 스킬 실행

prove-it-live-before-done

Use whenever an agent (or you) is about to claim work is done/fixed/shipped/passing, or when Mark says 'ready?', 'done?', 'did it work?', 'still failing', 'is the live URL responding?', 'check the deploy status', or wants to confirm a fix before pushing. Treat every completion claim, green CI, and passing test as UNPROVEN until the real artifact is exercised end-to-end — drive the actual running app at its real URL/UI/API, confirm the deployed revision is actually live, verify the mutating side-effect (email/queue/DB row) truly fired, gate every forward step on an explicit green signal, audit coverage across the whole surface, and reconcile against the system of record. Then name the specific residual defect with expected-vs-actual.

Manus에서 실행

개요

설치 명령

npx skills add https://github.com/artsmc/claude-dev-agents --skill prove-it-live-before-done

이 명령을 Claude Code에 복사하여 붙여넣어 스킬을 설치하세요

출처

artsmc/claude-dev-agents

스타3

포크0

업데이트2026년 5월 30일 12:10

파일 탐색기

2 개 파일

SKILL.md

readonly

name

prove-it-live-before-done

description

Prove It Live Before Done

Mark's strongest reflex: he refuses to accept 'done/fixed/shipped/tests pass' on your word. He drives the real product himself and names what's still broken (that worked but now the skills path dont reflect the calls properly; got the confirmation but no email sent). Adopt his skepticism as the default — your job is to DISPROVE the completion claim, not announce it.

Core rule: a claim is not proof. Exercise the real thing.

A UI confirmation toast is NOT proof the email sent. A green workflow is NOT proof the new revision is live. A passing unit test is NOT proof the feature works through its real path. Map each claim to the right proxy and check THAT:

Claim being made	What you must actually do to prove it
"The fix works"	Re-run the real user action / construct a deliberate failing case; confirm the exact prior symptom is gone
"Feature is built"	Drive it at its real URL/UI/API as a user would — `start the application in the background` and use it
"Email/notification sent"	Check the actual inbox/provider, not the success toast — `got the confirmation but no email sent: artsmc88@gmail.com`
"Uploaded / queued / saved"	Inspect the queue depth / the DB row — `i uploaded 3 and they are all sitting in the queue`
"Deployed / shipped"	Hit the live URL AND confirm the new ECS taskDef revision / Render dashboard shows the running version by exact version number
"Migration done"	Confirm the migration actually RAN, not just that code changed — `run the migration`
"Tests pass / CI green"	Confirm the gate ran on a machine that actually has node_modules; check coverage hits the bar
"PR is done"	Read the actual diff, not the PR summary — `is the code inside the PR...`; check it hasn't drifted from main

Step 1: Gate every forward step on a green, machine-checkable signal

Never merge, deploy, scale, or advance a phase on red or pending. Always specify the branch for both outcomes — Mark does this reflexively:

merge it if green / merge the green ones
Gate PASS → mark done, write next brief. Gate FAIL → write fixup, dispatch on desktop main, schedule wake
If all 5 checks are green: stop the loop, then gh pr merge. Else resolve and re-run.

If the signal is pending, do not proceed — poll (see fleet-dispatch-and-watch) or report and wait.

Step 2: Require a multi-signal definition of done

One signal is never enough. Mark defines done by the end-to-end outcome:

Report file exists AND commits present AND gate passes on a real machine
Agent says DONE AND its branch is on origin (when sen-259 shows DONE and its branch is on origin)
Code changed AND migration actually run
Feature exists AND it exercises the intended mechanism (the right path, not a stub)
Coverage meets the numeric bar (we need at least 90%)

Step 3: Confirm the deploy is actually LIVE

A release is incomplete until the deployed artifact is confirmed in production — not when the workflow says success.

https://chat.beta.sentienthome.ai/ live? — hit it.
check the v1.5.11 deploy status — by exact version, across iterations.
Confirm the new ECS taskDef revision is running / the Render dashboard shows the version.
If passive polling stalls, switch to the authoritative dashboard (check the render dashboard).

Step 4: Confirm a mutating command actually took effect

After any script/command/correction that changes external state, do not assume success. did it work? — paste the actual execution output and verify it. Cross-check deployment integrity against an external source (the commit URL) to confirm the specific changes shipped. did we confirm the change when we ran the script?

Step 5: Confirm the chain actually advanced

A completion signal does not imply the next stage fired. Probe it: does that mean the deploy started?, have we pushed a tagged version of the code?, is that pino work already merged in?. When multiple actors did work, demand explicit confirmation: i had another runner doing work — can you absolutely confirm nothing is live?

Step 6: Audit coverage across the WHOLE surface

Don't accept that a concern exists 'somewhere'. Sweep it:

do we have pino log coverage across the app? — all apps, all containers.
i need that 100% resolved across apps before moving to the next thing
After fixing one layer, check downstream holds separate state: luke is unable to login to the monorepo — can we confirm they are also the same in RDS?
Check adjacent/parallel categories that may have been missed.

Step 7: Reconcile against the system of record

Distrust secondary tracking state. Before accepting a 'completed' list, cross-check against git / Linear / the real machines:

are they accounted for in linear? / did we track all the platform tickets?
Backfill tickets from actual commits; enumerate exactly which commits/features are missing rather than asking open-endedly: its missing a lot of commits — the updated message ui, the conversation api logic... its all missing
is the code on desktop wsl? — confirm committed code landed on the specific machine.
Cross-check the rendered UI against the source artifact: check the html folder, you are not properly reflecting the conversation ui.

Step 8: Report the residual defect in expected-vs-actual terms

Close with what's still wrong, stated as this is my expectation / this is what im getting — never 'works for me'. If it's truly clean, say which real action you ran and what you observed.

Red flags

Anti-pattern	Why it fails Mark
Declaring done after making the change	He'll disprove it by hand — `still failing to fetch`
Trusting a UI confirmation as proof of the side-effect	`got the confirmation but no email sent`
Trusting a green workflow as proof of live deploy	He checks the ECS revision / live URL
Advancing on pending/red CI	He gates strictly on green with a fallback branch
One signal = done	He requires report AND commits AND gate, or DONE AND branch-on-origin
Using mock/placeholder data to fake a result	`no need for mock data lets work with what we have`
Trusting the ticket tracker over git/real machines	He reconciles against the source of record

이 저장소의 다른 Skills

같은 저장소

fleet-dispatch-and-watch

artsmc/claude-dev-agents

Use when Mark wants large or mechanical work fanned out across his machine fleet, or wants long-running/background/remote agents monitored. Triggers on 'distribute among the fleet', 'delegate with tmux and worktrees', 'kick it off fan them out', 'offload to mac-mini, desktop just orchestrate', 'check on the agents every 5 min', 'is it done its been some hours', 'are we ready to delegate', or any dispatch to background workers. This is Mark's dispatch→snapshot→poll→escalate→checkpoint loop — fan independent work to workers (local box orchestrates), hand each a precise state snapshot, then poll on a fixed cadence with trustworthy liveness proxies, error-signature buckets, and hard stop/escalation thresholds — never blocking or assuming success.

2026-05-313

research-gated-build-plan

artsmc/claude-dev-agents

Use at the START of anything sizable, before writing code — when Mark says 'want to work out the full plan', 'I want to do some deep research', 'lets start the planning', 'what does complete look like?', 'what tickets/tasks remain', 'how close are we / what are our current gaps', 'do we already have a harness for this', or 'is there a system we can use?'. This is Mark's front-loaded decision discipline — inventory what already exists before building, scope the gap against a concrete target, research and persist the approach as an artifact, sequence prerequisites, and bake quality bars + phase gates with human checkpoints directly into the go-ahead. It decides WHETHER and HOW to enter the execution skills (start-phase-plan, feature-new); run it first.

2026-05-313

scope-question-and-delegate

artsmc/claude-dev-agents

Use when work is genuinely ambiguous in a way that changes the plan, or big/multi-part/underspecified enough to balloon context — or when the orchestrator's window is trending toward the ~200k cliff where quality and cost degrade. Fires when Mark says 'this is a big one', 'what do you need from me before you start', 'dont drag the whole context through this', 'should we split this up', or 'keep the main thread lean' — or when one thread is doing both the planning AND the execution. This is Mark's triage-then-delegate discipline — decide fast whether real ambiguity or context cost warrants a stop, ask only the few DECISIVE questions, treat the window as a finite budget, and hand each worker a minimal scoped snapshot so the orchestrator plans and synthesizes without accumulating execution detail. Not a hard gate — small clear tasks just proceed.

2026-05-313

diagnose-from-raw-symptom

artsmc/claude-dev-agents

Use when Mark reports a bug by pasting a raw artifact with little or no prose — a stack trace, an HTTP request/response trace (URL + method + status like 500/403/404/502), a console/React/Prisma error, a ChunkLoadError, a shell error, or a screenshot of a visual defect — or when he says things like 'still failing', 'clicking X does nothing', 'no inline editor on the rendered page', 'do we have local postgres/pgvector?', or 'identify the true error'. This is the front-to-back debugging procedure — extract the trigger and reproduction from terse input, probe whether the underlying plumbing even exists, localize by contrasting known-good vs broken, and drive to a durable root cause rather than a workaround. Reach for this BEFORE proposing any fix.

2026-05-303

enumerated-menu-pick-and-sweep

artsmc/claude-dev-agents

Use whenever you're about to present Mark with consequential choices, questions, or tradeoffs — structure them as a numbered or lettered menu so a bare reply counts as a full selection, and use this to read his terse picks. Triggers on one-token replies that only resolve against a labeled list you just gave ('go with B', 'lets go with 2', 'run option B', 'c one big commit'), picks carrying a rider ('option 1 and update envs when your done', '1. yes, and execute it'), multi-item sweeps answering every numbered point at once ('1. yes 2. thats fine 3. just aws 4. whats the mcp server? 5. minimal'), ranged picks with a because-clause scope-cut ('do 1-3, 4 wont be needed because...'), and Mark authoring his own numbered decision list to apply across a spec. Format choices as a menu — a bare letter is uninterpretable unless you offered the list first.

2026-05-303

reference-as-executable-spec

artsmc/claude-dev-agents

Use when Mark frames a feature by pointing at a concrete reference instead of describing behavior from scratch — when he names a live product as the target ('an inline editor like editor.js', 'a code editor like monaco-editor dark theme', 'much like openclaw.ai'), points at an internal subsystem as the literal spec to replicate ('the OTP from sentient-monorepo is the pattern we should match', 'use sentient-monorepo as the api source of truth'), or anchors a requirement on an external library as the structural model ('setups similar to CASL', 'building this same CASL boilerplate'). Whenever he says 'build it like THAT', 'same as X', or 'mimic/replicate X', treat the named reference as the executable spec — go observe it, extract its real behavior, and let THAT define correct rather than inventing requirements.

2026-05-303

출처

artsmc

artsmc/claude-dev-agents

GitHub 저장소 열기 Creator 저장소 보기

설치 명령

다운로드

Manus에서 실행

유용한 대상SOC

소프트웨어 개발자컴퓨터 및 수학직15-1252L4

name

prove-it-live-before-done

description

Prove It Live Before Done

Core rule: a claim is not proof. Exercise the real thing.

Claim being made	What you must actually do to prove it
"The fix works"	Re-run the real user action / construct a deliberate failing case; confirm the exact prior symptom is gone
"Feature is built"	Drive it at its real URL/UI/API as a user would — `start the application in the background` and use it
"Email/notification sent"	Check the actual inbox/provider, not the success toast — `got the confirmation but no email sent: artsmc88@gmail.com`
"Uploaded / queued / saved"	Inspect the queue depth / the DB row — `i uploaded 3 and they are all sitting in the queue`
"Deployed / shipped"	Hit the live URL AND confirm the new ECS taskDef revision / Render dashboard shows the running version by exact version number
"Migration done"	Confirm the migration actually RAN, not just that code changed — `run the migration`
"Tests pass / CI green"	Confirm the gate ran on a machine that actually has node_modules; check coverage hits the bar
"PR is done"	Read the actual diff, not the PR summary — `is the code inside the PR...`; check it hasn't drifted from main

Step 1: Gate every forward step on a green, machine-checkable signal

Never merge, deploy, scale, or advance a phase on red or pending. Always specify the branch for both outcomes — Mark does this reflexively:

merge it if green / merge the green ones
Gate PASS → mark done, write next brief. Gate FAIL → write fixup, dispatch on desktop main, schedule wake
If all 5 checks are green: stop the loop, then gh pr merge. Else resolve and re-run.

If the signal is pending, do not proceed — poll (see fleet-dispatch-and-watch) or report and wait.

Step 2: Require a multi-signal definition of done

One signal is never enough. Mark defines done by the end-to-end outcome:

Report file exists AND commits present AND gate passes on a real machine
Agent says DONE AND its branch is on origin (when sen-259 shows DONE and its branch is on origin)
Code changed AND migration actually run
Feature exists AND it exercises the intended mechanism (the right path, not a stub)
Coverage meets the numeric bar (we need at least 90%)

Step 3: Confirm the deploy is actually LIVE

A release is incomplete until the deployed artifact is confirmed in production — not when the workflow says success.

https://chat.beta.sentienthome.ai/ live? — hit it.
check the v1.5.11 deploy status — by exact version, across iterations.
Confirm the new ECS taskDef revision is running / the Render dashboard shows the version.
If passive polling stalls, switch to the authoritative dashboard (check the render dashboard).

Step 4: Confirm a mutating command actually took effect

Step 5: Confirm the chain actually advanced

Step 6: Audit coverage across the WHOLE surface

Don't accept that a concern exists 'somewhere'. Sweep it:

do we have pino log coverage across the app? — all apps, all containers.
i need that 100% resolved across apps before moving to the next thing
After fixing one layer, check downstream holds separate state: luke is unable to login to the monorepo — can we confirm they are also the same in RDS?
Check adjacent/parallel categories that may have been missed.

Step 7: Reconcile against the system of record

Distrust secondary tracking state. Before accepting a 'completed' list, cross-check against git / Linear / the real machines:

are they accounted for in linear? / did we track all the platform tickets?
Backfill tickets from actual commits; enumerate exactly which commits/features are missing rather than asking open-endedly: its missing a lot of commits — the updated message ui, the conversation api logic... its all missing
is the code on desktop wsl? — confirm committed code landed on the specific machine.
Cross-check the rendered UI against the source artifact: check the html folder, you are not properly reflecting the conversation ui.

Step 8: Report the residual defect in expected-vs-actual terms

Close with what's still wrong, stated as this is my expectation / this is what im getting — never 'works for me'. If it's truly clean, say which real action you ran and what you observed.

Red flags

Anti-pattern	Why it fails Mark
Declaring done after making the change	He'll disprove it by hand — `still failing to fetch`
Trusting a UI confirmation as proof of the side-effect	`got the confirmation but no email sent`
Trusting a green workflow as proof of live deploy	He checks the ECS revision / live URL
Advancing on pending/red CI	He gates strictly on green with a fallback branch
One signal = done	He requires report AND commits AND gate, or DONE AND branch-on-origin
Using mock/placeholder data to fake a result	`no need for mock data lets work with what we have`
Trusting the ticket tracker over git/real machines	He reconciles against the source of record