babysit-pr

Last updated: 24 June 2026 at 03:26

Use when a PR is open and green-but-blocked, or red on CI for reasons that smell like flake: a timed-out test runner, a transient network 500 in a setup step, a check that passed locally but failed in CI. Reach for this whenever someone says "this PR keeps failing CI but the test is flaky", "can you babysit this PR to merge", "it's just a flaky check, retry it", or wants a PR shepherded through retries, conflict resolution, and auto-merge without sitting on it manually. Prefer this over hand-clicking "Re-run failed jobs" in the GitHub UI, which gives no signal on flaky-vs-real and does nothing to enable auto-merge.

Source: az9713/skill-best-practices

SKILL.md


name

babysit-pr

description

Babysit PR

Overview

A PR that's correct can still sit unmerged for an hour because one CI job flaked, a base-branch advance left it behind, or auto-merge was never turned on. Watching that by hand — refreshing the checks tab, clicking "Re-run failed jobs", remembering to come back — is exactly the kind of low-judgment loop that should be automated, but automated carefully: blindly re-running a failing check until it goes green is how a real bug gets merged.

This skill watches a PR's checks, and when something fails it makes the one decision that actually matters: is this flake, or is this real? Flake gets a bounded retry (capped, never an infinite loop). A real failure stops and reports back so a human fixes the code. Merge conflicts get rebased (or merged) against the base. When everything is green and the required gates are satisfied, it enables auto-merge (gh pr merge --auto) so the PR lands the moment GitHub allows it, and gets out of the way.

The whole point is to be conservative: it would rather hand a PR back to you than merge something it shouldn't.

When to Use

Reach for this when:

A PR is open and you believe it's mergeable but CI is intermittently red — a test runner that OOM'd, a flaky integration test, a transient registry pull failure in a setup step.
A PR fell behind its base branch and needs a rebase/merge to satisfy "branch must be up to date" before it can merge.
You want a PR to land as soon as it's eligible without watching it — enable auto-merge once gates pass and walk away.
You're triaging a queue of your own PRs and want each one nudged through the last mile.

Do NOT use this when:

The PR has failing checks that are clearly real (the diff broke a test, a type error, a lint failure on the changed lines). Babysitting won't fix code — fix the code.
Reviews are still pending or changes are requested. This never overrides a human review gate; it waits for approval, it doesn't bypass it.
The conflict is semantic and needs judgment (two features touching the same logic). Auto-rebase resolves textual conflicts in trivial cases only; anything ambiguous is handed back.
You don't have merge rights on the repo. The script will just spin on a permission error — check gh auth status first.

Running it

cd .claude/skills/babysit-pr

# Babysit a specific PR (number or URL), polling every 30s, up to 2 reruns per check.
./scripts/babysit.sh 4821

# Or let it infer the PR from the current branch.
./scripts/babysit.sh

Useful environment knobs (all optional):

export POLL_INTERVAL_S=30        # how often to poll checks (default 30)
export MAX_RERUNS=2              # per-check rerun cap before giving up (default 2)
export FLAKE_ERROR_RATE_MAX=0.05 # if grafana error rate exceeds this, treat fail as REAL, not flake
export CONFLICT_STRATEGY=rebase  # rebase|merge for bringing the PR up to date (default rebase)
export REQUIRE_APPROVAL=1        # refuse to enable auto-merge while reviews pending (default on)

The script prints a running log of each decision (rerunning check X as flake attempt 1/2; check Y failed for real, stopping; rebased onto base; auto-merge enabled) so the reasoning is auditable, and exits non-zero if it gives up so it composes into a larger automation.
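As a rough illustration of how the knobs above could be picked up, here is a minimal sketch of a config loader. The function name `load_config` is hypothetical and is not taken from scripts/babysit.sh; the 0.05 fallback for FLAKE_ERROR_RATE_MAX is also an assumption, since the text states no default for it.

```shell
#!/bin/sh
# Hypothetical sketch, not the real babysit.sh: apply the documented defaults
# and reject values that would make the loop unsafe.
load_config() {
  POLL_INTERVAL_S="${POLL_INTERVAL_S:-30}"
  MAX_RERUNS="${MAX_RERUNS:-2}"
  FLAKE_ERROR_RATE_MAX="${FLAKE_ERROR_RATE_MAX:-0.05}"  # assumed default
  CONFLICT_STRATEGY="${CONFLICT_STRATEGY:-rebase}"
  REQUIRE_APPROVAL="${REQUIRE_APPROVAL:-1}"

  # Only the two documented strategies are accepted.
  case "$CONFLICT_STRATEGY" in
    rebase|merge) ;;
    *) echo "invalid CONFLICT_STRATEGY: $CONFLICT_STRATEGY" >&2; return 1 ;;
  esac

  # MAX_RERUNS must be a non-negative integer so the rerun cap stays finite.
  case "$MAX_RERUNS" in
    ''|*[!0-9]*) echo "invalid MAX_RERUNS: $MAX_RERUNS" >&2; return 1 ;;
  esac
}
```

Failing fast on a bad value keeps the non-zero exit contract described above: a wrapper can treat any non-zero status as "handed back" without parsing the log.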

How it decides flaky vs real

This is the load-bearing judgment, so it's deliberate rather than clever:

Read the failed check's conclusion and logs, not just red/green. A job that failed with a known-transient signature — connection reset, i/o timeout, 429 Too Many Requests, an OOM-killed runner, a registry/network pull failure in a setup step before the test body ran — is a flake candidate.
Cross-check the error rate via the grafana skill. Before calling something "just flake", confirm the service isn't actually degraded: if grafana reports the relevant error rate is elevated (above FLAKE_ERROR_RATE_MAX), the "transient" failure may be a real outage the PR is surfacing — in that case treat it as REAL and stop. Flake is a quiet background failure, not a spike everyone is seeing.
A failure inside the test assertions (an expect/assert mismatch, a type error, a lint error on changed lines) is real by default — never rerun it. Reruns are only for infrastructure-shaped failures.
Cap reruns at MAX_RERUNS per check. A check that still fails after MAX_RERUNS reruns (two by default) is no longer "flaky"; it's reliably broken. Stop, report, hand back. The cap is what prevents the rerun-until-green antipattern that merges real bugs.

When a flake candidate is identified, the script triggers gh run rerun --failed for just that run and resumes polling.
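The rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the logic scripts/babysit.sh actually ships: the function names `classify_failure` and `may_rerun` are hypothetical, the log signatures are example patterns, and the error rate is passed in as a plain number rather than fetched from the grafana skill.

```shell
#!/bin/sh
# Illustrative only: the signature lists are assumptions, not the patterns
# the real script matches.

# classify_failure LOG_TEXT ERROR_RATE [MAX_RATE]  ->  prints "flake" or "real"
classify_failure() {
  log_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  error_rate=$2
  max_rate=${3:-${FLAKE_ERROR_RATE_MAX:-0.05}}

  # An elevated service error rate means a possible real outage: stop.
  if awk -v r="$error_rate" -v m="$max_rate" 'BEGIN { exit !(r + 0 > m + 0) }'; then
    echo real
    return 0
  fi

  # Assertion, type, and lint failures are never rerun. Checked before the
  # transient signatures so a log containing both kinds stays "real".
  case "$log_lc" in
    *assertionerror*|*"expect("*|*"type error"*|*"lint error"*)
      echo real
      return 0 ;;
  esac

  # Known-transient, infrastructure-shaped signatures.
  case "$log_lc" in
    *"connection reset"*|*"i/o timeout"*|*"429 too many requests"*|*"oom-killed"*)
      echo flake
      return 0 ;;
  esac

  # Anything unrecognised is treated as real, the conservative default.
  echo real
}

# may_rerun ATTEMPTS_SO_FAR  ->  exit status 0 while under the MAX_RERUNS cap
may_rerun() {
  [ "$1" -lt "${MAX_RERUNS:-2}" ]
}
```

A caller would run `gh run rerun <run-id> --failed` only when `classify_failure` prints `flake` and `may_rerun` succeeds; every other combination stops and reports.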

Gotchas

ALWAYS treat these as real failure modes — each has cost someone a bad merge or a wasted hour.

Rerun-until-green merges bugs. The single most dangerous thing this automation can do is keep re-running a genuinely failing check until variance makes it pass once. That's why assertion/type/lint failures are never rerun, and why infra-flake reruns are hard-capped at MAX_RERUNS. If you find yourself wanting to raise the cap to 5, stop — the check is telling you the truth.
"All checks green" is not "mergeable". A PR can show green checks and still be blocked by a required status check that hasn't reported yet, or by "branch must be up to date with base". Gate auto-merge on the repo's required_status_checks and mergeable state from gh pr view --json mergeable,mergeStateStatus, not on the visible check list. Enabling --auto is the right move here — it asks GitHub to merge when its gates pass, which is the authoritative answer.
Never auto-merge while reviews are pending or changes requested. Green CI is not approval. With REQUIRE_APPROVAL=1 (default) the script refuses to enable auto-merge until the review state is approved; it'll wait, not bypass. Turning this off is only appropriate on repos with no review requirement.
Rebase vs merge on conflict is not interchangeable. Rebasing rewrites the PR's commits onto the new base (clean history, but it requires a force-push: fine for a feature branch you own, hostile on a shared branch). Merging the base in keeps history but adds a merge commit. Default to rebase for your own PRs; switch to CONFLICT_STRATEGY=merge when others may have the branch checked out. Either way, only trivial textual conflicts are auto-resolved: if git rebase leaves conflict markers, the script aborts the rebase and hands the PR back rather than committing a half-resolved tree.
A flaky service is not flaky CI. If grafana shows the error rate spiking, the "transient" CI failure is probably the real signal — the PR is catching a live regression. Treat the spike as REAL and stop; don't paper over an incident by retrying past it.
Polling forever costs API quota and hides stalls. The loop has a wall-clock ceiling; a PR that's been pending for an unreasonable time (a stuck queue, a never-scheduled required check) is reported as stalled rather than polled indefinitely. Silence is a failure mode too.
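To make the "green is not mergeable" and review-gate gotchas concrete, here is a minimal sketch of a gate decision. The function name `next_action` and its output labels are hypothetical. The input values are the ones GitHub reports for the `mergeable`, `mergeStateStatus`, and `reviewDecision` fields of `gh pr view --json`; how the real script combines them is an assumption.

```shell
#!/bin/sh
# Hypothetical gate: decide the next step from GitHub's own view of the PR.
# next_action MERGEABLE MERGE_STATE REVIEW_DECISION  ->  prints one action
next_action() {
  mergeable=$1
  merge_state=$2
  review=$3

  # Conflicts or a stale branch come first: nothing lands until it is current.
  if [ "$mergeable" = "CONFLICTING" ] || [ "$merge_state" = "DIRTY" ] ||
     [ "$merge_state" = "BEHIND" ]; then
    echo update-branch
    return 0
  fi

  # GitHub has not finished computing mergeability yet: poll again later.
  if [ "$mergeable" = "UNKNOWN" ] || [ "$merge_state" = "UNKNOWN" ]; then
    echo wait
    return 0
  fi

  # Green CI is not approval: with REQUIRE_APPROVAL=1, wait for APPROVED.
  if [ "${REQUIRE_APPROVAL:-1}" = "1" ] && [ "$review" != "APPROVED" ]; then
    echo wait-for-review
    return 0
  fi

  echo enable-auto-merge
}

# Possible wiring (needs an authenticated gh; "$pr" is a placeholder):
#   next_action \
#     "$(gh pr view "$pr" --json mergeable --jq .mergeable)" \
#     "$(gh pr view "$pr" --json mergeStateStatus --jq .mergeStateStatus)" \
#     "$(gh pr view "$pr" --json reviewDecision --jq .reviewDecision)"
```

Note that `enable-auto-merge` is returned even when the state is BLOCKED by a pending required check, because auto-merge defers the final decision to GitHub's own gates.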

Files

scripts/babysit.sh — the babysitter loop. Resolves the PR (arg or current branch), polls checks via gh pr checks / gh run view, classifies each failure as flake-vs-real (cross-checking the grafana skill for error rates), reruns flakes up to MAX_RERUNS with gh run rerun --failed, rebases/merges the PR up to date on conflict, and enables gh pr merge --auto once required gates and review state are satisfied. Referenced by Running it above.
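The polling part of the loop, including the wall-clock ceiling mentioned under Gotchas, might look roughly like the sketch below. `poll_until` is a hypothetical helper, not a function known to exist in scripts/babysit.sh, and the exit status 2 for "stalled" is an arbitrary choice made for the example.

```shell
#!/bin/sh
# Hypothetical helper: poll a command until it succeeds or a ceiling is hit.
# poll_until TIMEOUT_S INTERVAL_S COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 2 if the ceiling is reached first.
poll_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  deadline=$(( $(date +%s) + timeout_s ))

  while :; do
    # COMMAND is whatever "are we done yet" check the caller supplies.
    if "$@"; then
      return 0
    fi
    # Report a stall instead of polling indefinitely.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "stalled: gave up after ${timeout_s}s" >&2
      return 2
    fi
    sleep "$interval_s"
  done
}
```

A caller could pass a function that wraps `gh pr checks` as COMMAND and use POLL_INTERVAL_S as the interval; the distinct exit status lets a wrapper tell "stalled" apart from "failed for real".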

More from this repository

adversarial-review

az9713/skill-best-practices

Use when a change is written and "looks done" but has not had a hostile second pass before merge — especially diffs touching auth, money, migrations, concurrency, or anything the author is quietly unsure about. Spawns a fresh-eyes reviewer subagent that sees ONLY the diff and the spec, collects findings, drives fixes, and re-dispatches until findings degrade to nitpicks. Reach for this instead of self-reviewing; the author is the worst reviewer of their own diff.

2026-06-240

billing-lib

az9713/skill-best-practices

Use when writing or reviewing code that meters API token usage, bills accounts, issues invoices, applies credit grants, or computes balances with the internal `billing` library — especially around retries, mid-cycle plan changes, cache-read vs cache-write token pricing, or any place where double-billing or rounding drift would be a problem.

2026-06-240

checkout-verifier

az9713/skill-best-practices

Use when an API-credits checkout or paid-plan upgrade needs to be proven end-to-end against Stripe test mode — confirming a card charge actually creates the invoice and subscription in the right state, reproducing a "I paid but my credits didn't show up" report, checking that a declined or 3DS card fails the way the UI claims, or wiring a billing smoke test into CI so a checkout regression is caught before a customer's money is.

2026-06-240

cherry-pick-prod

az9713/skill-best-practices

Use when a specific fix that's already on main needs to land on a production/release branch without dragging along everything else — a hotfix to backport, a "cherry-pick this commit onto release-2.4", a "we need just that one PR on prod" request. Reach for this whenever someone wants to port one or a few commits to a release branch and open a PR for it, especially before doing it by hand in their main checkout, which pollutes their working tree and routinely leaves conflict markers committed or loses the original commit's provenance.

2026-06-240

code-style

az9713/skill-best-practices

Use when writing or editing code in this org's Python or JS/TS, especially before committing or opening a PR — and proactively the moment a diff adds an import, an except/catch, or any logging. Enforces the style rules Claude gets wrong by default: import grouping, error-wrapping (no bare except / empty catch), no leftover debug prints, explicit over clever. Runs scripts/check_style.sh (ruff, mypy --strict, eslint + grep guards) which exits nonzero so it drops into a pre-commit hook or CI.

2026-06-240

cohort-compare

az9713/skill-best-practices

Use when someone wants to compare two cohorts' retention or conversion, asks whether a difference between segments is "real" or "significant", wants retention curves for an A/B group or a launch vs control, or says one cohort "looks better" and needs the delta flagged with a p-value. Reach for this whenever the question is two-group comparison plus significance — and especially before eyeballing two percentages and declaring a winner, which ignores sample size and observation-window mismatch.

2026-06-240
