
agent-work-reviewer-maintainer

Review Rudder agent work. Use for review/第一性原理 (first-principles)/PM review of Codex sessions, PRs, commits, UI, releases, regressions, or agent outcomes. Separates author-claimed proof from reviewer-verified proof. For functional or UI reviews, run the real Rudder scenario with Browser or Computer Use when available instead of accepting from diffs.


Install command

npx skills add https://github.com/Undertone0809/rudder --skill agent-work-reviewer-maintainer

Copy this command and paste it into Claude Code to install the skill.

Source

Undertone0809/rudder

Stars: 100

Forks: 17

Updated: June 3, 2026, 08:04


Other skills in this repository

rudder

Undertone0809/rudder

Interact with the Rudder control plane through the `rudder` CLI to manage tasks, approvals, comments, Library files, and organization skills during heartbeats. Use for Rudder coordination only, not for the domain work itself.

2026-06-03 · 100 stars

development-lifecycle-router-maintainer

Undertone0809/rudder

Route Rudder development work when a request is ambiguous or spans lifecycle stages: requirements, advisor/product analysis, UI design, implementation, verification, review, commit/push, and handoff. Use for stage selection, reviewer gates, aborted-run recovery, component-lab work, scoped performance optimization, and risky dirty-worktree cleanup. Keep thin: if the prompt clearly names release, UI polish, run/debug, local preview, data path, Desktop recovery, PR preview, mock data, or review-only work, use the narrower maintainer skill directly.

2026-06-03 · 100 stars

para-memory-files

Undertone0809/rudder

File-based memory system using Tiago Forte's PARA method. Use this skill whenever you need to store, retrieve, update, or organize knowledge across sessions. Covers three memory layers: (1) Knowledge graph in PARA folders with atomic YAML facts, (2) Daily notes as raw timeline, (3) Tacit knowledge about user patterns. Also handles shared work files, memory decay, weekly synthesis, and file-based recall. Trigger on any memory operation: saving facts, writing daily notes, creating entities, running weekly synthesis, recalling past context, or managing shared work notes.

2026-06-02 · 100 stars

release-maintainer

Undertone0809/rudder

Maintain and execute Rudder releases across npm, GitHub Releases, and Desktop portable assets. Use this skill whenever the user asks about 发版 (cutting a release), release, publishing to npm, canary/stable promotion, GitHub Release assets, Desktop distribution, `npx @rudderhq/cli@latest start`, `npx @rudderhq/cli start`, broken npm `latest` dist-tags, full Desktop install smoke tests, GitHub Release API/rate-limit failures, version bumps, rollback, obsolete canary GitHub Release/tag cleanup after stable promotion, first-time package bootstrap, npm token-based fallback publishing, or release workflow failures. Prefer this skill for both planning and hands-on release operations in the Rudder repository, even when the user only asks "现在要做什么" (what needs to be done now) or "帮我发版" (help me cut a release).

2026-06-01 · 100 stars

debug-run-transcript-maintainer

Undertone0809/rudder

Debug and analyze Rudder agent run transcripts, run logs, and execution traces. Use this skill whenever the user mentions:
- "debug run", "分析 run" (analyze run), "查看 run" (view run)
- "为什么这个 agent 执行失败" (why did this agent's execution fail), "run 出错了" (the run errored)
- "transcript 怎么看" (how do I read the transcript), "run 的日志" (the run's logs)
- a run ID or run ID prefix
- "最近 30 个 run" (the last 30 runs), "recent runs", "run 质量" (run quality), "分析这个 org 最近运行" (analyze this org's recent runs)
- agent execution, tool calls, stdout/stderr 调试 (stdout/stderr debugging)

Prefer this skill whenever the user wants to understand what happened during one run or a recent batch of runs, even if they only have a partial run ID, org name, agent/runtime, or timeframe. Do not default to raw SQL first; use Rudder's run-intelligence path first, then fall back only if needed.

2026-05-31 · 100 stars

rudder-ui-polish-maintainer

Undertone0809/rudder

Execute screenshot-driven Rudder UI polish and narrow visible interaction fixes. Use when the user points at a Rudder screen or screenshot and asks to adjust density, alignment, badges, labels, cards, menus, hover affordances, empty states, redundant pages, onboarding steps, or other narrow small-to-medium UI behavior, especially with Chinese prompts like "这里有点丑" (this looks a bit ugly), "对齐一下" (align this), "去掉这个页面" (remove this page), "改成这个 UI" (change it to this UI), "先说说/优化一下" (talk it through first / polish it), or "弄个截图给我看" (get me a screenshot). Also use for narrow UI fixes even when the user mentions an advisor/reviewer skill, unless they explicitly ask for reviewer agents, repeated review rounds, or an acceptance gate. Prefer build-advisor for pure advice with no implementation request, and advisor-review-loop-maintainer for high-stakes proposal plus reviewer gates.

2026-05-31 · 100 stars


Useful for (SOC occupations)

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4


Agent Work Reviewer Maintainer

Review completed or in-progress Rudder agent work. This is a reviewer workflow, not an implementation workflow.

The core question is:

Did the agent solve the right product problem, with the right object model, complete behavior, credible validation, and a clean handoff?

Default to Chinese when the user asks in Chinese. Keep the verdict early and ground every judgment in evidence.

Use When

Use this skill when the user asks to review:

a Codex session, Rudder agent run, task, or transcript
a local branch, commit, diff, pull request, or implementation
a product proposal, plan doc, feature spec, or UI direction
a release verification, Desktop install path, npm publish, or canary/stable handoff
a screenshot, browser state, visual interaction, or workflow that "feels wrong"

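Common trigger phrases (English glosses in parentheses):

"review 一下 codex session id ..." (review codex session id ...)
"as a 专业产品经理 review" (review as a professional product manager)
"第一性原理思考一下" (think this through from first principles)
"作为 reviewer" (as a reviewer)
"这个实现是不是产品上对" (is this implementation right from a product standpoint?)
"这个 PR 本地跑起来看一下有没有问题" (run this PR locally and check for problems)
"这次 release 验证做完了吗" (is verification for this release finished?)
"我觉得这个功能之前有，现在没了" (I think this feature existed before and is gone now)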

Do Not Use When

Do not use this skill for:

fixing the findings during the same reviewer pass, unless the user explicitly asks to implement fixes
generic line-by-line code review where product judgment is irrelevant
debugging only the root cause of a failed Rudder agent run; use the run transcript debugging workflow first, then return here for product/workflow judgment
summarizing logs without verdict or acceptance criteria

If the user asks to fix findings after the review, switch to normal implementation mode and follow repository validation, commit, and push rules.

Read-Only Guard

Reviewer mode is read-only by default. Do not edit files, stage changes, restore files, commit, push, start destructive cleanup, or "just fix" findings while reviewing unless the user explicitly changes the task from review to implementation.

If the user says "do not implement", "review only", or assigns a reviewer role, that instruction is binding for the whole reviewer pass. Use tools only to inspect evidence.

Direct UI inspection with Browser, Desktop, or Computer Use is still reviewer work when it only observes or exercises low-risk local/dev flows. If a realistic scenario requires mutating Rudder data, prefer an isolated dev org, disposable test records, or an existing preview instance, and report what was created or changed. Do not delete data, publish, submit external communication, install software, change system settings, or perform other risky UI actions without the appropriate user confirmation.
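As a rough illustration of the read-only guard, the sketch below sorts git invocations into inspection commands and mutating ones. The helper name `is_reviewer_safe` and the exact allowlist are hypothetical choices for this example, not part of Rudder or this skill. It only understands plain `git <subcommand>` forms and deliberately returns False for anything else, so those cases get a manual decision.

```python
import shlex

# Hypothetical allowlist: git subcommands this sketch treats as inspection-only.
READ_ONLY_GIT = {"status", "log", "diff", "show", "blame", "rev-parse", "merge-base"}

# `git branch` only reads state when limited to listing flags.
BRANCH_LISTING_FLAGS = {"--list", "-a", "-r", "-v", "-vv", "--show-current"}


def is_reviewer_safe(command: str) -> bool:
    """Return True if a git command only inspects state (reviewer mode)."""
    argv = shlex.split(command)
    if len(argv) < 2 or argv[0] != "git":
        return False  # not a plain git command: decide case by case
    sub = argv[1]
    if sub == "branch":
        # -d/-D/-m and similar mutate branches; bare listing does not.
        return all(arg in BRANCH_LISTING_FLAGS for arg in argv[2:])
    return sub in READ_ONLY_GIT
```

Commands such as `git commit`, `git push`, `git restore`, `git add`, or `git stash` fall outside the allowlist, which matches the rule that they need an explicit switch from review to implementation.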

Evidence Packet

Never start with opinion. Build the smallest evidence packet that can support a real judgment.

1. Identify The Target

Resolve what is being reviewed:

Codex session id or prefix
Rudder run id or transcript
PR number or URL
branch name, commit hash, staged/unstaged diff
plan doc, proposal, screenshot, or browser URL
release version, tag, workflow run, npm dist-tag, or Desktop asset set

If the request is vague, infer the target from the current branch, recent commits, open browser state, or named files before asking.

2. Collect Task Intent

For Codex sessions, search:

rg "<session-id-or-prefix>" ~/.codex/session_index.jsonl ~/.codex/sessions ~/.codex/archived_sessions

Extract real user requests and corrections. Ignore injected AGENTS.md, environment context, skill bodies, and system/developer text.

For branches, PRs, commits, or diffs, inspect:

git status --short --branch
git log --oneline --decorate -12
git diff --stat
git diff
git show --stat <commit>
git show <commit>

For commit or session reviews, also compare the changed-file set against the stated task. Classify every surprising file as one of:

required for the requested behavior
test, docs, or contract evidence for the requested behavior
pre-existing unrelated dirty work
unrelated change mixed into the reviewed commit or handoff

Unrelated changes mixed into a product fix are review findings, not cleanup details. If they change skills, release state, generated files, dependencies, or broad runtime behavior outside the task, usually treat them as blocking a full accept (conditional accept at best) until the scope is split or justified.
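One way to make the four-way classification explicit is to write the reviewer's judgment down as path patterns and then sort the changed files mechanically. This is a minimal sketch: `classify_changed_files`, `blocking_unrelated`, and every pattern (including the high-risk globs, such as a `pnpm-lock.yaml` lockfile) are hypothetical and would need adjusting to the actual repository layout.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List, Set

# Hypothetical examples of paths that rarely belong inside a narrow fix.
HIGH_RISK_GLOBS = [".agents/skills/*", "*package.json", "*pnpm-lock.yaml",
                   ".github/workflows/*"]


def classify_changed_files(changed: Iterable[str], task_prefixes: List[str],
                           evidence_globs: List[str],
                           preexisting_dirty: Set[str]) -> Dict[str, List[str]]:
    """Sort changed paths into the four buckets above (first match wins)."""
    buckets: Dict[str, List[str]] = {
        "required": [], "evidence": [], "preexisting": [], "unrelated": []}
    for path in changed:
        if path in preexisting_dirty:
            buckets["preexisting"].append(path)
        elif any(fnmatch(path, g) for g in evidence_globs):
            buckets["evidence"].append(path)  # tests, docs, contracts
        elif any(path.startswith(p) for p in task_prefixes):
            buckets["required"].append(path)
        else:
            buckets["unrelated"].append(path)
    return buckets


def blocking_unrelated(buckets: Dict[str, List[str]]) -> List[str]:
    """Unrelated paths that touch skills, release config, or dependencies."""
    return [p for p in buckets["unrelated"]
            if any(fnmatch(p, g) for g in HIGH_RISK_GLOBS)]
```

The `unrelated` bucket is where findings come from; `blocking_unrelated` only approximates the cases that usually block a full accept, so a reviewer still reads each file before deciding.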

For PRs, read the PR description, changed files, review comments, and CI status when available.

3. Read Product Context

For most Rudder product work, read only the relevant sections of:

doc/GOAL.md
doc/PRODUCT.md
doc/SPEC-implementation.md
doc/DESIGN.md for visible UI and interaction work
the task's plan doc under doc/plans/ when one exists

For release/Desktop/package work, also use:

doc/RELEASING.md
doc/PUBLISHING.md
doc/DESKTOP.md
.github/workflows/release.yml
.github/workflows/desktop-release.yml

For database/API behavior, check the cross-layer contract:

packages/db
packages/shared
server
ui

4. Verify What Was Proven

Separate "implemented" from "proven".

Also separate author-claimed proof from reviewer-verified proof.

author-claimed proof includes validation listed in the prompt, final handoff text, copied terminal output, screenshots the reviewer did not inspect, and test names the implementer says were run.

reviewer-verified proof includes commands, logs, screenshots, browser/Desktop state, API readbacks, git evidence, CI state, or release surfaces that this reviewer actually inspected during the review pass.

Do not convert author-claimed proof into reviewer-verified proof. It can support the review, but it cannot close a final handoff gap for UI, workflow, release, Desktop, runtime, or control-plane behavior when the reviewer could cheaply verify the real surface.

Record which evidence exists:

typecheck, unit tests, build
E2E tests or release smoke tests
browser or Desktop visual verification
screenshots for visible UI
packaged Desktop verification for startup, migrations, profile routing, or installer changes
npm/GitHub Release/live workflow checks for release tasks
commits, pushes, branch state, PR URL, and merge state

Treat timed-out, skipped, or attempted checks as unverified. Do not convert "looked plausible in code" into product proof.
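The proof split can be kept honest with one small record per check. In this sketch, which assumes a hypothetical `Proof` record and made-up status strings, only a check that the reviewer ran or inspected during this pass and saw pass counts as verified. Author claims, inherited child-session history, and timed-out, skipped, or merely attempted checks all land in the unproven list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proof:
    check: str   # what was checked, e.g. "typecheck" or "browser walkthrough"
    status: str  # "passed", "failed", "timed_out", "skipped", or "attempted"
    source: str  # "reviewer" (inspected this pass), "author", or "inherited"


def _is_verified_pass(p: Proof) -> bool:
    # Only a pass the reviewer personally observed during this pass counts.
    return p.source == "reviewer" and p.status == "passed"


def verified_passes(proofs: List[Proof]) -> List[str]:
    return [p.check for p in proofs if _is_verified_pass(p)]


def unproven(proofs: List[Proof]) -> List[str]:
    # Author claims, inherited history, and non-passing checks stay unproven.
    return [p.check for p in proofs if not _is_verified_pass(p)]
```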

For spawned child reviewers, full-history forks may include the parent agent's prior commands, screenshots, tests, or edits. Treat inherited history and prompt claims as author-claimed proof unless the child reviewer performs or explicitly re-inspects the evidence after the review assignment starts.

When reviewing in-progress work, spawned reviewer child sessions, or a branch with unrelated dirty feature groups, use a mixed-state verdict instead of a binary pass/fail. The verdicts are:

accept: no blocking product, behavior, validation, or handoff gaps remain for the requested scope and verdict level.
conditional accept: the artifact direction is sound, but merge/handoff is blocked by missing proof, unrelated dirty work, or explicit reviewer follow-up.
needs more evidence: the required scenario, diff, source data, or validation is not available enough to judge.
reject: the artifact solves the wrong problem or introduces a blocking regression.

Every verdict must declare its level:

stage verdict: judges whether the current requirements, proposal, design, implementation slice, or review artifact is good enough to proceed to the next stage.
final handoff verdict: judges whether the requested work can be accepted as done, merged, released, or handed to the user with no blocking evidence gap.

Do not let a stage accept read like a final handoff. If terminal product proof, commit/push state, public release evidence, or reviewer follow-up is still missing, the final verdict cannot be accept even when the stage verdict is positive.
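The rule that a positive stage verdict does not imply a final accept can be sketched as a small function. The function name, the gap labels, and the `gap_blocks_judgment` switch are illustrative assumptions; in practice the reviewer still chooses between conditional accept and needs more evidence by judgment.

```python
from typing import List


def final_handoff_verdict(stage_verdict: str, missing: List[str],
                          gap_blocks_judgment: bool = False) -> str:
    """Derive a final handoff verdict from a stage verdict and open gaps."""
    if stage_verdict in ("reject", "needs more evidence"):
        return stage_verdict  # a negative stage verdict carries through
    if missing:
        # Terminal product proof, commit/push state, public release evidence,
        # or reviewer follow-up is still open: final accept is off the table.
        return "needs more evidence" if gap_blocks_judgment else "conditional accept"
    return stage_verdict
```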

For child-reviewer outputs, preserve the parent task boundary. Do not turn the review into implementation and do not judge sibling or unrelated dirty work as part of the artifact unless it affects merge/handoff safety.

4.1 Evidence Freshness Guard

Before writing the verdict, state the evidence baseline being reviewed. This prevents repeated reviewer child sessions from re-judging stale or mismatched artifacts.

Include the relevant subset:

target id: session, run, PR, commit, branch, diff, proposal, release, or screenshot
git basis: current branch and commit SHA, plus whether the worktree is dirty
diff basis: changed files or git diff --stat scope inspected
artifact basis: plan/proposal/screenshot/log path and timestamp when relevant
validation basis: checks, browser/Desktop evidence, CI, release workflow, or product proof actually inspected
proof split: which evidence was author-claimed versus reviewer-verified
review round: first pass, delta review, second round, or final review

If a prior reviewer already judged the same target, same git SHA, and same artifact basis, run a delta review against the changed evidence instead of repeating the full review. If there is no changed evidence, say that the prior verdict still applies and name the missing proof instead of producing a fresh confident verdict.
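The freshness rule reduces to a comparison of baselines. This sketch assumes a hypothetical `Baseline` record: target id, git SHA, and artifact basis identify what is being reviewed, and the set of inspected evidence is what may change between rounds.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Baseline:
    target_id: str            # session, run, PR, commit, branch, proposal, ...
    git_sha: str
    dirty: bool               # reported in the verdict; not used in the comparison
    artifact_basis: str       # e.g. plan/screenshot path plus timestamp
    evidence: FrozenSet[str]  # validation and proof actually inspected


def review_mode(current: Baseline, prior: Optional[Baseline]) -> str:
    same_artifact = prior is not None and (
        (current.target_id, current.git_sha, current.artifact_basis)
        == (prior.target_id, prior.git_sha, prior.artifact_basis)
    )
    if not same_artifact:
        return "full review"
    if current.evidence != prior.evidence:
        return "delta review"  # judge only the changed evidence
    return "prior verdict stands"  # restate it and name the missing proof
```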

For spawned reviewer or sub-review work, explicitly say which artifact is being reviewed. Do not silently upgrade the scope from "this proposal" or "this diff" to the whole dirty worktree.

4.2 Run The Real Scenario When It Matters

For functional review, UI review, Desktop review, agent-visible workflow review, or workflow-regression review, prefer direct scenario verification over code-only inference:

Use Browser for local web targets such as localhost, 127.0.0.1, or file previews when the browser can exercise the path.
Use Computer Use for the packaged Rudder Desktop app, native dialogs, update prompts, menus, resident shell behavior, drag/drop, or any flow that is only visible in the local Mac UI.
Use API/log/database checks as supporting evidence, not a replacement for the real operator path when the user's question is about behavior.
For CLI or runtime-agent changes, ask whether the actor that will use the feature actually exercised it. A realistic proof usually includes a disposable org/issue/agent or equivalent fixture, the actor command or wakeup, persisted issue/run/comment readback, and the terminal app or CLI surface.
If the app is not running, start the appropriate local or packaged Rudder path when that is safe and within the review scope. If startup would be expensive or risky, say exactly why the review is limited.
When a real scenario is skipped, the verdict should usually be needs more evidence or conditional accept, and the missing scenario must be named.

For user-visible work, do not accept "tests pass" as enough proof when Computer Use or Browser could cheaply verify the actual Rudder interaction. The minimum credible evidence is the observed workflow state plus any relevant logs, API responses, screenshots, or failure messages.

If the reviewer does not personally inspect the rendered or interactive state for a layout-sensitive UI or functional workflow, the final handoff verdict cannot be accept. Use conditional accept for a sound implementation slice, or needs more evidence when the missing scenario is required to judge the change.

For agent-visible or control-plane work, do not accept direct database assertions, unit tests, or docs updates as the whole proof when a realistic actor-run-chain could cheaply exercise the behavior. Missing terminal product proof should usually make the verdict conditional accept or needs more evidence, even if the diff itself looks correct.

If the user explicitly says the reviewer can use Computer Use or Browser to test a real scenario, treat that as part of the review assignment. If direct scenario verification is skipped or blocked, the verdict should normally be conditional accept or needs more evidence, and the missing interaction must be named.

5. Assemble The Review Packet

Before writing the verdict, make the review packet explicit. It should include the relevant subset:

target: session, run, PR, branch, commit, diff, proposal, screenshot, release, or browser state
evidence baseline: git SHA/branch, dirty state, diff scope, artifact timestamp, and review round when available
user intent: original request plus important corrections or constraints
changed object: the product/workflow/code object being reviewed
evidence inspected: files, diffs, logs, screenshots, docs, plans, tests, CI, browser/Desktop state, release artifacts, or sub-reviewer notes
proof split: author-claimed proof versus reviewer-verified proof
validation status: what passed, what failed, what timed out, what was skipped, and what was only inferred
product proof status: actor, trigger, system effect, terminal surface, and any seed/mutation records created for the review
unresolved evidence gaps: missing screenshots, missing E2E, unchecked downstream consumers, branch/CI uncertainty, or unverified public surfaces

If any packet item needed for a trustworthy judgment is missing, use needs more evidence or conditional accept; do not fill the gap with confidence.
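For agent-visible or control-plane work, the "product proof status" item can be checked as a chain whose links must all be observed. The `ProductProof` class and its field names are hypothetical; they mirror the actor, trigger, system effect, and terminal surface named above, plus a list of records created for the review.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProductProof:
    actor: Optional[str] = None             # who exercised it, e.g. a disposable test agent
    trigger: Optional[str] = None           # the command, wakeup, or click that started it
    system_effect: Optional[str] = None     # persisted issue/run/comment readback
    terminal_surface: Optional[str] = None  # UI screen or CLI output showing the result
    mutations: List[str] = field(default_factory=list)  # records created for the review

    def missing_links(self) -> List[str]:
        names = ("actor", "trigger", "system_effect", "terminal_surface")
        return [n for n in names if not getattr(self, n)]
```

Any non-empty `missing_links()` result is an unresolved evidence gap, and `mutations` is what the review packet should report as created or changed.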

First-Principles Review Frame

Use this frame before writing the verdict.

1. User Job

What real operator or contributor problem was this task supposed to solve? Was the request a symptom of a deeper workflow issue?

Examples:

"Move recent views" may really mean navigation history was modeled as content.
"Where did my draft issue go" is a lifecycle and recovery problem, not just a sidebar rendering bug.
"Calendar blocks are unreadable" is a time-density visualization problem, not a card styling problem.

2. Product Object Model

Identify the object being changed:

view or navigation shortcut
workflow state
draft, issue, goal, project, run, or artifact
external source
preference or setting
release/version/install surface
agent memory, instruction, skill, or operating contract

Judge whether the implementation modeled it as the right kind of object. Many Rudder regressions come from treating a workflow state as a static view, a setting as content, or an external source as an imported local object too early.

3. Core Loop Impact

Ask how the work affects Rudder's north-star loop: real agent work completed end to end.

Good changes reduce operator friction, clarify agent state, preserve control, or make review and handoff easier. Weak changes add surface area without making the agent-work loop more controllable.

4. Scope Discipline

Check whether the work:

preserved organization scoping and permissions
reused existing product concepts instead of inventing new ones
removed half-built surface area when deletion was the right product move
preserved legacy paperclip* compatibility where required
avoided hiding complexity behind vague copy or fake affordances
respected the user's explicit corrections during the session

5. Behavioral Completeness

For user-visible work, inspect the important states:

empty, normal, long, loading, error
direct link, sidebar link, board card, detail page, and modal entry points
cross-organization behavior
mobile or constrained width when relevant
legacy links and previously shipped features

For UI and functional reviews, ask whether the actual rendered or interactive state was seen. Code review alone is not enough for layout-sensitive work, native Desktop behavior, update flows, chat/issue workflows, or any path where the product claim depends on clicks, typing, selection, focus, async state, or cross-page navigation.

6. Trust And Validation

The user is often asking "can I trust this agent work?" Answer that directly.

Look for:

validation mismatch: tests pass but do not cover the operator path
regression risk: a refactor deleted a previous capability
release mismatch: npm, GitHub Release, Desktop assets, tags, and public entry points disagree
branch mismatch: work landed somewhere but not on main
handoff mismatch: URL exists but screenshot or real flow evidence is missing
hygiene mismatch: the implementation artifact is plausible, but unrelated dirty files, child-session duplication, or pending validation means it cannot be safely merged or handed off as complete

Multi-Round Review

When reviewing a proposal, plan, or agent output across multiple rounds, keep a blocker ledger instead of relying on memory or tone.

Use this shape when it is useful:

Blocker ledger:
| blocker | severity | round-one evidence | revised answer | status |
| --- | --- | --- | --- | --- |
| ... | P1 | ... | ... | resolved / unresolved |

Second-round review must judge each prior blocker explicitly. An accept verdict means no unresolved blockers remain for the requested scope; it does not mean the proposal is fully implemented or validated.
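A second-round check over the blocker ledger might look like this sketch (the `Blocker` dataclass and `remaining_blockers` helper are hypothetical). It refuses to produce a result until every prior blocker has a revised answer, then returns the blockers that are still open; an accept is only available when that list is empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blocker:
    blocker: str
    severity: str               # e.g. "P1"
    round_one_evidence: str
    revised_answer: str = ""    # write "not addressed" rather than leaving it blank
    status: str = "unresolved"  # "resolved" or "unresolved"


def remaining_blockers(ledger: List[Blocker]) -> List[str]:
    """Return still-open blockers; refuse if any blocker was not judged."""
    unjudged = [b.blocker for b in ledger if not b.revised_answer]
    if unjudged:
        raise ValueError(f"judge each prior blocker explicitly: {unjudged}")
    return [b.blocker for b in ledger if b.status != "resolved"]
```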

For reviewer-loop orchestration, disclose whether the review used real spawned reviewers or a serial two-role fallback when that distinction affects trust.

Lens-Specific Checks

UI/UX And Design

Read doc/DESIGN.md before judging.
Verify rendered states with Browser, screenshot, Desktop shell evidence, or Computer Use against the real packaged app when native behavior matters.
For alignment, row rhythm, avatar/text/time centering, truncation, or column-layout reviews, require production-shaped fixture data and measurable proof when practical. Strong evidence includes real agent avatars, long labels or message text, timestamps/action controls, a screenshot, and DOM bounding boxes or centerline deltas for the elements being aligned.
Treat visual hierarchy, density, interaction feedback, animation, native app affordances, and copy clarity as product quality, not nitpicks.
Check whether menus, hover actions, dialogs, keyboard behavior, and icons match expected Rudder patterns.
If no visual evidence exists, the verdict should usually be needs more evidence or conditional accept.
If the claimed fix is "aligned" but the proof only uses placeholder data, isolated component tests, or screenshots that hide the relevant avatar, timestamp, action, or long-text state, the verdict should usually be conditional accept until the real row shape is verified.
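To turn "centerline deltas" into numbers, the sketch below compares each element's vertical center with the row container's. It assumes boxes shaped like the dict Playwright's `locator.bounding_box()` returns (`x`, `y`, `width`, `height`); the element names and the 1px default tolerance are illustrative assumptions, not a Rudder standard.

```python
from typing import Dict

Box = Dict[str, float]  # {"x", "y", "width", "height"}, as from bounding_box()


def centerline_deltas(boxes: Dict[str, Box], reference: str = "row") -> Dict[str, float]:
    """Vertical offset in px of each element's centerline from the reference box."""
    ref = boxes[reference]
    ref_center = ref["y"] + ref["height"] / 2
    return {
        name: (box["y"] + box["height"] / 2) - ref_center
        for name, box in boxes.items()
        if name != reference
    }


def is_aligned(boxes: Dict[str, Box], tolerance_px: float = 1.0) -> bool:
    return all(abs(d) <= tolerance_px for d in centerline_deltas(boxes).values())
```

In a Playwright-driven check, each box would come from `bounding_box()` on the production-shaped row (it returns None for elements that are not visible, so check that first); which selectors identify the avatar, text, and timestamp is up to the reviewer.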

Functional Workflow Review

Reconstruct the user's real scenario as a short workflow, then run it when the local app, preview, or packaged Desktop is available.
Prefer scenario steps such as "open Messenger, send a message, approve the proposal, confirm the issue state" over isolated API calls when the behavior is operator-facing.
Use API, logs, DB rows, and code paths to explain why the observed behavior happened after the UI behavior is known.
If the scenario would mutate shared or production-like data, use a disposable dev record or ask before the risky step. State any mutation in the review packet.
A functional verdict should separate observed user behavior from inferred internal correctness.

Release And Desktop

Confirm the relevant version, git tag, npm dist-tag, GitHub Release assets, Desktop portable assets, and install command.
For Desktop startup, migrations, profile routing, installer assets, or prod-local paths, packaged verification is required before calling it done.
A dry-run does not prove public install. Say exactly which platform and command were actually verified.
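Checking that release surfaces agree is a comparison of the version each surface actually reports against the intended release. In this sketch the surface names and version strings are made up for illustration; the reviewer fills in what they read back (for example from `npm view <package> dist-tags`), and an unchecked surface is recorded as `None`, which counts as a mismatch rather than a pass.

```python
from typing import Dict, Optional


def _norm(version: str) -> str:
    # Treat "v0.4.0" (git tag / release name) and "0.4.0" (npm) as the same.
    return version[1:] if version.startswith("v") else version


def release_mismatches(intended: str,
                       surfaces: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Surfaces whose read-back version is missing or differs from `intended`."""
    return {
        name: seen
        for name, seen in surfaces.items()
        if seen is None or _norm(seen) != _norm(intended)
    }
```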

Git, PR, Branch, And Worktree

Confirm where the change landed and whether it was pushed.
If the user expected main, verify main contains the commit.
Distinguish the user's unrelated dirty work from the reviewed changes.
Check commit hygiene: the changed files should match the requested product object, tests/docs for that object, and necessary contracts. Flag mixed unrelated edits, accidental skill renames, version bumps, generated artifacts, or broad dependency churn inside narrow fixes.
When reviewing a Codex session, do not rely only on the final summary saying "committed and pushed"; verify the commit or diff when the local repo is available.
For PR preview work, check whether the app was started in an isolated worktree and whether the user received a URL plus screenshots when UI changed.

Agent Skill Or Operating Contract Work

Check trigger description, expected workflow, bundled references/scripts, and eval prompts.
Verify that repo-local development and maintenance skills use the *-maintainer suffix and live under .agents/skills/maintainer/.
Check whether the skill preserves the user's actual repeated corrections rather than only encoding generic best practices.
Prefer eval prompts drawn from real Rudder tasks.
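The naming rule for repo-local maintainer skills is mechanical enough to check directly. This sketch assumes each skill is a directory sitting directly under `.agents/skills/maintainer/`; the helper name `maintainer_skill_problems` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

MAINTAINER_ROOT = PurePosixPath(".agents/skills/maintainer")


def maintainer_skill_problems(skill_dir: str) -> List[str]:
    """List naming/location problems for a repo-local maintainer skill dir."""
    path = PurePosixPath(skill_dir)
    problems = []
    if not path.name.endswith("-maintainer"):
        problems.append("name lacks -maintainer suffix")
    if path.parent != MAINTAINER_ROOT:
        problems.append("not under .agents/skills/maintainer/")
    return problems
```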

Output Shape

Keep the review compact. Lead with the verdict.

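结论 (Verdict)：conditional accept。

评分 (Score)：7/10。

证据基础 (Evidence basis)：
- Session/commit/PR: ...
- Inspect: ...
- Validation: ...
- Gaps: ...

这次任务本质上是在解决 (What this task is really solving)：...

做对的地方 (What was done right)：
- ...

关键缺口 (Key gaps)：
1. ...
2. ...

必须补的证据 (Evidence that must be added)：
- ...

Blocker ledger:
- ...

下一步建议 (Recommended next step)：...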

Use accept, conditional accept, reject, or needs more evidence.

Only add line-anchored review findings when they are useful. In Codex app contexts, use ::code-comment{...} for concrete file/line findings and keep the line range tight.

Judgment Rules

A task can be directionally correct and still not be done.
A stage can pass while final handoff is still blocked.
Passing typecheck/build does not prove product behavior.
A visible UI task is not done without rendered-state evidence.
A release task is not done until npm, tags, GitHub Release, Desktop assets, and public install entry points agree for the intended release surface.
Multi-message sessions are not automatic failures; they may be intentional product iteration. Treat repeated corrections as evidence of where the review bar should be raised.
"Implemented" means code or docs changed. "Accepted" means the right behavior was proven for the relevant user path.
Prefer one pragmatic next move over a long wishlist.

Validation Cases

Case: Two-Round Proposal Review

Input: "Review this proposal as Reviewer A. Do not implement." Then later: "Round 2 review. Judge whether this revised proposal resolves your round-one blockers."

Expected behavior: Round one produces a verdict plus blocker ledger. Round two explicitly checks each blocker against the revised proposal and marks it resolved or unresolved before giving the final verdict.

Must not: Edit files, skip the blocker ledger, or say accept without showing why prior blockers were closed.

Case: Functional Review Of A Shipped Workflow

Input: "功能性上 review 一下 reviewer routing，现在是不是产品上对？" (Review reviewer routing functionally: is it right from a product standpoint now?)

Expected behavior: The review starts from user intent and workflow semantics, then traces every downstream consumer of the relevant object or field, such as attention, filters, wakeups, UI state, and recovery paths. When Rudder is available locally, it also uses Browser or Computer Use to exercise the real operator path or clearly names why live scenario testing was skipped. The verdict separates observed semantic behavior from schema/type correctness.

Must not: Stop after checking schema, route validation, or the obvious happy-path test.

Case: Functional Review With Computer Use

Input: "review 一下最新版 Desktop update 为什么失败，功能上是不是已经好了." (Review why the latest Desktop update failed and whether it now works functionally.)

Expected behavior: The review inspects release and code evidence, then uses Computer Use or an equivalent packaged Desktop run to exercise the actual update interaction when safe. It reports the observed app version, update channel, prompt/toast state, and any supporting health/API/log evidence before judging whether the function is proven.

Must not: Call the update flow accepted from release assets or code inspection alone when the packaged Desktop scenario can be tested.

Case: UI Review Without Rendered Evidence

Input: "review 一下这个 UI 改动有没有问题." (Review whether this UI change has any problems.) The diff is available, but no screenshot, browser state, or Desktop state was captured.

Expected behavior: The review can comment on code and likely risks, but the verdict is conditional accept or needs more evidence if layout, dark/light behavior, overflow, hover, dialog, or responsive state matters.

Must not: Call a layout-sensitive UI change fully accepted from code review alone.

Case: Alignment Review With Placeholder Proof

Input: "这里行对齐没有做好. Review the fix." (The row alignment here is off.) The submitted proof includes a component test with placeholder icons but no real agent avatar, no timestamp, and no browser geometry or screenshot of the production row.

Expected behavior: The review treats the fix as directionally plausible but not fully proven. It asks for production-shaped fixture proof, ideally a browser screenshot plus DOM bounding boxes or centerline deltas for avatar, text, timestamp, and row container.

Must not: Accept the alignment fix as final from placeholder component tests or a cropped screenshot that does not show the elements whose alignment was questioned.

Case: Explicit Review-Only Guard

Input: "Use agent-work-reviewer-maintainer. Review this proposal. Do not implement or edit files."

Expected behavior: The reviewer inspects evidence and returns a verdict, findings, blocker ledger when useful, and next evidence/fix recommendations. It performs no write action.

Must not: Patch files, stage changes, commit, push, or run destructive cleanup.

Case: Author-Claimed UI Proof Is Not Reviewer Proof

Input: "Review this UI workflow. The implementer says Playwright passed and a screenshot was captured, but you have not opened the app or inspected the screenshot yourself."

Expected behavior: The review may use the claimed checks as supporting context, but it labels them as author-claimed proof. If Browser, Computer Use, screenshot inspection, or a current local preview is available, the reviewer either verifies the real UI state or returns conditional accept / needs more evidence for final handoff.

Must not: Give a final accept by repeating the implementer's claimed Playwright, screenshot, or dev-server evidence as if the reviewer personally verified it.

Case: Spawned Reviewer Full-History Contamination

Input: "You are a spawned reviewer. The inherited transcript includes the author's tests, screenshots, and edits before your assignment. Review the current diff."

Expected behavior: The review treats inherited commands and prompt-provided validation as author-claimed proof unless it reruns or re-inspects them after the review task starts. The verdict says exactly which proof was reviewer-verified and which was inherited.

Must not: Count pre-assignment parent commands as reviewer-verified evidence or call a UI workflow fully accepted from inherited proof alone.