verify

Stars: 2

Forks: 5

Last updated: June 15, 2026 at 17:55

Judgement-based QA pass. Does this artifact meet its goal and serve its user? Demands excellence, not compliance. Owned by marsha; reads the spec's Fitness Rubric (designed upstream via /design-rubric).

Source

nicsuzor

nicsuzor/academicOps

SKILL.md

name	verify
type	skill
category	instruction
description	Judgement-based QA pass. Does this artifact meet its goal and serve its user? Demands excellence, not compliance. Owned by marsha; reads the spec's Fitness Rubric (designed upstream via /design-rubric).
triggers	["verify","QA check","acceptance test","quality check","is it done","validate work"]
modifies_files	true
needs_task	true
mode	execution
domain	["quality-assurance"]
allowed-tools	Task,Read,Glob,Grep
version	2.0.0
permalink	skills-verify

Judgement-Based Verification Guidelines

Conduct rigorous QA reviews of artifacts to ensure correctness, complete implementation, and fitness for purpose.

Step 0 — Premise Test (forced; runs BEFORE you read the diff)

Before you read a single line of the diff, judge the premise from the task + diffstat alone and write the sharp principal's one-sentence snap reaction — "was this a good idea, in this shape?" — verbatim, as forcing-check item 0. You cannot emit a PASS verdict without it; a bad premise is a FAIL regardless of test coverage (green tests are the expected surface of a bad premise, not a mitigant). Diffstat-first ordering is mandatory — reading the code first is exactly what lets a clean, well-tested surface launder a bad premise.

Full definition, the verbatim prompt, the never-a-checklist hard rule, and the worked specimen live in the canonical reference: [[premise-test.md]]. (FAIL is the local rejection token here; the arch-fit lens emits 🔴 REJECT for the same call.)

Core Directives

Default posture: assume it's broken. The burden is on the artifact to prove it works — not on you to prove it doesn't.

Verify Evidence: Read files, run code, and inspect actual outputs directly. Do not rely on agent summaries. Cite exact file paths, line numbers, or logs.
Classify the Bar:
- Mechanical Bar: Verify against Acceptance Criteria (AC). Verdict: PASS, FAIL, or REVISE.
- Fitness / Mixed Bar: Verify against the AC and the spec's ## Fitness Rubric. (If missing on a fitness task, return REVISE — fitness rubric missing).
Completeness check: Apply the completeness heuristic before signing off:
- Check freshness of inputs read.
- Verify changes are complete across all callsites.
- Acknowledge known limitations or constraints.
Project-rule check: If .agents/rules/RULES.md exists in this repo, read it before judging. Apply its rules with the same class/instance discipline as AXIOMS.md. Project-rule violations belong under Process Compliance in the report, cited by {#slug}.
Forcing Checks: Write explicit answers for each in the report before a PASS verdict:
- Premise Test (step 0, before reading the diff): State verbatim the sharp-principal reaction from task + diffstat alone (see Step 0). A bad premise is a FAIL regardless of test coverage; you cannot reach PASS without writing it.
- Sentinel / Empty-State Audit: Count and list empty/sentinel fields (e.g. DERIVER_MISSING, N/A, TODO). Fail if primary value-signals are missing.
- Principal's-Eye Top-Line Read: State verbatim the most prominent headline element and verify that it is correct for the end user. For "show me my X" surfaces, this means reproducing the principal's literal view (their account, host, and launch context) and confirming that the principal's OWN instance is present; a generic instance is a FAIL (see the /design-rubric self-instance requirement).
- Floor vs Ceiling: State verbatim: "exceptional, or merely working?". Merely working is not a PASS on fitness tasks.
No Anchoring/Bias:
- If you participated in designing or iterating on this artifact, you are disqualified from reviewing it for fitness.
- Dispatches must be neutral (do not pre-state expected verdicts).
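
The Sentinel / Empty-State Audit among the forcing checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `primary_fields` argument, and the function name `audit_sentinels` are hypothetical, and the sentinel set simply extends the examples named in the text (DERIVER_MISSING, N/A, TODO) with empty strings and `None`.

```python
from typing import Any

# Hypothetical sentinel set. DERIVER_MISSING, N/A and TODO come from the text;
# treating empty strings and None as sentinels is an assumption of this sketch.
SENTINELS = {"", "N/A", "TODO", "DERIVER_MISSING"}


def is_sentinel(value: Any) -> bool:
    """Return True when a value is empty or a known placeholder."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() in SENTINELS


def audit_sentinels(record: dict[str, Any], primary_fields: set[str]) -> dict[str, Any]:
    """Count and list sentinel fields; FAIL if a primary field is one of them."""
    flagged = [name for name, value in record.items() if is_sentinel(value)]
    # A primary field that is absent from the record is as bad as a placeholder.
    absent = [name for name in sorted(primary_fields) if name not in record]
    missing_primary = sorted((set(flagged) & primary_fields) | set(absent))
    return {
        "count": len(flagged),
        "fields": flagged,
        "missing_primary": missing_primary,
        "verdict": "FAIL" if missing_primary else "OK",
    }


report = audit_sentinels(
    {"title": "Weekly summary", "total": "N/A", "owner": "TODO", "notes": ""},
    primary_fields={"title", "total"},
)
print(report)
```

Here `total` is a primary value-signal rendered as `N/A`, so the audit fails even though the record looks populated at a glance. The count and the field list are what the report's forcing-check line asks for.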

Data Pipeline Verification

For any artifact with computed, aggregated, or derived output (dashboards, reports, metrics), trace source → output: confirm the source is real, populated, and fresh; independently cross-verify the values against that source; disable any fallback to prove the primary path works alone (a fallback silently masks a broken primary); and check behaviour under load. The question is not "did output appear?" but "is this the RIGHT data?" — plausible-looking output is the most dangerous kind of incorrect output.
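
As a rough illustration of the source → output trace described above, the sketch below recomputes a displayed total from its source rows and checks that the source is populated and fresh. The row shape (`amount`, `updated_at`), the 24-hour freshness window, and the function name `cross_verify` are all assumptions made for the example. Disabling fallbacks and checking behaviour under load are not covered here.

```python
import math
from datetime import datetime, timedelta, timezone


def cross_verify(rows, displayed_total, now, max_age=timedelta(hours=24)):
    """Return a list of problems found when tracing source rows to the output."""
    if not rows:
        return ["source is empty"]
    problems = []
    # Freshness: the newest source row must fall inside the allowed window.
    newest = max(row["updated_at"] for row in rows)
    if now - newest > max_age:
        problems.append(f"source is stale: newest row is {newest.isoformat()}")
    # Independent recomputation: do not trust the artifact's own arithmetic.
    recomputed = math.fsum(row["amount"] for row in rows)
    if not math.isclose(recomputed, displayed_total, rel_tol=1e-9, abs_tol=1e-9):
        problems.append(f"displayed total {displayed_total} != recomputed {recomputed}")
    return problems


now = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)
rows = [
    {"amount": 10.0, "updated_at": datetime(2026, 6, 15, 9, 0, tzinfo=timezone.utc)},
    {"amount": 32.5, "updated_at": datetime(2026, 6, 15, 8, 0, tzinfo=timezone.utc)},
]
print(cross_verify(rows, 42.5, now))  # matching total, fresh source
print(cross_verify(rows, 40.0, now))  # plausible-looking but wrong total
```

The second call is the dangerous case named in the text: 40.0 looks reasonable on a dashboard, and only the recomputation against the source exposes it.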

HALT Triggers (Immediate FAIL)

Stop evaluation immediately and write a FAIL verdict if any of the following occur:

- Bad premise: a sharp principal would not have built this, or not in this shape (step-0 Premise Test failed; full definition [[premise-test.md]]). FAIL regardless of green tests; test-passing is the expected surface of this failure, not a mitigant.
- Primary fields rendering as sentinels/placeholders.
- Headline element is wrong for the end user.
- Repeated or empty section headers.
- Placeholder text ({variable}, TODO, FIXME) in production.
- Overlapping/clipped text in rendered visual output.
- Suspiciously short output for complex operations.
- Silent error swallowing (try/except without logging).
- Test suite checking existence instead of content.
- Data that looks plausible but does not match its source.
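
One of the triggers above, silent error swallowing, can be partly detected mechanically. The sketch below uses Python's standard `ast` module to find `except` handlers whose body does nothing. It is deliberately narrower than "try/except without logging": it only catches handlers consisting of `pass` or `...`, and the helper names are hypothetical. A reviewer would still need to read handlers that do something other than log.

```python
import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` and for a bare `...` expression statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def find_silent_excepts(source: str) -> list[int]:
    """Return line numbers of except handlers whose body is entirely no-ops."""
    silent = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(_is_noop(s) for s in node.body):
            silent.append(node.lineno)
    return sorted(silent)


SAMPLE = '''
def load(path):
    try:
        return open(path).read()
    except OSError:
        pass

def parse(text):
    try:
        return int(text)
    except ValueError as exc:
        print(f"bad input: {exc}")
        raise
'''

print(find_silent_excepts(SAMPLE))
```

Only the first handler is reported; the second one prints the error and re-raises, so it is not silent.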

Verdict Format

Output reports exactly in this format:

## Verification Report

**Bar:** [mechanical / fitness / mixed]
**Verdict:** [PASS / FAIL / REVISE]

### Concrete observations

[Observed bugs/defects, file paths, line numbers, and log excerpts]

### Forcing checks

0. **Premise test (before reading the diff):** [verbatim sharp-principal reaction from task + diffstat alone — "was this a good idea, in this shape?" A bad premise -> FAIL regardless of tests; cannot reach PASS without this line]
1. **Sentinel/empty-state audit:** [count + list of sentinels/placeholders. If primary signals absent -> FAIL]
2. **Principal's-eye top-line read:** [headline element quoted, and whether correct]
3. **Floor vs ceiling:** [verbatim "exceptional, or merely working?"]

### Process compliance

[Project-rule violations cited by `{#slug}` from `.agents/rules/RULES.md` if present, or "RULES.md absent — skipped"]

### Judgement

[Prose evaluation against AC, Red Flags, and/or Fitness Rubric dimensions]

### Recommendation

[If FAIL/REVISE: specific remediation steps and user impact]
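
The rule that a PASS verdict cannot be emitted until every forcing check has a written answer can be enforced in code. The sketch below is one possible shape, not part of the skill itself: the `ForcingChecks` dataclass, its field names, and the choice to raise an error are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ForcingChecks:
    """Written answers to forcing checks 0 to 3; empty means unanswered."""
    premise_test: str = ""
    sentinel_audit: str = ""
    top_line_read: str = ""
    floor_vs_ceiling: str = ""


def unanswered_checks(checks: ForcingChecks) -> list[str]:
    """List the forcing checks that have no written answer, in report order."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name).strip()]


def require_pass_allowed(checks: ForcingChecks) -> None:
    """Raise if a PASS verdict is attempted with any forcing check unanswered."""
    missing = unanswered_checks(checks)
    if missing:
        raise ValueError(f"cannot emit PASS; unanswered forcing checks: {missing}")


partial = ForcingChecks(premise_test="Good idea, and the right shape.")
print(unanswered_checks(partial))
```

Raising an error, rather than silently choosing another verdict, keeps the decision with the reviewer: the text says what blocks a PASS but does not say which verdict should replace it.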

Browser-Driven UI Verification

For web applications:

1. Navigate to the URL and wait for page-ready.
2. Capture screenshots at 1920×1080 resolution.
3. Save screenshots to $AOPS_SESSIONS/qa-screenshots/YYYY-MM-DD/.
4. Apply visual analysis checks for layout and legibility defects.
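
The save location and viewport from the steps above can be pinned down with the standard library alone. This sketch covers only the path and size conventions; the browser automation itself is left out, and the names `screenshot_dir` and `VIEWPORT` are hypothetical.

```python
import os
from collections.abc import Mapping
from datetime import date
from pathlib import Path

# Viewport size taken from the capture step in the text.
VIEWPORT = {"width": 1920, "height": 1080}


def screenshot_dir(today: date, env: Mapping[str, str] = os.environ) -> Path:
    """Build $AOPS_SESSIONS/qa-screenshots/YYYY-MM-DD/ for the given day."""
    root = env.get("AOPS_SESSIONS")
    if not root:
        # Fail loudly instead of writing screenshots to an unexpected place.
        raise RuntimeError("AOPS_SESSIONS is not set")
    return Path(root) / "qa-screenshots" / today.isoformat()


target = screenshot_dir(date(2026, 6, 15), {"AOPS_SESSIONS": "/tmp/aops"})
print(target)
```

Passing the environment in as a parameter keeps the function testable without touching the real `AOPS_SESSIONS` value.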
