| name | independent-qa |
| description | Use after completing a round of code generation. |
| user-invocable | true |
Dispatch isolated reviewer subagents to verify your code. Harnessed uses independent QA to lower self-evaluation bias and improve issue-finding rate. Self-critique is useful for improving a draft, but it does not replace the independent QA loop.
Determine the acceptance criteria source:
- If .harnessed/contract.md exists, use it.
- If a spec exists under docs/superpowers/specs/, normalize it into .harnessed/contract.md, and use that as the shared criteria source.

If no contract/spec exists: STOP. Invoke harnessed:contract-writing first.
Check the project for available verification infrastructure.
Tier 2 — test suite indicators present (package.json test script, pytest, go test, Playwright/Cypress, or equivalent)
Tier 1.5 — no test suite, but a dev server is running
Tier 1 — code review only
Record the active tier.
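The tier check above can be sketched as a small file-based probe. This is a minimal sketch, not Harnessed's own detection logic: the function names, the `TEST_MARKERS` list, and the `dev_server_running` flag are assumptions made for illustration, and the marker files are only common defaults for the tools named above.

```python
import json
from pathlib import Path

# Hypothetical indicator files; extend for the stacks you actually use.
TEST_MARKERS = (
    "pytest.ini",
    "conftest.py",
    "playwright.config.ts",
    "playwright.config.js",
    "cypress.config.ts",
    "cypress.config.js",
)


def has_npm_test_script(root: Path) -> bool:
    """True if package.json declares a "test" script."""
    package_json = root / "package.json"
    if not package_json.is_file():
        return False
    try:
        data = json.loads(package_json.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    scripts = data.get("scripts") if isinstance(data, dict) else None
    return isinstance(scripts, dict) and "test" in scripts


def detect_tier(project_root: str, dev_server_running: bool = False) -> str:
    """Return "2", "1.5", or "1" from simple project indicators."""
    root = Path(project_root)
    has_tests = (
        has_npm_test_script(root)
        or any((root / name).is_file() for name in TEST_MARKERS)
        or any(root.rglob("*_test.go"))  # Go test files anywhere in the tree
    )
    if has_tests:
        return "2"
    if dev_server_running:
        return "1.5"
    return "1"
```

Whether a dev server is running is passed in by the caller here, since how to detect it depends on the project.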
Before dispatching reviewers, run available zero-LLM checks:
If these fail, fix them first.
Before gathering context:
- Check for an in-progress merge (.git/MERGE_HEAD)
- Run git diff HEAD to capture staged + unstaged changes

Extract the changed file paths from this diff. Use them to drive any scope-based policy selection.
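Extracting the changed paths from a unified diff could look like the sketch below. The helper name `changed_paths` is hypothetical, and the parser assumes ordinary paths without spaces or quoting; running `git diff HEAD --name-only` is a simpler alternative when only the paths are needed.

```python
import re

# Matches the per-file header that git emits, e.g.
# "diff --git a/src/app.py b/src/app.py".
_DIFF_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def changed_paths(diff_text: str) -> list[str]:
    """Return the post-change paths named in a `git diff` output, in order."""
    paths: list[str] = []
    for line in diff_text.splitlines():
        match = _DIFF_HEADER.match(line)
        if match and match.group(2) not in paths:
            paths.append(match.group(2))
    return paths
```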
Classify the task as standard or high-risk.
A task is high-risk if it touches auth, authz, secrets, crypto, payments, privacy, destructive data operations, infra/prod config, new public endpoints, or if the user explicitly says it is security-sensitive or release-blocking.
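A first-pass classifier for the rule above might match changed paths against keywords. The keyword list and the function name are assumptions for illustration. Path matching cannot see everything the rule covers (for example, new public endpoints or destructive data operations), so treat a "standard" result as provisional.

```python
# Hypothetical substrings; substring matching deliberately errs toward
# over-flagging (e.g. "prod" also matches "products").
HIGH_RISK_KEYWORDS = (
    "auth",      # also covers authz
    "secret",
    "crypto",
    "payment",
    "privacy",
    "migration",
    "infra",
    "prod",
)


def classify_risk(changed_paths, user_flagged=False):
    """Return "high-risk" or "standard" for a set of changed file paths."""
    if user_flagged:
        # The user said it is security-sensitive or release-blocking.
        return "high-risk"
    for path in changed_paths:
        lowered = path.lower()
        if any(keyword in lowered for keyword in HIGH_RISK_KEYWORDS):
            return "high-risk"
    return "standard"
```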
Choose one review mode:
Material disagreement means any of the following:
Read docs/evaluator-calibration.md before dispatching reviewers.
This step exists to make evaluator drift explicit rather than silently trusted.
Set Calibration Status to:
For high-risk tasks, stale or missing calibration means the best possible fully automated outcome is SHIP_WITH_HUMAN_REVIEW, not plain SHIP.
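The calibration cap can be expressed as a small guard applied to the synthesized grade. This is a sketch under the assumption that grades, risk levels, and calibration statuses are plain strings with the values used in this document; the function name is hypothetical.

```python
def cap_for_calibration(grade: str, risk_level: str, calibration_status: str) -> str:
    """Downgrade a clean SHIP when a high-risk task lacks current calibration."""
    if (
        grade == "SHIP"
        and risk_level == "high-risk"
        and calibration_status in ("stale", "missing")
    ):
        return "SHIP_WITH_HUMAN_REVIEW"
    # All other grades pass through unchanged.
    return grade
```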
If .harnessed/policies/ exists, select only the policy files relevant to the current diff scope.
Use simple path and keyword matching rather than semantic inference:
- auth.md
- response.md + db.md + validation.md
- testing.md
- security.md

Read only the matched files. Keep the token budget bounded: inject at most the small set of matching policy summaries, not the full policy directory. If no policy matches, skip policy injection entirely and keep the existing behavior.
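Path and keyword matching for policy selection could be sketched as follows. The trigger keywords in `POLICY_RULES` are hypothetical, since the actual triggers are repo-specific; only the policy file names come from the list above. The `available` argument stands for the file names actually present in .harnessed/policies/.

```python
# Hypothetical keyword-to-policy rules; adjust the triggers per repository.
POLICY_RULES = [
    (("auth", "login", "session"), ("auth.md",)),
    (("api", "routes", "handlers"), ("response.md", "db.md", "validation.md")),
    (("test", "spec"), ("testing.md",)),
    (("secret", "crypto", "token"), ("security.md",)),
]


def select_policies(changed_paths, available, max_files=4):
    """Return the bounded list of policy files matched by the diff scope."""
    lowered = [path.lower() for path in changed_paths]
    selected = []
    for keywords, policy_files in POLICY_RULES:
        if not any(keyword in path for path in lowered for keyword in keywords):
            continue
        for name in policy_files:
            if name in available and name not in selected:
                selected.append(name)
    # An empty result means: skip policy injection entirely.
    return selected[:max_files]
```

The `max_files` cap is one simple way to keep the injected context bounded.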
Collect the following — this is ALL the reviewers see:
| Include | Why |
|---|---|
| The contract/spec content | Criteria to evaluate against |
| git diff HEAD output | The actual code to review |
| Project stack info | Context for evaluation |
| Verification tier | What the reviewer can do |
| Available verification commands | How to verify |
| Risk level | Whether corroboration is required |
| Review mode | Which reviewer path is active |
| Calibration status | Whether clean automation can be trusted fully |
| {RELEVANT_POLICIES} | Repo-specific constraints selected from .harnessed/policies/ |
DO NOT include:
The reviewer must judge the code, not your intent.
Construct reviewer prompts from the prompt files in skills/independent-qa/:
- evaluator-prompt.md, writes .harnessed/qa-report.md
- .harnessed/qa-report-secondary.md
- security-reviewer-prompt.md, writes .harnessed/qa-report-security.md
- tie-break-reviewer-prompt.md, writes .harnessed/qa-report-tiebreak.md

When using evaluator-prompt.md, replace:

- {CONTRACT}
- {DIFF}
- {STACK}
- {TIER}
- {MODE}
- {RISK_LEVEL}
- {CALIBRATION_STATUS}
- {VERIFICATION_COMMANDS}
- {RELEVANT_POLICIES}
- {GRADING_RUBRIC}

If the task is security-sensitive, run available tooling before final synthesis:
Treat tool findings as strong evidence. Treat tool absence as uncertainty, not as proof that the code is secure.
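For the placeholder replacement in evaluator-prompt.md, one possible approach is a single-pass substitution, so that braces inside an injected value (for example, a diff that happens to contain `{TIER}`) are not substituted again. The function name is hypothetical; the placeholder names are the ones used by the prompt template.

```python
import re

PLACEHOLDERS = (
    "CONTRACT", "DIFF", "STACK", "TIER", "MODE", "RISK_LEVEL",
    "CALIBRATION_STATUS", "VERIFICATION_COMMANDS", "RELEVANT_POLICIES",
    "GRADING_RUBRIC",
)

_TOKEN = re.compile(r"\{([A-Z_]+)\}")


def render_prompt(template: str, values: dict) -> str:
    """Fill every known placeholder in one pass; fail loudly if one is missing."""
    missing = [name for name in PLACEHOLDERS if name not in values]
    if missing:
        raise KeyError(f"missing placeholder values: {missing}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown {TOKENS} are left untouched.
        return values[name] if name in PLACEHOLDERS else match.group(0)

    return _TOKEN.sub(substitute, template)
```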
After reviewers return, verify that each expected report:
- contains the # QA Report header
- is consistent with dispatched_at in .harnessed/qa-state.md

If a required report is missing or malformed, retry once. If it still fails, treat as BLOCKED.
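The retry-once rule can be sketched as below. The function names and the `redispatch` callback are hypothetical. The sketch checks only that the report exists and contains the `# QA Report` header; it omits the dispatched_at comparison, because the exact form of that check is not specified here.

```python
from pathlib import Path
from typing import Callable


def report_is_valid(path: str) -> bool:
    """True if the report file exists and carries the expected header."""
    report = Path(path)
    if not report.is_file():
        return False
    return "# QA Report" in report.read_text(encoding="utf-8")


def collect_report(path: str, redispatch: Callable[[], None]) -> str:
    """Return "OK" if the report is usable, retrying the reviewer once."""
    if report_is_valid(path):
        return "OK"
    redispatch()  # second and final attempt
    return "OK" if report_is_valid(path) else "BLOCKED"
```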
Use the most conservative defensible outcome.
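One way to pick the most conservative outcome across several reviewer grades is to rank them and take the maximum. The severity ordering below is an assumption made for illustration (the document names the four grades but does not rank them here), as is treating an empty set of grades as BLOCKED.

```python
# Assumed ordering, from least to most conservative.
SEVERITY = {
    "SHIP": 0,
    "SHIP_WITH_HUMAN_REVIEW": 1,
    "ITERATE": 2,
    "BLOCKED": 3,
}


def most_conservative(grades):
    """Return the most conservative grade among the reviewers' results."""
    if not grades:
        # No usable reports: assume the review cannot be trusted.
        return "BLOCKED"
    return max(grades, key=SEVERITY.__getitem__)
```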
If final grade is SHIP or SHIP_WITH_HUMAN_REVIEW:
- Invoke harnessed:verification-gate

If final grade is ITERATE:
If final grade is BLOCKED:
After ITERATE or BLOCKED, update .harnessed/failure-patterns.md from the evaluator's ## Failure Categories table.
Apply the decay and cap rules already defined by Harnessed.
After a SHIP or SHIP_WITH_HUMAN_REVIEW result, append:
## QA Summary
- Tier: {1, 1.5, or 2}
- Risk level: {standard or high-risk}
- Review mode: {standard-review / corroborated-review / security-review / tie-break-review}
- Calibration status: {current / stale / missing}
- Confidence: {High / Medium / Low}
- Uncertainty: {brief list or None}
- Iterations: {count}
- Final grade: {SHIP or SHIP_WITH_HUMAN_REVIEW}
- Key findings fixed: {brief list of issues caught and fixed during iterations, if any}
qa-state.md must track:
- iteration
- dispatched_at
- head_commit
- contract_hash
- risk_level
- review_mode
- calibration_status

| Your Thought | Why It's Wrong | What To Do |
|---|---|---|
| "I can include helpful context for the evaluator" | Helpful context biases the reviewer toward your assumptions. | Include only the approved context bundle. |
| "My self-review already found the issue" | Great — fix it. But self-review is still not independent evidence. | Use self-review to improve the draft, then run independent QA. |
| "The evaluator is wrong about this finding" | Maybe, but your bias is exactly why the reviewer exists. | Gather stronger evidence or use the tie-break path. |
| "High-risk mode is too expensive" | High-risk tasks are where unchecked bias is most costly. | Use corroboration and explicit human review. |
| "No tool found anything, so security is proven" | Tool silence is not proof of safety. | Record uncertainty and require human review where appropriate. |
| "The reviewer can infer repo policy from the codebase" | That makes coherence checking depend on model guesswork. | Inject only the relevant policy summaries selected from the diff scope. |
Self-critique can improve a draft. It cannot replace independent verification.
High-risk disagreement must be resolved or escalated — never silently ignored.