ultraqa
Adversarial dynamic e2e QA workflow - generate hostile scenarios, test, verify, fix, report, and clean up
| name | ultraqa |
| description | Adversarial dynamic e2e QA workflow - generate hostile scenarios, test, verify, fix, report, and clean up |
If the user says `continue`, advance the current verified next step instead of restarting discovery.

[ULTRAQA ACTIVATED - ADVERSARIAL DYNAMIC E2E QA CYCLING]
UltraQA finds real behavior failures by combining normal verification commands with generated end-to-end scenarios, hostile user modeling, temporary harnesses when useful, and a structured evidence report. The workflow repeats test → diagnose → fix → retest until the goal is met, a bounded stop condition is reached, or a safety boundary blocks further execution.
Parse the goal from arguments. Supported formats:
| Invocation | Goal Type | What to Check |
|---|---|---|
| `/ultraqa --tests` | tests | Existing tests plus adversarial dynamic e2e scenarios for the changed behavior |
| `/ultraqa --build` | build | Build succeeds and generated smoke/e2e probes still run against the built artifact when applicable |
| `/ultraqa --lint` | lint | Lint passes and no generated harness/test artifact violates project hygiene |
| `/ultraqa --typecheck` | typecheck | Typecheck passes and generated typed harnesses compile when applicable |
| `/ultraqa --custom "pattern"` | custom | Custom success pattern is verified against behavior, not trusted as misleading success output |
| `/ultraqa --interactive` | interactive | CLI/service behavior is tested with generated hostile and edge-case interactions |
If no structured goal is provided, interpret the argument as a custom behavior goal and derive a runnable e2e strategy from repository context.
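The goal parsing described above could be sketched roughly as follows. This is a minimal illustration, not the tool's actual parser: `parseGoal`, `GOAL_FLAGS`, and the shape of the returned object are hypothetical names introduced here.

```javascript
// Hypothetical helper: maps /ultraqa arguments to a goal descriptor.
// The flag names come from the table above; everything else is illustrative.
const GOAL_FLAGS = new Map([
  ["--tests", "tests"],
  ["--build", "build"],
  ["--lint", "lint"],
  ["--typecheck", "typecheck"],
  ["--interactive", "interactive"],
]);

function parseGoal(args) {
  const [first, ...rest] = args;
  if (first === "--custom") {
    // The success pattern follows the flag; it still has to be verified
    // against real behavior later, not trusted on sight.
    return { type: "custom", pattern: rest.join(" ") };
  }
  if (GOAL_FLAGS.has(first)) {
    return { type: GOAL_FLAGS.get(first) };
  }
  // No structured flag: treat the whole argument list as a custom behavior goal.
  return { type: "custom", behavior: args.join(" ") };
}
```

For example, `parseGoal(["--tests"])` yields a `tests` goal, while free text such as `parseGoal(["resume", "survives", "stale", "state"])` falls through to a custom behavior goal.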
Before declaring success, create and maintain a scenario matrix. Each row must include: scenario id, intent, user/attacker model, setup, command or harness, expected signal, actual result, fixes applied, evidence, and cleanup status.
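One way to keep rows honest is a small completeness check before success is declared. The sketch below assumes a plain-object row; the field names are illustrative shorthand for the columns listed above, since the source does not define a schema.

```javascript
// Illustrative field names for the required matrix columns (an assumption,
// not a schema defined by UltraQA).
const REQUIRED_FIELDS = [
  "id", "intent", "model", "setup", "command",
  "expected", "actual", "fixes", "evidence", "cleanup",
];

// Return the names of required fields that are absent or empty on a row.
function missingFields(row) {
  return REQUIRED_FIELDS.filter(
    (field) => row[field] === undefined || row[field] === null || row[field] === ""
  );
}

// A matrix is complete only if it has rows and every row is fully filled in.
function matrixIsComplete(matrix) {
  return matrix.length > 0 && matrix.every((row) => missingFields(row).length === 0);
}
```

An empty matrix is deliberately treated as incomplete, so a run cannot report success without recording at least one scenario.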
The matrix must include normal-path coverage plus adversarial dynamic e2e scenarios selected from the current goal and codebase. Unless clearly irrelevant or impossible, include these hostile and edge-case classes:
- Interruption and resume: `continue`, stop/cancel/abort wording, interrupted command output, and retries after partial progress.
- Stale state: `.omx/state` files, mismatched sessions, missing timestamps, and contradictory phase metadata.

Generated harnesses are part of the QA evidence chain; until setup succeeds, they are evidence about the harness apparatus, not product behavior.
- If a harness lives in `/tmp` or another scratch directory but imports repository code, resolve the repository root explicitly from the verified repo cwd and import built modules with an absolute path or `pathToFileURL(join(repoRoot, "dist", ...)).href`. Never rely on `./dist/...` from the harness file's temporary directory.
- Do not pass harness source through shell interpolation when it contains `${...}`, shell metacharacters, or prompt-injection strings. If a shell heredoc is unavoidable, quote the delimiter and verify the written file before execution; do not use interpolating heredocs for JavaScript assertions.
- Run harnesses with `OMX_ROOT` and `OMX_STATE_ROOT` unset (for example `env -u OMX_ROOT -u OMX_STATE_ROOT ...`) so ambient boxed runtime state cannot redirect reads/writes away from the scenario fixture.

PLAN ADVERSARIAL QA
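Two of the harness rules above (absolute imports of built modules, and an environment without `OMX_ROOT` / `OMX_STATE_ROOT`) can be sketched in Node.js as follows. The helper names `builtModuleUrl` and `scrubbedEnv` are hypothetical, and the `cli/index.js` path in the usage note is only a placeholder for whatever the repository actually builds.

```javascript
const { isAbsolute, join } = require("node:path");
const { pathToFileURL } = require("node:url");

// Build an absolute file:// URL for a built module, so a harness written to
// a scratch directory never resolves ./dist/... relative to its own location.
function builtModuleUrl(repoRoot, ...segments) {
  if (!isAbsolute(repoRoot)) {
    throw new Error(`repoRoot must be absolute, got: ${repoRoot}`);
  }
  return pathToFileURL(join(repoRoot, "dist", ...segments)).href;
}

// Copy an environment without OMX_ROOT / OMX_STATE_ROOT, mirroring
// `env -u OMX_ROOT -u OMX_STATE_ROOT ...` for child processes.
function scrubbedEnv(env = process.env) {
  const { OMX_ROOT, OMX_STATE_ROOT, ...rest } = env;
  return rest;
}
```

A harness might then load code with `await import(builtModuleUrl(repoRoot, "cli", "index.js"))` and spawn child processes with `{ env: scrubbedEnv() }`.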
RUN BASELINE VERIFICATION
- `--tests`: Run the project's test command.
- `--build`: Run the project's build command.
- `--lint`: Run the project's lint command.
- `--typecheck`: Run the project's type check command.
- `--custom`: Run the appropriate command and check the pattern plus exit status and failure markers.
- `--interactive`: Use qa-tester or an equivalent CLI/service harness:
Use `/prompts:qa-tester` with:
Goal: [describe what to verify]
Service: [how to start]
Test cases: [normal, hostile, malformed, interruption, resume, stale-state, dirty-worktree, hung-command, flaky, and misleading-output scenarios]
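For the `--custom` goal, "check the pattern plus exit status and failure markers" might look like the sketch below. The function name `judgeCustomRun` and the default failure markers are assumptions made for illustration; a real project would supply its own markers.

```javascript
// Default markers are illustrative assumptions; real projects differ.
// Patterns should not use the `g` flag, because RegExp.test is stateful with it.
const DEFAULT_FAILURE_MARKERS = [/\bFAIL(ED)?\b/, /\bError:/];

// A run passes only if the exit code is 0, the success pattern appears,
// and no failure marker appears anywhere in the output.
function judgeCustomRun({ exitCode, output, successPattern, failureMarkers = DEFAULT_FAILURE_MARKERS }) {
  const reasons = [];
  if (exitCode !== 0) reasons.push(`non-zero exit code ${exitCode}`);
  if (!successPattern.test(output)) reasons.push("success pattern not found");
  for (const marker of failureMarkers) {
    if (marker.test(output)) reasons.push(`failure marker matched: ${marker}`);
  }
  return { passed: reasons.length === 0, reasons };
}
```

With this shape, a command that prints `SUCCESS` while exiting 1 is reported as failed, and the `reasons` array can go straight into the evidence column of the scenario matrix.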
RUN ADVERSARIAL DYNAMIC E2E SCENARIOS
CHECK RESULT
ARCHITECT DIAGNOSIS
Use `/prompts:architect` with:
Goal: [goal type and behavior]
Scenario matrix: [rows, commands, failures, evidence]
Output: [test/build/e2e/harness output]
Provide root cause, safety implications, and specific fix recommendations.
FIX ISSUES
Use `/prompts:executor` with:
Issue: [architect diagnosis]
Files: [affected files]
Constraints: preserve unrelated dirty work, clean temporary harnesses, keep safety bounds
Apply the fix precisely as recommended.
CLEAN UP AND ROLLBACK
REPEAT
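UltraQA must stay inside its safety bounds: no destructive, credentialed, external-production, or unbounded actions. It exits under these conditions: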
| Condition | Action |
|---|---|
| Goal Met | Exit with success: `ULTRAQA COMPLETE: Goal met after N cycles`, plus the structured report |
| Cycle 5 Reached | Exit with diagnosis: `ULTRAQA STOPPED: Max cycles`, plus failures, fixes attempted, residual risks, and evidence |
| Same Failure 3x | Exit early: `ULTRAQA STOPPED: Same failure detected 3 times`, plus root cause, safety notes, and next owner |
| Safety Boundary | Exit: `ULTRAQA BLOCKED: [destructive/credentialed/external-production/unbounded action]`, plus safe substitute evidence |
| Environment Error | Exit: `ULTRAQA ERROR: [tmux/port/dependency/hung command issue]`, plus cleanup status |
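The first three exit conditions in the table are mechanical enough to sketch as a decision function. This is only an illustration: `decideExit` is a hypothetical name, and counting every occurrence of the latest failure signature (rather than only consecutive repeats) is an assumption, since the table does not say which is meant. Safety-boundary and environment-error exits depend on context and are left out.

```javascript
const MAX_CYCLES = 5;        // "Cycle 5 Reached"
const MAX_SAME_FAILURE = 3;  // "Same Failure 3x"

// `failures` holds one failure signature per failing cycle, oldest first.
// Returns the terminal message, or null when the loop should keep cycling.
function decideExit({ cycle, goalMet, failures }) {
  if (goalMet) return `ULTRAQA COMPLETE: Goal met after ${cycle} cycles`;
  const last = failures[failures.length - 1];
  // Assumption: count all occurrences of the latest signature, not only consecutive ones.
  const repeats = failures.filter((signature) => signature === last).length;
  if (last !== undefined && repeats >= MAX_SAME_FAILURE) {
    return "ULTRAQA STOPPED: Same failure detected 3 times";
  }
  if (cycle >= MAX_CYCLES) return "ULTRAQA STOPPED: Max cycles";
  return null;
}
```

Checking the repeated-failure rule before the cycle cap means a loop stuck on one bug reports the more specific reason even when both conditions hold.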
Every terminal UltraQA result must include this report shape:
# UltraQA Report
## Goal and success criteria
- Goal:
- Stop condition:
- Safety bounds applied:
## Scenario matrix
| ID | User/attacker model | Scenario | Command/harness | Expected signal | Actual result | Status | Evidence | Cleanup |
|----|---------------------|----------|-----------------|-----------------|---------------|--------|----------|---------|
## Commands run
- `[exit code] command` — purpose, duration/timeout, key output evidence
## Failures found
- Scenario ID, failure signal, root cause, user impact, safety impact
## Fixes applied
- Files changed, rationale, linked failing scenario(s), regression evidence
## Cleanup and rollback
- Generated artifacts removed or intentionally kept
- State/process cleanup performed
- Worktree status before/after
## Residual risks
- Untested or blocked scenarios with reasons and safe substitutes
## Evidence
- Test output, e2e logs, harness output, screenshots/transcripts when relevant, and rerun/flake evidence
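Because scenario text is often hostile by design, the scenario matrix section of the report is worth generating rather than hand-writing. The sketch below renders rows into the table header shown in the report shape; the row keys (`id`, `model`, and so on) and the function names are hypothetical.

```javascript
// [row key, column title] pairs; titles match the report's scenario matrix header.
// The row keys are illustrative assumptions.
const COLUMNS = [
  ["id", "ID"],
  ["model", "User/attacker model"],
  ["scenario", "Scenario"],
  ["command", "Command/harness"],
  ["expected", "Expected signal"],
  ["actual", "Actual result"],
  ["status", "Status"],
  ["evidence", "Evidence"],
  ["cleanup", "Cleanup"],
];

// Escape pipes and flatten newlines so scenario text cannot break the table.
function cell(value) {
  return String(value ?? "").replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function renderMatrix(rows) {
  const header = `| ${COLUMNS.map(([, title]) => title).join(" | ")} |`;
  const divider = `|${COLUMNS.map(() => "---").join("|")}|`;
  const body = rows.map(
    (row) => `| ${COLUMNS.map(([key]) => cell(row[key])).join(" | ")} |`
  );
  return [header, divider, ...body].join("\n");
}
```

Missing fields render as empty cells rather than `undefined`, which makes gaps in a row visible in the final report.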
Output progress each cycle:
[ULTRAQA Cycle 1/5] Planning adversarial scenario matrix...
[ULTRAQA Cycle 1/5] Running baseline tests...
[ULTRAQA Cycle 1/5] Running ADV-E2E-003 prompt-injection harness...
[ULTRAQA Cycle 1/5] FAILED - stale state resume accepted misleading success output
[ULTRAQA Cycle 1/5] Architect diagnosing scenario ADV-E2E-003...
[ULTRAQA Cycle 1/5] Fixing: src/hooks/... - validate exit code before success phrase
[ULTRAQA Cycle 1/5] Cleaning temporary harnesses and state...
[ULTRAQA Cycle 2/5] PASSED - baseline + 9 adversarial scenarios pass
[ULTRAQA COMPLETE] Goal met after 2 cycles
Use the CLI-first state surface (omx state ... --json) for UltraQA lifecycle state. If explicit MCP compatibility tools are already available, equivalent omx_state calls are optional compatibility, not the default.
- `omx state write --input '{"mode":"ultraqa","active":true,"current_phase":"planning","iteration":1,"started_at":"<now>","scenario_matrix":[]}' --json`
- `omx state write --input '{"mode":"ultraqa","current_phase":"qa","iteration":<cycle>,"scenario_matrix":"<updated matrix path or summary>"}' --json`
- `omx state write --input '{"mode":"ultraqa","current_phase":"adversarial-e2e"}' --json`
- `omx state write --input '{"mode":"ultraqa","current_phase":"diagnose"}' --json`
- `omx state write --input '{"mode":"ultraqa","current_phase":"fix"}' --json`
- `omx state write --input '{"mode":"ultraqa","current_phase":"cleanup"}' --json`
- `omx state write --input '{"mode":"ultraqa","active":false,"current_phase":"complete","completed_at":"<now>"}' --json`
- `omx state read --input '{"mode":"ultraqa"}' --json`

Good: The user says `continue` after the workflow already has a clear next step. Continue the current branch of work, rerun the relevant adversarial scenario, and update the report instead of restarting discovery.
Good: The user changes only the output shape or downstream delivery step (for example `make a PR`). Preserve earlier non-conflicting workflow constraints and apply the update locally.
Good: A CLI prints `SUCCESS` while exiting 1. Mark the misleading success output scenario failed, fix the parser or reporting path, and rerun the generated harness.
Bad: The workflow runs only `npm test`, `npm run build`, `npm run lint`, or `npm run typecheck`, sees green output, and declares UltraQA complete without adversarial dynamic e2e coverage.
Bad: A generated harness leaves untracked files, state, or a child process behind and the final report omits cleanup status.
Bad: The user says `continue`, and the workflow restarts discovery or stops before the missing verification/evidence is gathered.
The user can cancel with `/cancel`, which clears UltraQA state. Cancellation itself should be tested in cancel/resume scenarios when relevant, but UltraQA must not block an explicit user cancellation.
When the goal is met, max cycles are reached, or the workflow exits early, run `$cancel` or call:
omx state clear --input '{"mode":"ultraqa"}' --json
Use CLI state cleanup rather than deleting files directly. Also remove temporary e2e harnesses, fixtures, and logs unless they are intentional artifacts listed in the report.
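When a harness or script issues the `omx state` commands shown in this document, building the argument list programmatically avoids hand-quoting JSON in a shell. The sketch below only constructs arguments for the `omx state <action> --input <json> --json` form used above; `omxStateArgs` and `startPayload` are hypothetical helpers, and treating `<now>` as an ISO 8601 timestamp is an assumption.

```javascript
// Build argv for `omx state <action> --input <json> --json`.
// Passing this array to a process-spawning API without a shell keeps
// quotes and metacharacters in the payload from being reinterpreted.
function omxStateArgs(action, payload) {
  return ["state", action, "--input", JSON.stringify(payload), "--json"];
}

// Initial lifecycle payload, mirroring the first `omx state write` example.
// Assumption: "<now>" is filled with an ISO 8601 timestamp.
function startPayload(now = new Date()) {
  return {
    mode: "ultraqa",
    active: true,
    current_phase: "planning",
    iteration: 1,
    started_at: now.toISOString(),
    scenario_matrix: [],
  };
}
```

These arrays can be handed to Node's `execFileSync("omx", omxStateArgs("clear", { mode: "ultraqa" }))` from `node:child_process`, which runs the command without going through a shell.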
Begin ULTRAQA cycling now. Parse the goal, build the adversarial dynamic e2e scenario matrix, and start cycle 1.