ab-test-setup
Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
| Field | Value |
|---|---|
| name | ab-test-setup |
| description | Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness. |
| risk | unknown |
| source | community |
| date_added | 2026-02-27 |
Ensure every A/B test is valid, rigorous, and safe before a single line of code is written.
You must have:
A valid hypothesis includes:
Before designing variants or metrics, you MUST:
Ask explicitly:
“Is this the final hypothesis we are committing to for this test?”
Do NOT proceed until confirmed.
Explicitly list assumptions about:
If assumptions are weak or violated:
Choose the simplest valid test:
Default to A/B unless there is a clear reason otherwise.
Define upfront:
Estimate:
Do NOT proceed without a realistic sample size estimate.
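As a rough illustration of what a realistic sample size estimate can look like, here is a minimal sketch for a conversion-rate test. It assumes a two-sided two-proportion z-test with equal traffic per variant. The function names, the default `alpha` and `power` values, and the example rates are illustrative assumptions, not values prescribed by this guide.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_detectable_effect,
                            alpha=0.05, power=0.80):
    """Approximate users needed per variant (normal approximation).

    baseline_rate: control conversion rate, e.g. 0.10
    min_detectable_effect: absolute lift to detect, e.g. 0.02 (10% -> 12%)
    alpha, power: assumed defaults; set them to match your own test plan
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def estimated_days(n_per_variant, daily_eligible_users, variants=2):
    """Days needed to reach the target sample across all variants."""
    return math.ceil(n_per_variant * variants / daily_eligible_users)


if __name__ == "__main__":
    n = sample_size_per_variant(0.10, 0.02)
    print(n, "users per variant,", estimated_days(n, 1000), "days at 1000 users/day")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the estimate should be made before committing to a test duration.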
Before entering the Execution Readiness Gate below, run through this checklist to make "Tracking is verified" mean something concrete:
If any of the above fails, stop and resolve it before Gate 8.
You may proceed to implementation only if all are true:
If any item is missing, stop and resolve it.
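The all-or-nothing rule above can be sketched as a simple checklist object. The item names here are assumptions drawn from the gates mentioned elsewhere in this guide (locked hypothesis, defined metrics, sample size estimate, verified tracking). Replace them with the actual items of your readiness gate.

```python
from dataclasses import dataclass, fields


@dataclass
class ReadinessChecklist:
    # Hypothetical item names; every item defaults to "not done".
    hypothesis_locked: bool = False
    metrics_defined: bool = False
    sample_size_estimated: bool = False
    tracking_verified: bool = False


def missing_items(checklist):
    """Return the names of all items that are not yet satisfied."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def may_proceed(checklist):
    """Implementation may start only when nothing is missing."""
    return not missing_items(checklist)


if __name__ == "__main__":
    status = ReadinessChecklist(hypothesis_locked=True, metrics_defined=True)
    if not may_proceed(status):
        print("Stop and resolve:", ", ".join(missing_items(status)))
```

Defaulting every item to `False` means an item that was never reviewed blocks the test, rather than passing silently.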
DO:
DO NOT:
When interpreting results:
| Result | Action |
|---|---|
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
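The table above can be encoded as a small decision function so that the same rules apply to every test. This is a sketch: the parameter names are hypothetical, and it assumes significance and guardrail status have already been computed by your analysis.

```python
def recommend_action(primary_significant, primary_lift, guardrail_failed):
    """Map a test outcome to the action listed in the results table.

    primary_significant: True if the primary metric reached significance
    primary_lift: observed change in the primary metric (variant - control)
    guardrail_failed: True if any guardrail metric was violated
    """
    # A guardrail failure overrides everything else, including a primary win.
    if guardrail_failed:
        return "Do not ship, even if primary wins"
    if not primary_significant:
        return "Consider more traffic or bolder change"
    if primary_lift > 0:
        return "Consider rollout"
    # Significant and not positive: treated as a significant negative result.
    return "Reject variant, document learning"


if __name__ == "__main__":
    print(recommend_action(True, 0.015, guardrail_failed=True))
```

Checking the guardrail first makes the precedence in the last table row explicit in code.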
Document:
Store records in a shared, searchable location to avoid repeated failures.
Refuse to proceed if:
Explain why and recommend next steps.
A/B testing is not about proving ideas right. It is about learning the truth with confidence.
If you feel tempted to rush, simplify, or “just try it”, treat that as the signal to slow down and re-check the design.
Use this skill to execute the workflow or actions described in the overview.