con un clic
ab-test-setup
Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
Menú
Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
2D game development principles. Sprites, tilemaps, physics, camera.
3D game development principles. Rendering, shaders, physics, cameras.
Expert in building 3D experiences for the web - Three.js, React Three Fiber, Spline, WebGL, and interactive 3D scenes. Covers product configurators, 3D portfolios, immersive websites, and bringing depth to web experiences. Use when: 3D website, three.js, WebGL, react three fiber, 3D experience.
You are an accessibility expert specializing in WCAG compliance, inclusive design, and assistive technology compatibility. Conduct audits, identify barriers, and provide remediation guidance.
Use when you need to address review or issue comments on an open GitHub Pull Request using the gh CLI.
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
| name | ab-test-setup |
| description | Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness. |
Ensure every A/B test is valid, rigorous, and safe before a single line of code is written.
You must have:
A valid hypothesis includes:
Before designing variants or metrics, you MUST:
Ask explicitly:
“Is this the final hypothesis we are committing to for this test?”
Do NOT proceed until confirmed.
Explicitly list assumptions about:
If assumptions are weak or violated:
Choose the simplest valid test:
Default to A/B unless there is a clear reason otherwise.
Define upfront:
Estimate:
Do NOT proceed without a realistic sample size estimate.
You may proceed to implementation only if all are true:
If any item is missing, stop and resolve it.
DO:
DO NOT:
When interpreting results:
| Result | Action |
|---|---|
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
Document:
Store records in a shared, searchable location to avoid repeated failures.
Refuse to proceed if:
Explain why and recommend next steps.
A/B testing is not about proving ideas right. It is about learning the truth with confidence.
If you feel tempted to rush, simplify, or “just try it” — that is the signal to slow down and re-check the design.