ab-test-setup
Design and run A/B tests that produce valid, actionable results. Covers hypothesis design, sample size, metrics selection, and analysis.
Comprehensive prompting guide for LTX-2 video generation model. Covers cinematic storytelling, camera language, character direction, dialogue formatting, visual styling, and audio descriptions. Master the art of crafting detailed, story-driven prompts that turn creative visions into stunning AI-generated videos.
Patterns for building LLM-powered agents that can reason, plan, and use tools. Covers agent architectures (ReAct, Plan-and-Execute), tool design, multi-agent systems, memory management, and production deployment considerations.
Decision frameworks for using AI Skills effectively. When to invoke skills, how to compose them, and integration patterns.
Set up analytics tracking that informs decisions. Covers GA4 implementation, event design, UTM strategy, and tracking plans.
REST API design principles and best practices. Use when designing new APIs, reviewing API contracts, or improving existing endpoints. Covers resource modeling, HTTP semantics, error handling, versioning, and documentation.
Software architecture patterns for building maintainable, scalable systems. Covers Clean Architecture, Domain-Driven Design, Hexagonal Architecture, microservices patterns, and CQRS/Event Sourcing. Use when designing new systems or refactoring existing codebases.
| Field | Value |
|---|---|
| name | ab-test-setup |
| description | Design and run A/B tests that produce valid, actionable results. Covers hypothesis design, sample size, metrics selection, and analysis. |
| license | MIT |
| allowed-tools | Read WebFetch |
| version | 1.0.0 |
| tags | ["ab-test","experimentation","split-test","cro","optimization"] |
| category | marketing/analytics |
| scope | {"triggers":["A/B test","split test","experiment","test this change","multivariate test","hypothesis"]} |
You help design experiments that produce statistically valid, actionable results.
Hypothesis template:
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll measure [metric].
Example: "Because heatmaps show users miss the CTA, we believe making it larger and higher-contrast will increase clicks by 15%+ for new visitors. We'll measure CTA click-through rate."
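Approximate sample size per variant (lift is relative to the baseline conversion rate):
| Baseline rate | 10% relative lift | 20% relative lift |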
|---|---|---|
| 1% | 150k/variant | 39k/variant |
| 3% | 47k/variant | 12k/variant |
| 5% | 27k/variant | 7k/variant |
| 10% | 12k/variant | 3k/variant |
Calculator: evanmiller.org/ab-testing/sample-size.html
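A sketch of a standard normal-approximation formula for comparing two proportions, useful for checking inputs that the table does not cover. It assumes a two-sided test at alpha = 0.05 with 80% power; the source does not state which settings the table uses. With these assumptions it returns roughly 163k per variant for a 1% baseline and a 10% lift, somewhat above the table's 150k. Treat the table as a rough guide: the exact figure depends on the assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect `relative_lift` over `baseline`.

    Normal approximation for a two-sided, two-proportion z-test. The alpha and
    power defaults are assumed conventions, not values taken from the table.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Rebuild the table's rows under these assumptions.
for baseline in (0.01, 0.03, 0.05, 0.10):
    small, large = (sample_size_per_variant(baseline, lift) for lift in (0.10, 0.20))
    print(f"{baseline:.0%}: {small:,} / {large:,} per variant")
```

Halving the detectable lift roughly quadruples the required sample, because the difference appears squared in the denominator.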
Duration: Minimum 1-2 business cycles (usually 1-2 weeks)
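The duration rule can be turned into simple arithmetic. Divide the total required sample by eligible daily traffic, then round up to whole business cycles. This is a minimal sketch that assumes a 7-day cycle and an even traffic split across variants; the traffic figure in the usage line is hypothetical. Set `min_cycles=2` to enforce the two-cycle end of the guideline.

```python
from math import ceil


def estimate_duration_days(n_per_variant: int, num_variants: int,
                           daily_visitors: int, cycle_days: int = 7,
                           min_cycles: int = 1) -> int:
    """Days needed to reach the target sample, rounded up to whole business cycles.

    Assumes all eligible daily traffic is split evenly across variants and that
    one business cycle is `cycle_days` long (a week by default).
    """
    raw_days = ceil(n_per_variant * num_variants / daily_visitors)
    cycles = max(min_cycles, ceil(raw_days / cycle_days))
    return cycles * cycle_days


# Hypothetical: 47,000 per variant (3% baseline, 10% lift), A/B test, 6,000 visitors/day.
print(estimate_duration_days(47_000, 2, 6_000))  # → 21
```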
Primary metric: the single metric that matters most, tied directly to the hypothesis
Secondary metrics: support interpretation and explain why or how the primary metric moved
Guardrail metrics: must not get worse (e.g., revenue, retention)
Example - CTA test: the primary metric is CTA click-through rate, the metric named in the hypothesis example above.
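Guardrails are easier to enforce when they are written down with a direction and a tolerance before launch. This minimal sketch flags any guardrail whose relative change is worse than a hypothetical 2% tolerance. It compares point estimates only; in practice you would also check the confidence interval. The metric names and values are illustrative, not from the source.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    control: float            # assumed non-zero
    variant: float
    higher_is_better: bool = True


def guardrail_violations(guardrails: list[Metric], tolerance: float = 0.02) -> list[str]:
    """Names of guardrail metrics whose relative change is worse than -tolerance."""
    violations = []
    for m in guardrails:
        change = (m.variant - m.control) / m.control
        if not m.higher_is_better:
            change = -change  # for "lower is better" metrics, an increase is a regression
        if change < -tolerance:
            violations.append(m.name)
    return violations


# Hypothetical guardrails for the CTA test.
guardrails = [
    Metric("revenue_per_visitor", control=2.40, variant=2.38),
    Metric("refund_rate", control=0.010, variant=0.013, higher_is_better=False),
]
print(guardrail_violations(guardrails))  # → ['refund_rate']
```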
A/B: Two versions, single change (most common)
A/B/n: Multiple variants (needs more traffic)
Multivariate: Multiple changes in combinations (needs much more traffic)
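The traffic cost of each test type follows from how many variants it creates. A full-factorial multivariate test multiplies the levels of every factor. This sketch assumes every variant, control included, needs the same per-variant sample. Treat its totals as lower bounds, since comparing several variants against control usually calls for a stricter significance threshold.

```python
from math import prod


def total_sample(per_variant: int, levels_per_factor: list[int]) -> tuple[int, int]:
    """(number of variants, total visitors) for a full-factorial test.

    A/B is one factor with 2 levels, A/B/n is one factor with n levels,
    and a multivariate test crosses several factors, so the variants multiply.
    """
    variants = prod(levels_per_factor)
    return variants, variants * per_variant


per_variant = 12_000  # e.g. the table's 3% baseline, 20% lift figure
print(total_sample(per_variant, [2]))        # A/B           → (2, 24000)
print(total_sample(per_variant, [4]))        # A/B/n (n=4)   → (4, 48000)
print(total_sample(per_variant, [2, 3, 2]))  # multivariate  → (12, 144000)
```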
The peeking problem: checking results before reaching the planned sample size, and stopping as soon as a result looks significant, inflates the false-positive rate.
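A small A/A simulation makes the peeking problem concrete. Both arms share the same true conversion rate, so every "significant" result is a false positive. The sketch compares two approaches: checking once at the planned sample size, and checking after each of ten batches while stopping at the first significant look. The rate, batch size, number of looks, run count, and seed are arbitrary assumptions. The peeking rate typically comes out well above the nominal 5%.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05


def z_stat(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(rate=0.10, batch=200, looks=10, runs=400, seed=1):
    """Share of A/A runs called significant: (check once at end, stop at first hit)."""
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        peeked_significant = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if abs(z_stat(conv_a, n, conv_b, n)) > Z_CRIT:
                peeked_significant = True  # a peeker would stop here and ship
        fixed_hits += abs(z_stat(conv_a, n, conv_b, n)) > Z_CRIT
        peek_hits += peeked_significant
    return fixed_hits / runs, peek_hits / runs


fixed_fpr, peek_fpr = simulate_aa()
print(f"check once at the end: {fixed_fpr:.1%}, peek after every batch: {peek_fpr:.1%}")
```

The final look is one of the peeks, so the peeking rate can never be lower than the fixed-horizon rate. Every extra look only adds chances to cross the threshold by luck.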
| Result | Action |
|---|---|
| Significant winner | Implement |
| Significant loser | Keep control, learn why |
| No significant difference | Inconclusive: need more traffic or a bolder test |
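One way to apply the table mechanically is to read the confidence interval for the difference in conversion rates, not just a single p-value. This sketch uses a simple Wald interval; that choice is an assumption, and Wilson or bootstrap intervals are common alternatives. The counts in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_c, n_c, conv_v, n_v, confidence=0.95):
    """Wald interval for (variant rate - control rate), unpooled standard error."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_v - p_c
    return diff - z * se, diff + z * se


def decide(conv_c, n_c, conv_v, n_v, confidence=0.95):
    """Map the interval onto the result/action table."""
    low, high = diff_confidence_interval(conv_c, n_c, conv_v, n_v, confidence)
    if low > 0:
        return "Significant winner: implement"
    if high < 0:
        return "Significant loser: keep control, learn why"
    return "Inconclusive: need more traffic or a bolder test"


print(decide(500, 10_000, 600, 10_000))  # → Significant winner: implement
print(decide(600, 10_000, 500, 10_000))  # → Significant loser: keep control, learn why
print(decide(500, 10_000, 520, 10_000))  # → Inconclusive: need more traffic or a bolder test
```

An interval that straddles zero is not evidence of "no effect". It only means the test could not distinguish the effect from noise at this sample size.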
Test: [Name]
Dates: [Start] - [End]
Hypothesis: [Full statement]
Variants:
- Control: [Description]
- Variant: [Description + changes]
Results:
- Sample: [achieved vs target]
- Primary: [control] vs [variant] ([% change], [confidence])
Decision: [Winner/Loser/Inconclusive]
Learnings: [What we learned]
Design: Testing too small a change, testing too many things, no hypothesis
Execution: Stopping early, changing mid-test, not checking implementation
Analysis: Ignoring confidence intervals, cherry-picking segments
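Two standard checks target the execution and analysis pitfalls above; neither is prescribed by the source. A sample ratio mismatch (SRM) test flags an observed split that drifts from the intended one, which often points to a broken implementation. Holm-Bonferroni adjusts segment-level p-values, so slicing results many ways is less likely to surface a spurious winner. It limits that risk but does not remove it. The 0.001 SRM cut-off is a common convention, and the segment names and p-values below are hypothetical.

```python
from math import erfc, sqrt


def srm_p_value(n_control: int, n_variant: int, variant_share: float = 0.5) -> float:
    """Sample ratio mismatch check: chi-square goodness of fit with 1 degree of freedom.

    A very small p-value (0.001 is a common cut-off) suggests broken assignment
    or tracking; investigate before trusting any result.
    """
    total = n_control + n_variant
    expected = (total * (1 - variant_share), total * variant_share)
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n_control, n_variant), expected))
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def holm_significant(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Holm-Bonferroni step-down: segments that stay significant after correction."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    survivors = []
    for rank, (segment, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # every larger p-value fails as well
        survivors.append(segment)
    return survivors


print(srm_p_value(50_000, 48_500) < 0.001)  # → True (investigate the split)
segments = {"mobile": 0.04, "desktop": 0.30, "new_visitors": 0.012, "returning": 0.008}
print(holm_significant(segments))  # → ['returning', 'new_visitors']
```

Three segments look significant at 0.05 on their own, but "mobile" drops out once the correction accounts for testing four segments.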