Frequentist vs Bayesian A/B testing: which should I use?
Frequentist testing (p-values, fixed sample size) is what most tools and companies use; it works well for high-traffic web tests with clear, pre-registered hypotheses. Bayesian testing (probability of B beating A) is more intuitive, is far more tolerant of peeking, and handles low-traffic scenarios better. Modern platforms (Statsig, Eppo, GrowthBook) default to Bayesian or sequential testing. For a new program: pick the framework your team understands, not the 'correct' one. Both work if used right.
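A minimal sketch of the Bayesian readout, assuming a Beta-Binomial model with a flat Beta(1,1) prior and made-up conversion counts (real platforms layer more machinery on top of this):

```python
# Estimate P(B beats A) by sampling from Beta posteriors over each variant's
# conversion rate. Counts below are invented illustration values.
import numpy as np

rng = np.random.default_rng(42)

# Observed data (hypothetical): conversions / visitors per variant
a_conv, a_n = 480, 10_000
b_conv, b_n = 525, 10_000

# Beta(1, 1) prior -> Beta(conversions + 1, non-conversions + 1) posterior
samples = 100_000
post_a = rng.beta(a_conv + 1, a_n - a_conv + 1, samples)
post_b = rng.beta(b_conv + 1, b_n - b_conv + 1, samples)

prob_b_beats_a = (post_b > post_a).mean()
expected_lift = ((post_b - post_a) / post_a).mean()

print(f"P(B > A) = {prob_b_beats_a:.1%}, expected relative lift = {expected_lift:.2%}")
```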
How do I avoid the peeking problem?
Peeking = checking a frequentist test before reaching the pre-calculated sample size. It inflates the false positive rate from 5% to 30%+. Solutions: (1) calculate sample size in advance and only check at the end; (2) use sequential testing (always-valid p-values, e.g. mSPRT), which is peek-safe; (3) use Bayesian inference, which is naturally peek-tolerant. Modern platforms handle this automatically; older setups (Google Optimize, hand-rolled t-tests) require discipline.
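To see the inflation concretely, here's a rough Monte Carlo sketch of an A/A test (no real difference) that gets checked at ten interim looks; the conversion rate, traffic, and peek schedule are invented for illustration:

```python
# Simulate repeated peeking at an A/A test: the per-peek threshold is 0.05,
# but the any-peek false positive rate climbs well above it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
runs, peeks, batch, p = 2_000, 10, 1_000, 0.05  # hypothetical traffic/rate

false_positives = 0
for _ in range(runs):
    a = rng.binomial(1, p, peeks * batch)
    b = rng.binomial(1, p, peeks * batch)  # no real difference between arms
    for k in range(1, peeks + 1):
        n = k * batch
        # Two-proportion z-test at this interim look
        p_pool = (a[:n].sum() + b[:n].sum()) / (2 * n)
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
        if se == 0:
            continue
        z = (b[:n].mean() - a[:n].mean()) / se
        if 2 * stats.norm.sf(abs(z)) < 0.05:
            false_positives += 1
            break  # the peek where someone would have "shipped the winner"

print(f"False positive rate with peeking: {false_positives / runs:.1%}")
```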
How do I prioritize which tests to run?
ICE (Impact × Confidence × Ease, 1-10 each) for fast triage. PIE (Potential × Importance × Ease) for marketing. RICE (Reach × Impact × Confidence ÷ Effort) for product. The framework matters less than: (1) writing down your reasoning before running; (2) reviewing predictions vs results quarterly; (3) killing low-traffic tests that can't reach significance. Top experimentation programs maintain a backlog of 50-100 ideas, run 20-50/quarter, ship the wins.
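If you prefer the scoring in code rather than a spreadsheet, here's a tiny RICE sketch; the idea names and numbers are invented:

```python
# Rank a hypothetical test backlog by RICE = Reach * Impact * Confidence / Effort.
from dataclasses import dataclass

@dataclass
class TestIdea:
    name: str
    reach: int         # users affected per quarter
    impact: float      # 0.25 = minimal ... 3 = massive
    confidence: float  # 0.0 - 1.0
    effort: float      # person-weeks

    @property
    def rice(self) -> float:
        return self.reach * self.impact * self.confidence / self.effort

backlog = [
    TestIdea("One-click checkout", reach=40_000, impact=2.0, confidence=0.8, effort=6),
    TestIdea("New onboarding copy", reach=15_000, impact=1.0, confidence=0.5, effort=1),
    TestIdea("Pricing page redesign", reach=8_000, impact=3.0, confidence=0.3, effort=8),
]

for idea in sorted(backlog, key=lambda i: i.rice, reverse=True):
    print(f"{idea.name:25s} RICE = {idea.rice:,.0f}")
```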
Why do most A/B tests come back inconclusive?
Two reasons: (1) effect sizes in mature products are tiny: 0.5-2% lifts are typical, requiring tens of thousands of users to detect. (2) tests are often underpowered: required sample size scales with the inverse square of the effect, so if you sized for a 10% lift and the actual effect is 1%, you need 100x more users. Fix: calculate the Minimum Detectable Effect (MDE) before launch, accept that 70%+ of tests will be flat, and treat 'inconclusive' as useful data ('don't waste eng cycles on this idea').
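A minimal pre-launch sizing sketch using statsmodels; the baseline rate and target lift are made-up, and the two-proportion power calculation is a standard choice rather than any particular platform's method:

```python
# How many users per variant do we need to detect a given lift at 80% power?
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05        # current conversion rate (hypothetical)
relative_lift = 0.02   # the 2% relative lift we hope to detect
target = baseline * (1 + relative_lift)

effect = proportion_effectsize(target, baseline)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)

print(f"Users needed per variant: {n_per_arm:,.0f}")
```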
Tool stack: Optimizely vs GrowthBook vs Statsig vs LaunchDarkly?
Optimizely: enterprise, expensive ($30k+/yr), strong WYSIWYG editor, great for marketing teams. GrowthBook: open-source, self-hostable, dev-friendly, free tier. Statsig: free tier plus strong Bayesian stats, becoming the modern default. LaunchDarkly: feature flags first, experimentation second; pick it if you already have it. For startups: Statsig's free tier or GrowthBook. For enterprise: Optimizely or self-hosted GrowthBook. For PostHog stacks: PostHog Experiments is built in.
What's a 'guardrail metric' and why do I need one?
Guardrail = a secondary metric that monitors for unintended harm even when the primary metric wins. Example: testing a paywall change, where primary = revenue (should go up) and guardrail = 90-day retention (must not drop). Without guardrails, you ship 'wins' that destroy long-term LTV. Standard guardrails: retention, NPS, support tickets, page load time, error rate. Mature programs auto-flag tests where any guardrail moves more than 1% even if the primary wins.
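One possible shape for that auto-flag rule, as a sketch; the metric names, the 'harmful direction' convention, and the deltas are all invented for illustration:

```python
# Flag a test if any guardrail moves more than 1% in its harmful direction,
# even when the primary metric wins.
THRESHOLD = 0.01  # 1% movement in the harmful direction

# For each guardrail: which direction counts as harm ("down" or "up"),
# and the observed relative change vs control.
guardrails = {
    "retention_90d":   ("down", -0.023),  # retention dropped 2.3% -> harm
    "nps":             ("down", +0.004),
    "support_tickets": ("up",   +0.002),
    "page_load_time":  ("up",   +0.000),
}

flags = [
    name for name, (bad_dir, delta) in guardrails.items()
    if (bad_dir == "down" and delta < -THRESHOLD)
    or (bad_dir == "up" and delta > THRESHOLD)
]

print("Guardrail violations:", flags or "none")
```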
How big is the salary jump from running tests to running a program?
Practitioner ($80-110k): runs individual tests, picks variants, calculates significance. Strategist ($110-150k): designs the prioritization framework, builds the test backlog, mentors. Lead ($150-200k+): owns the experimentation platform, sets test velocity targets, runs the experimentation guild. The 2-3x salary lift comes from moving from tactical (this test) to systemic (the org runs 50 tests/quarter, here's the playbook).