Ab-Testing | Adventures in Why

Bayesian A/B Testing Considered Harmful

In science we study physically meaningful quantities that have some kind of objective reality, which means that multiple people should draw substantively equivalent conclusions. But in some situations, this principle is at odds with the Bayesian Coherency Principle, and so we have to choose between internal consistency and consistency with external reality.

Multiple Comparisons

The simplest kind of A/B test compares two options, using a single KPI to decide which option is best. The more general theory of statistical experiment design easily handles more options and more metrics, provided we know how to incorporate the *multiple comparisons* involved. To see why this is important, read on!
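As a rough illustration of why multiple comparisons matter, the sketch below computes the chance of at least one false positive when several tests are each run at the same significance level. It assumes the tests are independent, which real metrics often are not. The function names are hypothetical, and the Bonferroni correction is shown only as one common adjustment, not necessarily the one the post recommends.

```python
def familywise_error_rate(alpha: float, num_comparisons: int) -> float:
    """Probability of at least one false positive across independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_comparisons


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_comparisons


if __name__ == "__main__":
    for m in (1, 5, 20):
        uncorrected = familywise_error_rate(0.05, m)
        corrected = familywise_error_rate(bonferroni_alpha(0.05, m), m)
        print(f"{m:>2} comparisons: uncorrected={uncorrected:.3f}, "
              f"Bonferroni={corrected:.3f}")
```

The uncorrected rate climbs quickly as comparisons are added, while the corrected rate stays at or below the nominal level.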

Violations of the Stable Unit Treatment Value Assumption

We have previously mentioned the Stable Unit Treatment Value Assumption, or SUTVA, a complicated-sounding term that is one of the most important assumptions underlying A/B testing (and Causal Inference in general). In this post, we talk a little more about it and why it is so important.

Sprinkle some Maximum Likelihood Estimation on that Contingency Table!

Maximum Likelihood Estimation provides consistent estimators, and can be efficiently computed under many null hypotheses of practical interest.
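One simple case, sketched here under assumptions not stated in the teaser: two groups with binomial outcomes, and the null hypothesis that both share one success rate. Under that null the maximum likelihood estimate is the pooled rate, and without the restriction it is each group's observed rate. The function names and the likelihood-ratio statistic are illustrative choices, not the post's own code.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_log_likelihood(successes: int, trials: int, p: float) -> float:
    """Binomial log-likelihood, omitting the binomial coefficient
    (it does not depend on p and cancels in likelihood ratios)."""
    return _xlogy(successes, p) + _xlogy(trials - successes, 1.0 - p)


def likelihood_ratio_statistic(s_a: int, n_a: int, s_b: int, n_b: int) -> float:
    """2 * (unrestricted log-likelihood - log-likelihood under p_a == p_b)."""
    # Unrestricted MLEs: each group's observed success rate.
    p_a, p_b = s_a / n_a, s_b / n_b
    # MLE under the null hypothesis of a common rate: the pooled rate.
    p_pooled = (s_a + s_b) / (n_a + n_b)
    unrestricted = (binomial_log_likelihood(s_a, n_a, p_a)
                    + binomial_log_likelihood(s_b, n_b, p_b))
    restricted = (binomial_log_likelihood(s_a, n_a, p_pooled)
                  + binomial_log_likelihood(s_b, n_b, p_pooled))
    return 2.0 * (unrestricted - restricted)


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(likelihood_ratio_statistic(100, 1000, 130, 1000))
```

For large samples this statistic is commonly compared against a chi-squared distribution with one degree of freedom.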

Contingency Tables Part II: The Binomial Distribution

In our last post, we introduced the potential outcomes framework as the foundational framework for causal inference. In the potential outcomes framework, each unit (e.g. each person) is represented by a pair of outcomes, corresponding to the result of the experience provided to them (treatment or control, A or B, etc.).

Contingency Tables Part I: The Potential Outcomes Framework

"Why can't I take the results of an A/B test at face value? Who are you, the statistics mafia? I don't need a PhD in statistics to know that one number is greater than another." If this sounds familiar, it is helpful to remember that we do an A/B test to learn about different potential outcomes. Comparing potential outcomes is essential for smart decision making, and this framework is the cornerstone of causal inference.

Unshackle Yourself from Statistical Significance

Don't be a prisoner to statistical significance. A/B testing should serve the business, not the other way around!

A/B Testing

Calculators for planning and analyzing A/B tests

A/B Testing Best Practices

When I started this blog, my primary objective was less about teaching others A/B testing and more about clarifying my own thoughts on it. I had been running A/B tests for about a year, and I was starting to feel uncomfortable with some of the standard methodologies. For example, it’s pretty common to use Student’s t-test to analyze A/B tests. One of the assumptions underlying that test is that the distributions are Gaussian. “What about A/B testing is Gaussian?” I wondered. I knew there was a big difference between one-sided and two-sided tests, but I didn’t feel confident in my ability to choose the right one. And the multiple comparisons problem seemed to rear its ugly head at every turn: what was the best way to handle this?

Confidence Intervals

Statistical analysis is not complete without an estimate of residual uncertainty.
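To make "residual uncertainty" concrete, here is a minimal sketch of a confidence interval for the difference between two conversion rates. It uses the Wald (normal approximation) interval, chosen here only for simplicity and not necessarily the method the post describes. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval_for_difference(s_a: int, n_a: int, s_b: int, n_b: int,
                                 confidence: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for p_b - p_a, given successes and trials.

    Relies on the normal approximation, so it can be poor for small
    samples or rates near 0 or 1.
    """
    p_a, p_b = s_a / n_a, s_b / n_b
    difference = p_b - p_a
    standard_error = math.sqrt(p_a * (1.0 - p_a) / n_a
                               + p_b * (1.0 - p_b) / n_b)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return difference - z * standard_error, difference + z * standard_error


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    low, high = wald_interval_for_difference(100, 1000, 130, 1000)
    print(f"95% interval for the lift: [{low:.4f}, {high:.4f}]")
```

Reporting the interval alongside the point estimate shows how large or small the true difference could plausibly be.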