Posts

Bayesian A/B Testing Considered Harmful

In science we study physically meaningful quantities that have some kind of objective reality, which means that multiple people should draw substantively equivalent conclusions. But in some situations this principle is at odds with the Bayesian Coherency Principle, and so we have to choose between internal consistency and consistency with external reality.

Edgeworth Series in Python

We often use distributions that can be reasonably approximated as Gaussian, typically thanks to the Central Limit Theorem. When the sample size is large (and the tails of the underlying distribution are not too heavy), the approximation is excellent and there’s no point worrying about it. But with modest sample sizes, or if the underlying distribution is heavily skewed, the approximation may not be good enough.
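
As a taste of the idea: the first Edgeworth term corrects the normal CDF of a standardized sample mean using the skewness of the underlying distribution. Here is a minimal sketch (illustrative only, not the post’s implementation; the exponential population and n = 25 are made-up examples):

```python
import numpy as np
from scipy.stats import norm

def edgeworth_cdf(x, skew, n):
    """One-term Edgeworth approximation to the CDF of a standardized
    sample mean: Phi(x) - phi(x) * skew / (6 sqrt(n)) * (x^2 - 1)."""
    return norm.cdf(x) - norm.pdf(x) * (skew / (6.0 * np.sqrt(n))) * (x**2 - 1.0)

# An exponential(1) population has skewness 2. With n = 25 the left tail
# is noticeably lighter than the plain normal approximation suggests.
print(norm.cdf(-1.5))                # CLT approximation: ~0.067
print(edgeworth_cdf(-1.5, 2.0, 25))  # skewness-corrected: ~0.056
```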

Testing with Many Variants

This is a long drive for someone with nothing to think about.

Robust Power Assessment

An important part of planning any statistical experiment is power analysis. In this post I will focus on power analysis for linear regression models, but I am hopeful that much of it carries over to Generalized Linear Models, and hence to the sorts of A/B tests I normally run.
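
To give a flavor of the approach, here is a minimal simulation-based power sketch for simple linear regression. This is my own illustration rather than the post’s code, and the slope, noise level, and sample size are made-up placeholders:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)

def simulated_power(n, slope, sigma, alpha=0.05, n_sims=2000):
    """Estimate the power to detect a nonzero slope by simulating the
    experiment many times and counting rejections at level alpha."""
    rejections = 0
    for _ in range(n_sims):
        x = rng.uniform(0, 1, size=n)
        y = slope * x + rng.normal(0, sigma, size=n)
        if linregress(x, y).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

print(simulated_power(n=100, slope=0.5, sigma=1.0))
```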

Scheffé’s Method for Multiple Comparisons

I’ve written previously about using the Bonferroni correction for the multiple comparisons problem. While it is without a doubt the simplest way to correct for multiple comparisons, it is not the only way. In this post, I discuss Scheffé’s method for constructing simultaneous confidence intervals on arbitrarily many functions of the model parameters.
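
For a taste of the result: Scheffé’s critical value for simultaneous intervals on all linear combinations of p regression parameters is sqrt(p * F(p, n − p; 1 − α)), which can be compared with the Bonferroni t multiplier for a fixed number of comparisons. A quick sketch with made-up numbers (not from the post):

```python
import numpy as np
from scipy.stats import f, t

# Illustrative values: p parameters, n observations, m planned comparisons.
alpha, p, n, m = 0.05, 4, 100, 10

scheffe = np.sqrt(p * f.ppf(1 - alpha, p, n - p))  # covers ALL combinations
bonferroni = t.ppf(1 - alpha / (2 * m), n - p)     # covers only the m planned
print(scheffe, bonferroni)  # Scheffé is wider, but protects infinitely many
```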

Supervised Learning as Function Approximation

Supervised learning is perhaps the most central idea in Machine Learning. It is equally central to statistics, where it is known as regression. Statistics formulates the problem in terms of identifying the distribution from which observations are drawn; Machine Learning, in terms of finding a model that fits the data well.

Naive Comparisons Under Endogeneity

Recently I have been reading Causal Inference: The Mixtape by Scott Cunningham. One thing I think Cunningham explains very well is the role of endogeneity in confounding even simple comparisons. I don’t have a background in economics, so I had never really grokked the concepts of endogenous and exogenous factors, especially as they relate to causal inference. In this post, I’m going to discuss a few examples that highlight why it’s such an important distinction.
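
To illustrate the kind of problem the post discusses, here is a toy simulation of my own (all numbers made up) in which an unobserved factor drives both treatment take-up and the outcome, so that the naive difference in means badly overstates the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# An unobserved factor ("ability") influences both who takes the treatment
# and the outcome, making treatment endogenous.
ability = rng.normal(0, 1, n)
treated = (ability + rng.normal(0, 1, n)) > 0  # high-ability units opt in
outcome = 1.0 * treated + 2.0 * ability + rng.normal(0, 1, n)  # true effect = 1.0

naive = outcome[treated].mean() - outcome[~treated].mean()
print(naive)  # roughly 3.3, not 1.0: the comparison absorbs the ability gap
```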

Advice for Early Career Data Scientists

Coming out of college, I had some ideas about how I was going to become successful and what my career was going to look like. Of course, I was all wrong. Here is the advice I would offer a young me.

Multiple Comparisons

The simplest kind of A/B test compares two options, using a single KPI to decide which option is best. The more general theory of statistical experiment design easily handles more options and more metrics, provided we know how to account for the multiple comparisons involved. To see why this is important, read on!
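
For a quick sense of why: if we run m independent tests, each at level α = 0.05, the chance of at least one false positive when nothing is really going on is 1 − (1 − α)^m. A tiny illustration:

```python
# Family-wise error rate for m independent tests at level alpha.
alpha = 0.05
for m in (1, 5, 10, 20):
    print(m, round(1 - (1 - alpha) ** m, 3))
# With 20 metrics/variants, a spurious "winner" appears ~64% of the time.
```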

Violations of the Stable Unit Treatment Value Assumption

We have previously mentioned the Stable Unit Treatment Value Assumption, or SUTVA, a complicated-sounding name for one of the most important assumptions underlying A/B testing (and Causal Inference in general). In this post, we talk a little more about what it means and why it is so important.