Counterfactuals and Causal Reasoning

A/B Testing Series
- Random Sampling
- Statistical Significance
- Fisher's Exact Test
- Counterfactuals and Causal Reasoning
- Statistical Power
- Confidence Intervals
Introduction
So far in this series we have only considered the possibility that our actions have no effect on an observed outcome. This disheartening possibility is called the Null Hypothesis. Whenever we are using random segmentation to investigate the causal relationship between an experience we are providing and the response of an audience, the observed treatment effect must be large relative to what is plausibly attributable to random chance. To use the standard terminology, the effect must be statistically significant.
When either the audience size or the causal effect is small, we are unlikely to achieve statistical significance. A result that is not statistically significant does not imply that there is no treatment effect. In order to understand what conclusions we may rightfully draw from such “null” results, we need a better understanding of what sort of outcomes are possible when our actions do indeed affect the behavior of an audience.
Counterfactuals
A popular approach to causal inference is based on counterfactuals. The Stanford Encyclopedia of Philosophy provides an excellent discussion of the history and development of this approach [1]. The basic idea is to consider what would have happened if a specific event had not occurred, or a specific agent had not been present. We compare this counterfactual reality with what was actually observed following said event. As discussed in the Random Sampling article, this is easier said than done.
We can make use of the crystal ball we introduced in that post to
provide an example of precisely what we mean by this. Recall from our
subject line example that we have an audience of 1000 email
recipients, and we are investigating the impact of two candidate
subject lines,
Recipient | | |
---|---|---|
Alice | x | x |
Brian | x | |
Charlotte | | x |
David | x | x |
Emily | | x |
Frank | | x |
George | | |
Totals | 3 | 5 |
In the table above, we see that Alice and David opened the email regardless of the subject line they received (an “x” denotes that the person opened the email when receiving a particular treatment). George, along with the remaining audience members not listed, did not open the email regardless of the subject line they received. None of these recipients’ behaviors were affected by the subject line.
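The crystal-ball table can be written down directly as a mapping from each recipient to a pair of potential outcomes. The sketch below is illustrative only. The labels `first` and `second` are placeholder names for the two subject lines, and the per-recipient pattern for Brian, Charlotte, Emily, and Frank is an assumption chosen to agree with the column totals of 3 and 5 and with the observed results reported later in the article.

```python
# Potential outcomes for the named recipients:
# (opens under first subject line, opens under second subject line).
# "first" and "second" are placeholder labels, not the article's own names.
# The pattern for Brian, Charlotte, Emily, and Frank is an assumption that is
# consistent with the table totals (3 and 5). Unlisted recipients never open.
potential_outcomes = {
    "Alice": (True, True),
    "Brian": (True, False),
    "Charlotte": (False, True),
    "David": (True, True),
    "Emily": (False, True),
    "Frank": (False, True),
    "George": (False, False),
}

# Column totals: how many recipients would open under each subject line.
totals = (
    sum(first for first, _ in potential_outcomes.values()),
    sum(second for _, second in potential_outcomes.values()),
)

# A recipient is influenced when the two potential outcomes differ.
influenced = sorted(
    name for name, (first, second) in potential_outcomes.items() if first != second
)
unaffected = sorted(
    name for name, (first, second) in potential_outcomes.items() if first == second
)

print("totals:", totals)
print("influenced:", influenced)
print("unaffected:", unaffected)
```

Only a crystal ball could supply both entries of each pair; the rest of the article deals with the fact that a real experiment reveals just one of them per recipient.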
In contrast, Brian’s, Charlotte’s, Emily’s, and Frank’s behaviors were
indeed influenced by the subject line they received. Brian only opened
the email in the reality where he received subject line
Looking at the table, it is clear sending subject line
With random segmentation, we randomly assign subject lines to audience
members and observe the results. This allows us to fill in part of
the table. For example, suppose we randomly select Alice, Charlotte,
David, and Frank (and 496 others) to receive subject line
Receives | | | Receives | | |
---|---|---|---|---|---|
Alice | x | ? | Brian | ? | |
Charlotte | | ? | Emily | ? | x |
David | x | ? | George | ? | |
Frank | | ? | (497 others) | ? | |
(496 others) | | ? | | | |
Totals | 2 | ? | | ? | 1 |
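As a rough sketch of how an assignment hides half of the table, the code below applies the split described above to a set of potential outcomes and replaces every unobserved entry with `None` (the "?" in the table). The labels `first` and `second` and the per-recipient potential outcomes are assumptions, chosen to be consistent with the totals the article reports.

```python
# Assumed potential outcomes: (opens under first line, opens under second line).
# "first" and "second" are placeholder labels for the two subject lines.
potential = {
    "Alice": (True, True),
    "Brian": (True, False),
    "Charlotte": (False, True),
    "David": (True, True),
    "Emily": (False, True),
    "Frank": (False, True),
    "George": (False, False),
}

# The split from the example: four named recipients get the first line,
# three get the second. The unlisted recipients never open and are omitted.
assignment = {
    "Alice": "first", "Charlotte": "first", "David": "first", "Frank": "first",
    "Brian": "second", "Emily": "second", "George": "second",
}


def observe(name):
    """Return the pair of outcomes with the unobserved one set to None."""
    opens_first, opens_second = potential[name]
    if assignment[name] == "first":
        return (opens_first, None)
    return (None, opens_second)


observed = {name: observe(name) for name in potential}

# Observed opens per group; None (unknown) and False both count as zero here.
observed_totals = (
    sum(1 for first, _ in observed.values() if first),
    sum(1 for _, second in observed.values() if second),
)

for name, row in observed.items():
    print(name, row)
print("observed totals:", observed_totals)
```

Every recipient contributes exactly one known entry, so the experiment sees 2 opens in one group and 1 in the other, while the full column totals stay hidden.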
Consistent with the previous table, Alice and David open the email;
however, it is important to note they do not open it because they
received subject line
Similarly, for the group that receives subject line
What we do know is how people reacted to the subject lines they
received. Out of the 500 people randomly selected to receive subject
line
Random segmentation does not always enable us to determine which treatment gives the best result, as this example shows, but neither does any other method. What random segmentation does provide is:
- A method that gives more reliable answers the larger the audience.
- Extremely precise measures of how reliable the method itself is, for audiences large or small.
I am unaware of any other method that does the same, which is why random segmentation is considered the gold standard of causal inference.
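Both claims can be checked on the running example by counting every possible even split of the audience. The sketch below assumes the composition implied by the crystal-ball table: 2 recipients who open under either subject line, 1 who opens only under the first, 3 who open only under the second, and 994 who never open. The labels `first` and `second` are placeholders, and the tenfold larger audience with the same proportions is a hypothetical added purely for comparison.

```python
from fractions import Fraction
from math import comb


def split_outcome_probabilities(always, first_only, second_only, never):
    """Exact probabilities that an even random split shows more opens for
    the first line, more opens for the second line, or a tie."""
    audience = always + first_only + second_only + never
    group_size = audience // 2  # number of recipients sent the first line
    ways = {"first": 0, "second": 0, "tie": 0}
    for a in range(always + 1):  # always-openers sent the first line
        for f in range(first_only + 1):  # first-only openers sent the first line
            for s in range(second_only + 1):  # second-only openers sent the first line
                rest = group_size - a - f - s  # never-openers sent the first line
                if rest < 0 or rest > never:
                    continue
                # Number of distinct splits with this composition.
                count = (
                    comb(always, a)
                    * comb(first_only, f)
                    * comb(second_only, s)
                    * comb(never, rest)
                )
                opens_first = a + f
                opens_second = (always - a) + (second_only - s)
                if opens_first > opens_second:
                    ways["first"] += count
                elif opens_second > opens_first:
                    ways["second"] += count
                else:
                    ways["tie"] += count
    total = comb(audience, group_size)
    return {label: Fraction(n, total) for label, n in ways.items()}


small = split_outcome_probabilities(always=2, first_only=1, second_only=3, never=994)
large = split_outcome_probabilities(always=20, first_only=10, second_only=30, never=9940)
for name, result in (("1,000 recipients", small), ("10,000 recipients", large)):
    print(name, {label: round(float(p), 3) for label, p in result.items()})
```

Under these assumptions, the probability that the split favors the truly better subject line is higher for the larger audience, and in both cases it is an exact number rather than an estimate.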
What does “Why?” mean anyway?
In the counterfactual approach, we are interpreting “why” in a specific way. We are fundamentally asking whether the occurrence of a specific event, or the presence of a specific agent, was necessary and sufficient for a particular outcome. If not necessary, the outcome would have happened even without the event or agent; if not sufficient, the event or agent is an incomplete explanation.
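One simple way to read these two conditions when there are only two options is sketched below. The function and its argument names are hypothetical, and the reading itself is an assumption: the received subject line is treated as sufficient if the recipient opens when given it, and as necessary if the recipient would not have opened under the alternative.

```python
def explain_open(opens_if_received, opens_if_alternative):
    """Classify the received subject line as a cause of an open.

    Returns (sufficient, necessary) under a simple two-option reading:
    sufficient means the recipient opens when given the received line;
    necessary means the recipient would not open under the alternative.
    """
    sufficient = opens_if_received
    necessary = not opens_if_alternative
    return sufficient, necessary


# Alice opens regardless of the subject line, so her line was not necessary.
alice = explain_open(opens_if_received=True, opens_if_alternative=True)

# Emily opens only under the line she received, so it was both.
emily = explain_open(opens_if_received=True, opens_if_alternative=False)

print("Alice (sufficient, necessary):", alice)
print("Emily (sufficient, necessary):", emily)
```

In this reading, the subject line counts as the reason for an open only in Emily's case, where both conditions hold.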
When we randomly selected Alice to receive subject line
This logic is only applicable in a particular context. If we speculate
about a third subject line,
While counterfactuals and random segmentation form a powerful and
practically useful framework for causal inference, the approach has
limitations. When we ask, “Why did Emily open the email?”, the answer
according to this approach is, “Because she received subject line
A suitably rich collection of subject lines may enable us to investigate these issues. The theory of Experiment Design (and, presumably, theories of marketing and human psychology) has more to say on them, but they are outside the scope of the present discussion. Nonetheless, in many situations, we are merely attempting to determine the best option from a particular set of alternatives. The counterfactual framework provides a logically compelling approach for considering what it actually means for one option to be best. Random segmentation provides a method not only for determining what that best option is, but also for quantifying the reliability of our conclusions. While there are many important questions that cannot be addressed within this framework, random segmentation is both practical and valuable. [2]
[1] Menzies, Peter, “Counterfactual Theories of Causation”, The Stanford Encyclopedia of Philosophy (Winter 2017 Edition), Edward N. Zalta (ed.).
[2] Cover photo courtesy of Burak Kebapci.