Robust Portfolio Optimization in Generalized Linear Models
Table of Contents
Introduction
Often (always?) we run an A/B test in order to inform some decision. For example, perhaps we are deciding between two or more email subject lines, so we do an A/B test to figure out which one is best. Often we will declare one subject line “the winner” and use that going forward, at least until we think of a new subject line.
Rarely are we completely sure the observed-best subject line is indeed the best. Every A/B test involves uncertainty, and even when getting a stat sig result, that only means the probability of a false positive is small (e.g. 5%), not zero. And especially when testing more than a few variants, it can be challenging to discern between the observed-best subject line and the runner(s)-up. More commonly, the conclusion of an A/B test is: “I think this one is the best, but these others also seem pretty good, and I’m not 100% positive which one is best”.
Here’s a wacky idea: discard the clear under-performers, but just use all the best options! That is, instead of using the observed-best subject line for 100% of your email volume, use it for, say 75% of your email volume, with the remaining volume split across any other promising contenders.
Of course, this isn’t always practical. Juggling multiple subject lines involves increased engineering complexity and it might not be worth it, but the benefit of this approach is increased robustness against the uncertainty associated with the test. In fact, consideration of robustness provides guidance on exactly what fraction of traffic to devote to which subject line.
Before diving into the math, I want to mention that this approach mirrors the Nobel-prize winning portfolio optimization theory developed by Harry Markowitz in 1952. This approach considers different assets, such as stocks or email subject lines, that have an estimated return on investment and an uncertainty associated with those estimates. If we knew which asset had the best ROI, we would invest all of our resources with that asset. Analogously, if we knew which email subject line was best, we would use it 100% of the time. But uncertainty means we don’t know which option is best; that uncertainty is a type of risk associated with our investment strategy. Markowitz’s core idea is to allocate our resources among the different assets to trade off between two objectives: maximize expected return and minimize risk. This approach leads naturally to diversification. Rather than investing all of our resources on whatever asset we think is the best, we split our resources across multiple assets, often dramatically reducing our risk while maintaining good ROI.
Formulation
Markowitz’s approach modeled returns as having Gaussian distributions, but it’s much more common in A/B testing to work with binomial distributions. In this post, I discuss extending Markowitz’s portfolio theory to generalized linear models (which include the binomial and Gaussian distributions as special cases).
The basic problem we solve is:
$$ \begin{array}{cl} \mbox{maximize${}_{c}$ minimize${}_{\beta}$} & c^T \beta \\\ \mbox{subject to} & c \succeq 0 \\\ & \mathbf{1}^T c = C \\\ & \beta \in S, \end{array} $$ where $c_i$ is the resource amount devoted to asset $i$, $C$ is our total budget, $\beta_i$ is the (unknown) return of asset $i$, and $S$ is a set that contains the true asset returns with high probability.
This is a minimax problem and provided $S$ is convex, can be solved efficiently. Even with thousands of asset types we can find the optimal allocation in less than a second on modern laptops; with at most a few dozen asset types (can you imagine A/B testing thousands of subject lines?) we can find the optimal allocation in a few milliseconds. This approach is thus suitable for online use, such as a component in a larger system.
Let’s discuss the meaning of this formulation. Suppose we have already decided, somehow, how to allocate our resources. So consider $c$ to be fixed. Our portfolio performance is $c^T \beta$. For example, if $c_i$ is the number of emails sent using subject line $i$, and $\beta_i$ is the send-to-purchase conversion rate associated with subject line $i$, then the total number of purchases is $\sum_{i} c_i \beta_i = c^T \beta$.
But we don’t really know what $\beta$ is. At best, we have some point estimate of it, but that point estimate involves uncertainty, which can be expressed in the form of a confidence region. In my article on Scheffé’s method, I gave some examples, all of which were convex and thus suitable for this approach. Given such a region, we can ask: what is the worst performance we might observe for a given allocation $c$? That motivates the problem:
$$ \begin{array}{cl} \mbox{minimize} & c^T \beta \\\ \mbox{subject to} & \beta \in S. \end{array} $$
Let $p(c)$ be the value of this problem, with the dependence on the allocation $c$ clearly indicated. The actual performance of the strategy corresponding to $c$ may be much better than $p(c)$, but it is unlikely to be worse. Thus, $p(c)$ is a conservative estimate of the performance.
Next we ask: how can we allocate resources to get the best possible worst-case performance? This motivates:
$$ \begin{array}{cl} \mbox{maximize} & p(c) \\\ \mbox{subject to} & c \succeq 0 \\\ & \mathbf{1}^T c = C. \end{array} $$
The constraints say that we can’t devote negative resources to any asset class, and the total resources must equal our budget, $C$. Perhaps surprisingly, $p(c)$ is a concave function, so this is a convex optimization problem (meaning, we are maximizing a concave function subject to linear equality and inequality constraints). Combining the two problems leads to the minimax formulation above.
A Robustness Budget
We may decide that optimizing for the worst case performance neglects actual performance too much. We may wish instead to decide on some “robustness budget” we are willing to pay for better worst case performance. If $\hat{\beta}$ is a point estimate of $\beta$ and $d = \max_i \{\hat{\beta}_i\}$, then the optimal performance (ignoring the uncertainty in $\beta$) is simply $C d$. This corresponds to devoting all of our resources to whichever asset we believe to be the best, robustness be darned. Then we can continue to optimize for the worst case performance, but with a constraint on the estimated performance: $$ \begin{array}{cl} \mbox{maximize${}_{c}$ minimize${}_{\beta}$} & c^T \beta \\\ \mbox{subject to} & c \succeq 0 \\\ & \mathbf{1}^T c = C \\\ & \beta \in S \\\ & c^T \hat{\beta} \geq (1 - \theta) \cdot C d, \end{array} $$ where $\theta$ dictates how much expected performance we are willing to give up to improve worst case performance ($c^T \hat{\beta}$ is the return on investment if our point estimate of $\beta$ turns out to be correct). For example, setting $\theta = 0.05$ means we are willing to give up 5% of our expected performance in order to improve robustness against the uncertainty in $\beta$.
Of course, setting $\theta$ too low means we are not getting any robustness benefits, but setting $\theta$ too high means our actual performance might be poor, especially if our point estimate of $\beta$ is pretty good!
We can estimate both the expected ($c_{\theta}^{\star T} \hat{\beta}$) and worst-case performance, $p(c_{\theta}^\star)$, for a sequence of tradeoffs $\theta$, where $c_{\theta}^{\star}$ is the robust-optimal allocation with a robustness budget of $\theta$, and plot them against one another. That is, we plot $c_{\theta}^{\star T} \hat{\beta}$ against $p(c_{\theta}^\star)$ for a suitable range of $\theta$, providing a parametric assessment of the tradeoff between the two types of performance. This graph should be vital in understanding the tradeoffs involved and likely will point to a good choice for $\theta$.
Solving the Problem
We will rewrite the problem as: $$ \begin{array}{cl} \mbox{maximize${}_c$ minimize${}_\beta$} & f(c, \beta) \\\ \mbox{subject to} & A c \preceq a \\\ & \mathbf{1}^T c = C \\\ & h(\beta) \leq 0, \end{array} $$ where $f$ is a concave in $c$ and convex in $\beta$, and $h$ is convex. We enforce the inequality constraints with a barrier method, rewriting the objective as: $$ f_t(c, \beta) = t f(c, \beta) + \sum_{j} \log( a_j - A_j^T c) - \log(-h(\beta)), $$ where $A_j^T$ is the $j$th row of $A$. This objective is still concave in $c$ and convex in $\beta$ for all values $t$. We then solve a sequence of problems of the form: $$ \begin{array}{cl} \mbox{maximize${}_c$ minimize${}_\beta$} & f_t(c, \beta) \\\ \mbox{subject to} & \mathbf{1}^T c = C. \end{array} $$ In the limit $t \to \infty$, the values of these problems approach the value of the original problem. If $a \in \mathbb{R}^m$, then the sub-optimality of the problem indexed by $t$ is at most $(m + 1) / t$. (See the course notes, Minimax and Convex-Concave Games, by Arpita Ghosh and Stephen Boyd.)
We solve each problem in this sequence via the infeasible start Newton method, as described in Boyd and Vandenberghe’s book, Convex Optimization, section 10.3.4, increasing $t$ in each sequence until the sub-optimality, $(m+1)/t$ is sufficiently small.
Conclusion
Every A/B test involves uncertainty that becomes risk as soon as we make some decision based on the results. We can hedge this risk by “diversifying” our resources across multiple options. Calculating a robust-optimal portfolio is practical and fast.