Robust Power Assessment
An important part of planning any statistical experiment is power analysis. In this post I will focus on power analysis for linear regression models, but I am hopeful much of this can be applied to Generalized Linear Models and hence to the sorts of A/B tests I normally run.
Preliminaries
Suppose $y = X \beta + \epsilon$ where $y \in \mathbb{R}^n$, $\beta \in \mathbb{R}^p$, and all of the assumptions of linear regression are met, as described in our last post on Scheffé’s method. Suppose we are interested in testing a null hypothesis of the form $C \beta = d$, where $C \in \mathbb{R}^{q \times p}$ has full row rank and $d \in \mathbb{R}^q$. Let $$ T_C(d) := \frac{1}{q} \left( C \hat{\beta} - d \right)^T \left( C \hat{\mathcal{I}}^{-1} C^T \right)^{-1} \left( C \hat{\beta} - d \right) = \frac{1}{q} \| C \hat{\beta} - d \|_{C \hat{\mathcal{I}}^{-1} C^T}^2, $$ where $\| v \|_A^2 := v^T A^{-1} v$ denotes the quadratic form with the inverse of $A$, $\hat{\beta} = (X^T X)^{-1} X^T y$ is the least squares estimate of $\beta$, and $\hat{\mathcal{I}} = X^T X / \hat{\sigma}^2$ is the estimated Fisher information, with $\hat{\sigma}^2 = \| y - X \hat{\beta} \|_2^2 / (n - p)$ the estimated variance of $\epsilon$.
As derived in [Wood17, §1.3.4], under the null hypothesis, $T_C(d) \sim F_{n-p}^q$, where $F_{n-p}^q$ is Snedecor’s $F$ distribution with $q$ and $n-p$ degrees of freedom in the numerator and denominator, respectively. Thus, $1 - \Phi_{F_{n-p}^q}(T_C(d))$ is a p-value against the null hypothesis $C \beta = d$, where $\Phi_{F_{n-p}^q}(\cdot)$ is the cumulative distribution function for $F_{n-p}^q$.
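To make this concrete, here is a minimal sketch of the test in Python with NumPy and SciPy. The function name and arguments are my own placeholders, and I assume `X` has full column rank and `C` has full row rank so the relevant inverses exist.

```python
import numpy as np
from scipy import stats

def f_test(y, X, C, d):
    """F statistic T_C(d) and p-value for the null hypothesis C beta = d."""
    n, p = X.shape
    q = C.shape[0]
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]         # least squares estimate
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)  # estimated noise variance
    # C I_hat^{-1} C^T = sigma2_hat * C (X^T X)^{-1} C^T
    cov = sigma2_hat * (C @ np.linalg.inv(X.T @ X) @ C.T)
    diff = C @ beta_hat - d
    T = diff @ np.linalg.solve(cov, diff) / q               # T_C(d)
    p_value = stats.f.sf(T, q, n - p)                       # 1 - CDF of F_{n-p}^q
    return T, p_value
```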
Under an alternative hypothesis, such as $C \beta = d^\prime$, $T_C(d)$ has a non-central $F$ distribution with non-centrality parameter $\lambda = \| d^\prime - d \|_{C \mathcal{I}^{-1} C^T}^2$, where $\mathcal{I} = X^T X / \sigma^2$ is the true Fisher information (that is, using the true variance $\sigma^2$ rather than the estimated $\hat{\sigma}^2$). For a derivation, see my notes on Linear Regression. Thus the power of the test against the alternative when rejecting at level $\alpha$ is $$ 1 - \Phi_{F_{n-p; \lambda}^q}\left( F_{n-p}^{q; 1 - \alpha} \right) $$ where $F_{n-p}^{q; 1 - \alpha}$ is the $1-\alpha$ quantile of a (central) $F_{n-p}^q$ distribution and $F_{n-p; \lambda}^q$ denotes the non-central $F_{n-p}^q$ distribution with non-centrality parameter $\lambda$.
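Here is a companion sketch for the power formula, again with hypothetical names: `beta_alt` encodes the alternative through $d^\prime = C \beta_{\textrm{alt}}$, and `sigma2` is the assumed true noise variance.

```python
import numpy as np
from scipy import stats

def power(X, C, d, beta_alt, sigma2, alpha=0.05):
    """Power of the level-alpha F-test of C beta = d when in fact C beta = C beta_alt."""
    n, p = X.shape
    q = C.shape[0]
    # Non-centrality parameter: lambda = || C beta_alt - d ||^2 in the C I^{-1} C^T norm
    diff = C @ beta_alt - d
    cov = sigma2 * (C @ np.linalg.inv(X.T @ X) @ C.T)  # C I^{-1} C^T
    lam = diff @ np.linalg.solve(cov, diff)
    crit = stats.f.ppf(1 - alpha, q, n - p)            # 1 - alpha quantile of the central F
    return stats.ncf.sf(crit, q, n - p, lam)           # P{ non-central F > critical value }
```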
Robust Power Assessment
The most obvious problem we run into when assessing power is: what alternative hypothesis should we choose? A slightly less obvious problem is: what should we assume for $\sigma^2$?
The first option is to use point estimates of $C \beta$ and $\sigma^2$, perhaps from previous experiments. A more robust approach, and the focus of this post, is to search a neighborhood of the point estimates for the values that result in the worst-case power. We can then be confident that small errors in the point estimates will not derail our experiment.
The problem we solve is: $$ \begin{array}{cl} \mbox{minimize} & 1 - \Phi_{F_{n-p; \lambda}^q}\left( F_{n-p}^{q; 1 - \alpha} \right) \\\ \mbox{subject to} & \beta \in S_\beta \\\ & \sigma^2 \in S_{\sigma^2} \end{array} $$ where $S_\beta$ and $S_{\sigma^2}$ are confidence regions on $\beta$ and $\sigma^2$, respectively (to be discussed below). The variables in this problem are $\beta$ and $\sigma^2$, which enter the objective function via the non-centrality parameter $\lambda$.
First note that the objective is an increasing function of $\lambda$. This follows from the nature of the non-central $F$ distribution: the larger the non-centrality parameter, the easier it is to reject the null hypothesis and the greater the power will be. So an equivalent problem is: $$ \begin{array}{cl} \mbox{minimize} & \lambda = \| C \beta - d \|_{C \mathcal{I}^{-1} C^T}^2 \\\ \mbox{subject to} & \beta \in S_\beta \\\ & \sigma^2 \in S_{\sigma^2}, \end{array} $$ which, since $\mathcal{I}^{-1} = \sigma^2 (X^T X)^{-1}$, can be more conveniently expressed as $$ \begin{array}{cl} \mbox{minimize} & \frac{( C \beta - d )^T \left( C (X^T X)^{-1} C^T \right)^{-1} ( C \beta - d )}{\sigma^2} \\\ \mbox{subject to} & \beta \in S_\beta \\\ & \sigma^2 \in S_{\sigma^2}. \end{array} $$ The objective is a convex (quadratic-over-linear) function of $(\beta, \sigma^2)$, so if $S_\beta$ and $S_{\sigma^2}$ are convex regions, we have a convex optimization problem that can be efficiently solved.
If we have data from a previous experiment, then $$ S_\beta(\alpha) = \left\{ \beta^\prime : (\hat{\beta}_{\textrm{prev}} - \beta^\prime)^T X_{\textrm{prev}}^T X_{\textrm{prev}} (\hat{\beta}_{\textrm{prev}} - \beta^\prime) \leq \sigma^2 \cdot \chi_{p}^{2; 1 - \alpha} \right\}, $$ where $\chi_{p}^{2; 1 - \alpha}$ is the $1 - \alpha$ quantile of a $\chi_{p}^2$ distribution, is such a region for $\beta$. (It may be possible to get a tighter region, especially when $q \ll p$, but I leave that to the reader to figure out.) Similarly, $$ S_{\sigma^2}(\alpha) = \left\{ \sigma^{\prime 2} : 0 \leq \sigma^{\prime 2} \leq \frac{(n_{\textrm{prev}}-p) \hat{\sigma}_{\textrm{prev}}^2 }{ \chi_{n_{\textrm{prev}}-p}^{2; \alpha} } \right\}, $$ where $\chi_{n_{\textrm{prev}}-p}^{2; \alpha}$ is the lower $\alpha$ quantile of a $\chi_{n_{\textrm{prev}}-p}^2$ distribution, chosen so that each region contains the true parameter with probability $1 - \alpha$.
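For concreteness, here is a small sketch of the scalars defining these two regions, computed from summary statistics of the previous experiment; the names are placeholders.

```python
from scipy import stats

def region_bounds(n_prev, p, sigma2_hat_prev, alpha=0.05):
    """Scalars defining S_beta(alpha) and S_sigma2(alpha)."""
    # S_beta: || X_prev (beta_hat_prev - beta) ||_2^2 <= sigma^2 * chi2_beta
    chi2_beta = stats.chi2.ppf(1 - alpha, p)
    # S_sigma2: 0 <= sigma^2 <= sigma2_max, using the lower alpha quantile of
    # chi^2_{n_prev - p} so that the bound holds with probability 1 - alpha
    sigma2_max = (n_prev - p) * sigma2_hat_prev / stats.chi2.ppf(alpha, n_prev - p)
    return chi2_beta, sigma2_max
```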
With these regions, the problem becomes $$ \begin{array}{cl} \mbox{minimize} & \frac{( C \beta - d )^T \left( C (X^T X)^{-1} C^T \right)^{-1} ( C \beta - d )}{\sigma^2} \\\ \mbox{subject to} & \| X_{\textrm{prev}} (\hat{\beta}_{\textrm{prev}} - \beta) \|_2^2 \leq \sigma^2 \cdot \chi_{p}^{2; 1 - \alpha} \\\ & 0 \leq \sigma^2 \leq \frac{(n_{\textrm{prev}}-p) \hat{\sigma}_{\textrm{prev}}^2 }{ \chi_{n_{\textrm{prev}}-p}^{2; \alpha} }, \end{array} $$ with variables $\beta$ and $\sigma^2$; the other parameters are data. The value of this problem, $\lambda^\star$, is the worst-case non-centrality parameter, which in turn can be plugged into the power formula to calculate the worst-case power.
In this case, it is plain that the optimal $\sigma^2$ equals its upper bound: increasing $\sigma^2$ both shrinks the objective and relaxes the constraint on $\beta$. With $\sigma^2$ fixed at that bound, the problem reduces to minimizing a convex quadratic in $\beta$ subject to a single convex quadratic inequality constraint, i.e., a quadratically constrained quadratic program.
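Below is a minimal sketch of the whole worst-case calculation with CVXPY, fixing $\sigma^2$ at its upper bound as just discussed. All names are placeholders; I assume the previous experiment used the same $p$ model columns, and I reuse a single $\alpha$ for the test level and both confidence regions, as in the formulas above.

```python
import numpy as np
import cvxpy as cp
from scipy import stats

def worst_case_power(X, C, d, X_prev, beta_hat_prev, sigma2_hat_prev, alpha=0.05):
    """Worst-case non-centrality parameter and power over S_beta(alpha) and S_sigma2(alpha)."""
    n, p = X.shape
    q = C.shape[0]
    n_prev = X_prev.shape[0]

    # Worst-case sigma^2 is its upper confidence bound (lower alpha quantile of chi^2).
    sigma2_max = (n_prev - p) * sigma2_hat_prev / stats.chi2.ppf(alpha, n_prev - p)

    # lambda = (C beta - d)^T (C (X^T X)^{-1} C^T)^{-1} (C beta - d) / sigma^2
    M = np.linalg.inv(C @ np.linalg.inv(X.T @ X) @ C.T)
    M = 0.5 * (M + M.T)  # symmetrize for quad_form

    beta = cp.Variable(p)
    lam = cp.quad_form(C @ beta - d, M) / sigma2_max
    region = (cp.sum_squares(X_prev @ (beta_hat_prev - beta))
              <= sigma2_max * stats.chi2.ppf(1 - alpha, p))
    lam_star = cp.Problem(cp.Minimize(lam), [region]).solve()

    # Plug the worst-case non-centrality parameter into the power formula.
    crit = stats.f.ppf(1 - alpha, q, n - p)
    return lam_star, stats.ncf.sf(crit, q, n - p, lam_star)
```

CVXPY reduces the quadratic objective and constraint to second-order cone form, so the default solver handles this problem without any special configuration.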
Probabilistic Interpretation
When we use these regions for $\beta$ and $\sigma^2$, our robust approach has a probabilistic interpretation. Let $\lambda^\star$ be the worst-case non-centrality parameter. Then $P\{ \lambda \geq \lambda^\star \} \geq (1 - \alpha)^2 \geq 1 - 2 \alpha$, by a derivation similar to the one we used with Scheffé’s method. This gives us a way to guarantee a certain minimum power with any probability we wish: for example, taking $\alpha = 0.025$ guarantees that the true power is at least the worst-case power with probability at least $0.975^2 \approx 0.95$.
Proof: $$ \begin{align*} P\{ \lambda \geq \lambda^\star \} &\geq P\{ \lambda \geq \lambda^\star \; | \; \beta \in S_\beta(\alpha), \sigma^2 \in S_{\sigma^2}(\alpha) \} \\ &\phantom{\geq} \times P\{ \beta \in S_\beta(\alpha), \sigma^2 \in S_{\sigma^2}(\alpha) \} \\ &= 1 \cdot P\{ \beta \in S_\beta(\alpha) \} \cdot P\{ \sigma^2 \in S_{\sigma^2}(\alpha) \} \\ &= (1 - \alpha) \cdot (1 - \alpha). \end{align*} $$ The first factor equals one because $\lambda^\star$ is the minimum of $\lambda$ over the two regions. The key insight is that the two coverage events are independent: the event $\{ \beta \in S_\beta(\alpha) \}$ depends only on $\hat{\beta}_{\textrm{prev}}$, the event $\{ \sigma^2 \in S_{\sigma^2}(\alpha) \}$ depends only on $\hat{\sigma}_{\textrm{prev}}^2$, and an important result in linear regression is that $\hat{\beta}_{\textrm{prev}}$ is statistically independent of $\hat{\sigma}_{\textrm{prev}}^2$.
Post-Hoc Power Analysis
Sometimes I see people calculate power after performing an A/B test, plugging in the observed effect size and variance. This is bad! Post-hoc power analysis of this sort tends to exaggerate our confidence in the sensitivity of the test. The robust approach outlined above, which replaces point estimates with worst-case values over a confidence region, mitigates this at least partially and could be a viable approach for post-hoc power analysis.
References
[Wood17] Simon N. Wood, “Generalized Additive Models: An Introduction with R”. Chapman & Hall, 2nd edition, 2017.