Thoughts on Principal Components Analysis

Nov 14, 2020

This is a post with more questions than answers.

I’ve been thinking about Principal Component Analysis (PCA) lately. Suppose we have $m$ measurements on each of $N$ units, represented in a matrix $Y \in \mathbb{R}^{N \times m}$, where $m \ll N$. The idea in PCA is that the results of these $m$ measurements are driven by $k$ underlying factors ($k < m$). So each unit $i$ can be described by a vector $a_i \in \mathbb{R}^k$, and each measurement $j$ is affected by the factors in a way described by a vector $b_j \in \mathbb{R}^k$, so that the $j$th measurement on the $i$th unit is $a_i^T b_j$. In this case, $Y = A B^T$, where $A \in \mathbb{R}^{N \times k}$ has $a_i$ as the $i$th row and $B \in \mathbb{R}^{m \times k}$ has $b_j$ as the $j$th row, and $Y$ has rank $k$ (provided $A$ and $B$ both have full column rank). This representation is not unique: if $C$ is any invertible matrix in $\mathbb{R}^{k \times k}$, then replacing $A \to A C$ and $B \to B C^{-T}$ leads to the same matrix $Y$, since $(AC)(BC^{-T})^T = A C C^{-1} B^T = A B^T$.
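To make the setup concrete, here is a minimal simulation sketch in Python with NumPy. Drawing the factors from a standard normal distribution is my assumption for illustration, and `simulate_factor_model` and `reparametrize` are hypothetical helper names. It builds a rank-$k$ matrix $Y = A B^T$ and then checks that $A \to AC$, $B \to BC^{-T}$ leaves $Y$ unchanged.

```python
import numpy as np

def simulate_factor_model(N, m, k, seed=0):
    """Draw A (N x k) and B (m x k) with standard normal entries; return A, B, Y = A B^T."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((N, k))
    B = rng.standard_normal((m, k))
    return A, B, A @ B.T

def reparametrize(A, B, C):
    """Apply A -> A C and B -> B C^{-T}; the product A B^T is unchanged."""
    C_inv_T = np.linalg.inv(C).T
    return A @ C, B @ C_inv_T

A, B, Y = simulate_factor_model(N=1000, m=10, k=3)
print(np.linalg.matrix_rank(Y))  # Gaussian factors give full column rank with probability 1

rng = np.random.default_rng(1)
C = rng.standard_normal((3, 3))  # a random Gaussian matrix is invertible with probability 1
A2, B2 = reparametrize(A, B, C)
print(np.allclose(A2 @ B2.T, Y))  # a different (A, B) pair produces the same Y
```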

To pin down one representation, let’s use the (reduced) singular value decomposition (SVD), which writes $Y$ as the product of three matrices, $Y = U \Sigma V^T$, where $U \in \mathbb{R}^{N \times k}$ and $V \in \mathbb{R}^{m \times k}$ have orthonormal columns, and $\Sigma \in \mathbb{R}^{k \times k}$ is diagonal with positive entries (the nonzero singular values of $Y$). We’ll define $A = U \Sigma$ and $B = V$.

Now what we’d really like is to recover the vectors $a_i$, which tell us everything we care to know about the units. All we need for this is the matrix $V$: since $V$ has orthonormal columns, $V^T V = I_k$, so $YV = U \Sigma V^T V = U \Sigma = A$, and we can just read off the rows. This is all standard PCA stuff.
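A sketch of this step on simulated data, again assuming NumPy and Gaussian factors: it keeps the top $k$ singular triplets from `np.linalg.svd`, sets $A = U\Sigma$ and $B = V$, and checks that $YV$ recovers $A$. The helper `reduced_svd` is a hypothetical name, and truncating to $k$ columns loses nothing here only because the simulated $Y$ is exactly rank $k$.

```python
import numpy as np

def reduced_svd(Y, k):
    """Keep the top-k singular triplets: U is N x k, s holds k singular values, V is m x k."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T

rng = np.random.default_rng(0)
N, m, k = 1000, 10, 3
Y = rng.standard_normal((N, k)) @ rng.standard_normal((m, k)).T  # exactly rank-k data

U, s, V = reduced_svd(Y, k)
A = U * s  # scales column j of U by s[j], i.e. A = U Sigma
B = V

print(np.allclose(A @ B.T, Y))  # U Sigma V^T reproduces Y up to rounding
print(np.allclose(Y @ V, A))    # multiplying by V reads off the rows a_i
```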

The part that is interesting to me is when we only observe a random sample of the rows of $Y$, collected in a matrix $X \in \mathbb{R}^{n \times m}$ with $n \ll N$. I’m wondering under what circumstances we can learn about $k$ and $V$ from $X$.

For example, I’m guessing that if $n \gg k$, then we can compute the SVD of $X$, count how many singular values are meaningfully greater than zero, and get a good estimate of $k$. I’m also guessing that the right singular vectors calculated this way are a pretty good estimate of $V$. But I’d love to have some statistical rigor behind this.
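Here is a rough sketch of that guess under the idealized assumption of no measurement noise, so the trailing singular values of $X$ are zero up to rounding. It samples $n$ rows of a simulated $Y$, counts singular values above a relative tolerance (`rel_tol` is an arbitrary choice of mine), and compares the estimated right singular vectors with $V$. Because the singular values of $X$ differ from those of $Y$, the individual vectors need not match; what should match is the subspace they span, so the sketch compares orthogonal projectors instead. The helper names are hypothetical.

```python
import numpy as np

def estimate_rank(X, rel_tol=1e-10):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def subspace_distance(V1, V2):
    """Spectral-norm distance between the orthogonal projectors onto span(V1) and span(V2)."""
    P1 = V1 @ V1.T
    P2 = V2 @ V2.T
    return np.linalg.norm(P1 - P2, ord=2)

rng = np.random.default_rng(0)
N, m, k, n = 10_000, 10, 3, 50
Y = rng.standard_normal((N, k)) @ rng.standard_normal((m, k)).T

# Right singular vectors of the full matrix Y: the target V.
_, _, Vt_Y = np.linalg.svd(Y, full_matrices=False)
V = Vt_Y[:k].T

# Observe only a random sample of n rows of Y.
rows = rng.choice(N, size=n, replace=False)
X = Y[rows]

k_hat = estimate_rank(X)
_, _, Vt_X = np.linalg.svd(X, full_matrices=False)
V_hat = Vt_X[:k_hat].T

print(k_hat)                         # number of numerically nonzero singular values of X
print(subspace_distance(V, V_hat))   # near zero when span(V_hat) equals span(V)
```

With real, noisy data the trailing singular values would be small but not zero, and the choice of cutoff becomes a judgment call rather than a clear-cut test.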

Question 1: suppose we have a null hypothesis that $k = k_0$ (recall that $Y$ has rank $k$). How do we use $X$ to calculate a p-value against this null hypothesis? How can we get a confidence interval on $k$?

Question 2: how can we get a confidence region on $V$? If we had this, then we could directly calculate a confidence region on $XV$, the factor vectors $a_i$ of the units we actually observe. How does not knowing $k$ affect this procedure?

I’ve had a glance at Jolliffe’s Principal Component Analysis, and it seems that when the rows of $Y$ have a multivariate normal distribution and $k$ is known, we can do this sort of thing. But I’m specifically interested in inferences regarding the parameter $k$, and I’d much prefer a nonparametric approach.

As I said, I have no idea how to do this; it just seems interesting!

Update 2020-11-15

The reason I’ve been thinking about this is that certain aspects of PCA seem a little arbitrary to me. I’m no expert, but the general guidance I’ve seen on selecting the parameter $k$ is based on either a “knee in the curve” in the singular values, or a threshold like capturing 80% of the “energy” of the singular values. Typically PCA is used as an unsupervised learning technique, so there is no “correct” answer to how many singular values to keep. I was trying to imagine a scenario where there actually is a correct answer, and this is what I came up with.
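For comparison, the “80% of the energy” rule fits in a few lines. I’m assuming energy means the sum of squared singular values, which is a common convention but not the only one, and `k_by_energy` is a hypothetical helper.

```python
import numpy as np

def k_by_energy(s, threshold=0.8):
    """Smallest k whose top-k squared singular values capture `threshold` of the total.

    Assumes "energy" means the sum of squared singular values; some authors
    use the singular values themselves instead.
    """
    energy = np.cumsum(np.asarray(s, dtype=float) ** 2)
    fraction = energy / energy[-1]
    # searchsorted gives the first index where fraction >= threshold; +1 turns it into a count.
    return int(np.searchsorted(fraction, threshold) + 1)

s = np.array([10.0, 5.0, 3.0, 1.0, 0.5])    # squared: 100, 25, 9, 1, 0.25
print(k_by_energy(s))                        # 2
print(k_by_energy(s, threshold=0.95))        # 3
```

The same spectrum gives $k = 2$ at an 80% threshold and $k = 3$ at 95%, which shows how much the answer depends on an arbitrary cutoff.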

Bob Wilson

Marketing Data Scientist

The views expressed on this blog are Bob’s alone and do not necessarily reflect the positions of current or previous employers.
