[R] sparse PCA using nsprcomp package

Christian Sigg r-help at sigg-iten.ch
Mon Sep 9 17:06:48 CEST 2013


Hi John

> 1). Assume now I can calculate these "adjusted" standard deviations from sparse PCA. Should the percent variation explained by each sparse PC be calculated using the sum of all these "adjusted" variances (i.e. the squares of the "adjusted" standard deviations) as the denominator (then the percentages will always add up to 1 if all sparse PCs are counted), or using the sum of the PC variances estimated by REGULAR PCA as the denominator (then adding up all PCs may not equal 1)?

It depends on what you want to do with this percentage, but to me the second would be more meaningful. A sparse PCA will usually be truncated (fewer than all possible components are computed), and due to the additional constraints on the principal axes you will usually explain less variance than with standard PCA. I would want to know what I lose in a sparse PCA w.r.t. a standard PCA.

Note that you don't actually have to compute the standard PCA if you are only interested in the total variance of the data, i.e. the sum of all component variances. The total variance 

1/(n-1) * sum(diag(t(X) %*% X))

for the zero-mean data matrix X is invariant to a rotation of the coordinate system and therefore identical to

Z <- X %*% W
1/(n-1) * sum(diag(t(Z) %*% Z))

so you can skip computing the PCA rotation matrix W. The fastest way to compute the total variance is probably

1/(n-1) * sum(X^2)

because all expressions compute the squared Frobenius norm of X. 
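For concreteness, here is a small sanity check on toy data (untested sketch; the data matrix X, its dimensions and the use of prcomp for the rotation are just for illustration):

set.seed(1)
n <- 100
X <- scale(matrix(rnorm(n * 10), n, 10), center = TRUE, scale = FALSE)  # zero-mean data

W <- prcomp(X)$rotation  # PCA rotation matrix
Z <- X %*% W             # rotated data (scores)

1/(n-1) * sum(diag(t(X) %*% X))  # total variance of the data
1/(n-1) * sum(diag(t(Z) %*% Z))  # identical after the rotation
1/(n-1) * sum(X^2)               # squared Frobenius norm, same value

All three lines should print the same number.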

If you want to compare variances of individual components, then compute a regular PCA.
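For example (untested sketch; the toy data and the values of ncomp and k are arbitrary choices for illustration):

library(nsprcomp)

set.seed(1)
X <- matrix(rnorm(100 * 10), 100, 10)

pc  <- prcomp(X)                       # regular PCA
spc <- nsprcomp(X, ncomp = 10, k = 5)  # sparse PCA, at most 5 non-zero loadings per axis

rbind(regular = pc$sdev^2, sparse = spc$sdev^2)  # per-component variances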

I also had a look at how the spca function computes the "percentage explained variation". I don't yet entirely understand what is going on, but the results differ from those of the "asdev" function I mentioned in my previous reply. Keep that in mind if you want to compare nsprcomp to spca.

> 2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? If, for example, my regular PCA showed that I need 20 PCs to account for 80% of the variation in my dataset, does it mean I should set ncomp=20? And are there any rules for setting the value of "k"?

I don't have any hard answers for this question. 

There are a number of heuristics for choosing the number of components in regular PCA (e.g. the PCA book by Jolliffe presents several), and some of them should translate to sparse PCA. If 20 PCs or 80% explained variance works well for you in regular PCA, I suggest starting with 20 components in sparse PCA, measuring the explained variance, and then increasing the number of components (if necessary) until you again reach 80% explained variance; see the sketch below.
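Something along these lines (untested sketch; the toy data and k = 10 are arbitrary, and I assume spc$sdev holds the adjusted standard deviations as discussed above):

library(nsprcomp)

set.seed(1)
X <- scale(matrix(rnorm(100 * 50), 100, 50), center = TRUE, scale = FALSE)

spc <- nsprcomp(X, ncomp = 20, k = 10)  # 20 sparse PCs, at most 10 non-zero loadings each

total_var <- sum(X^2) / (nrow(X) - 1)   # total variance of the data (see above)
cumsum(spc$sdev^2) / total_var          # cumulative fraction of explained variance

If the last value is below 0.8, increase ncomp and check again.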

The same applies to setting the cardinality parameter k. You could use a criterion such as BIC to optimize the trade-off between model fidelity and complexity, but I don't have any experience with how well this works in practice. What I have done so far is to check for loadings with small magnitudes, and to set k such that all remaining loadings have "substantial" magnitudes; see the sketch below.
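A minimal sketch of that check (untested; the toy data, the candidate values of k and ncomp = 5 are arbitrary):

library(nsprcomp)

set.seed(1)
X <- matrix(rnorm(100 * 50), 100, 50)

for (k in c(5, 10, 20)) {
  spc <- nsprcomp(X, ncomp = 5, k = k)
  min_load <- apply(spc$rotation, 2, function(w) min(abs(w[w != 0])))
  cat("k =", k, "-> smallest non-zero |loading| per PC:", round(min_load, 2), "\n")
}

If the smallest magnitudes become tiny for a given k, that k is probably too large.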

In the end, what matters is what follows after running the algorithm. Are you directly interpreting the sparse PCA result, or is this an intermediate step in a complete data processing pipeline with a measurable goal? If the latter, choose the algorithm parameters that give you the best results at the end of the pipeline.

> 3). Would you recommend nscumcomp() or nsprcomp() in general?

It depends on whether you want to perform a sequential or a cumulative analysis of your data. If you want maximum variance in the first (second, third, ...) PC and want to specify the precise cardinality of each principal axis, use nsprcomp. If you instead only want to specify the total cardinality of all loadings and leave the distribution of the non-zero loadings to the algorithm, use nscumcomp. A quick illustration of the difference follows below.
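For instance (untested sketch on toy data; the values of ncomp and k are arbitrary, and I assume the k argument of nscumcomp specifies the total cardinality as described above):

library(nsprcomp)

set.seed(1)
X <- matrix(rnorm(100 * 20), 100, 20)

seq_fit <- nsprcomp(X, ncomp = 5, k = c(4, 4, 3, 3, 2))  # per-axis cardinalities
cum_fit <- nscumcomp(X, ncomp = 5, k = 16)               # total cardinality only

colSums(seq_fit$rotation != 0)  # at most 4, 4, 3, 3, 2 non-zero loadings per axis
sum(cum_fit$rotation != 0)      # roughly 16 non-zero loadings in total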

There will be substantial improvements to nscumcomp in the next release; if you want to use it, I suggest you wait until then.

Regards
Christian

