[R] prcomp - principal components in R

Tony Plate tplate at acm.org
Mon Nov 9 20:26:19 CET 2009


The output of summary() for a prcomp object displays the cumulative proportion of variance relative to the total variance of the principal components PRESENT in the object, not relative to the total variance of the original data.  So it is always guaranteed to be 100% for the last principal component present.  You can see this in the code of summary.prcomp() (view it with getAnywhere("summary.prcomp")).
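
As a rough illustration of that normalisation, the summary rows can be recomputed from sdev alone. This is only a sketch on simulated data: the 100 x 5 matrix and the names x_demo and pc_demo are assumptions made up for the example, not the data from this thread.

```r
# Simulated data: the 100 x 5 shape is an assumption for illustration only
set.seed(1)
x_demo <- matrix(rnorm(100 * 5), ncol = 5)
pc_demo <- prcomp(x_demo)

# Variance of each component, normalised by the summed variance of the
# components stored in the object (nothing is recomputed from the data)
vars <- pc_demo$sdev^2
prop <- vars / sum(vars)
cum_prop <- cumsum(prop)

# Compare with the "Cumulative Proportion" row reported by summary()
rbind(manual  = cum_prop,
      summary = summary(pc_demo)$importance["Cumulative Proportion", ])
```

Because the denominator is sum(vars), the last entry is 1 by construction, whatever the object contains.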

Here's how to get the output you want (the last line in the transcript below; x is a numeric matrix with 5 columns):

> set.seed(1)
> summary(pc1 <- prcomp(x))
Importance of components:
                         PC1   PC2   PC3   PC4   PC5
Standard deviation     1.175 1.058 0.976 0.916 0.850
Proportion of Variance 0.275 0.223 0.190 0.167 0.144
Cumulative Proportion  0.275 0.498 0.688 0.856 1.000
> summary(pc2 <- prcomp(x, tol=0.8))
Importance of components:
                        PC1   PC2   PC3
Standard deviation     1.17 1.058 0.976
Proportion of Variance 0.40 0.324 0.276
Cumulative Proportion  0.40 0.724 1.000
> pc2$sdev
[1] 1.1749061 1.0581362 0.9759016
> pc1$sdev
[1] 1.1749061 1.0581362 0.9759016 0.9164905 0.8503122
> svd(scale(x, center=T, scale=F))$d / sqrt(nrow(x)-1)
[1] 1.1749061 1.0581362 0.9759016 0.9164905 0.8503122
> cumsum(pc1$sdev^2) / sum((svd(scale(x, center=T, scale=F))$d / sqrt(nrow(x)-1))^2)
[1] 0.2752317 0.4984734 0.6883643 0.8558386 1.0000000
> 
> # output in terms of the cumulative % of the total variance
> cumsum(pc2$sdev^2) / sum((svd(scale(x, center=T, scale=F))$d / sqrt(nrow(x)-1))^2)
[1] 0.2752317 0.4984734 0.6883643
> 
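
The last line of the transcript could be wrapped in a small helper along the following lines. This is a sketch only: cum_prop_total is a hypothetical name, not an R function, and x_demo is simulated data assumed for the example. It takes the denominator from the centered (and optionally scaled) data directly, using the fact that the sum of the squared singular values of a matrix equals the sum of its squared entries, so no second SVD is needed.

```r
# Hypothetical helper (not part of R): cumulative proportion of the TOTAL
# variance for the components retained in a prcomp object `pc` fitted to `x`.
cum_prop_total <- function(pc, x) {
  # Re-apply the centering/scaling recorded in the prcomp object
  xs <- scale(as.matrix(x), center = pc$center, scale = pc$scale)
  # Sum of squared singular values of xs equals sum(xs^2); dividing by
  # (n - 1) gives the total variance of all components
  total_var <- sum(xs^2) / max(1, nrow(xs) - 1)
  # Number of components actually retained in the object
  k <- ncol(pc$rotation)
  cumsum(pc$sdev[seq_len(k)]^2) / total_var
}

set.seed(1)
x_demo <- matrix(rnorm(100 * 5), ncol = 5)   # assumed example data
pc_tol <- prcomp(x_demo, scale. = TRUE, tol = 0.8)
cum_prop_total(pc_tol, x_demo)
```

With scale. = TRUE, as in the original question, every column has unit variance after scaling, so the total variance is simply the number of columns.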

It's probably better to have prcomp compute all the components in the first place, because the SVD is the bulk of the computation anyway (so doing it a second time will be slow for large matrices).  Then just look at the most important principal components.  However, there may be a shortcut for computing only the values of D in the SVD of a matrix; you could look for that if your computations are demanding (e.g., the square roots of the eigenvalues of the covariance matrix of the centered x: sqrt(eigen(var(scale(x, center=TRUE, scale=FALSE)), only.values=TRUE)$values)).
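
A minimal sketch of that approach, again on simulated data (x_demo, the 60% threshold and the variable names are illustrative assumptions): fit once with all components, then read off how many are needed. It also checks the eigenvalue shortcut mentioned above against the standard deviations returned by prcomp.

```r
set.seed(1)
x_demo <- matrix(rnorm(100 * 5), ncol = 5)   # assumed example data

# Fit once, keeping every component
pc_all <- prcomp(x_demo)
cum_prop <- cumsum(pc_all$sdev^2) / sum(pc_all$sdev^2)

# Smallest number of components reaching an (arbitrary) 60% threshold
k <- which(cum_prop >= 0.6)[1]
cum_prop[seq_len(k)]

# Shortcut for the singular values only: square roots of the eigenvalues
# of the covariance matrix (var() centers the columns itself)
sd_eig <- sqrt(eigen(var(x_demo), symmetric = TRUE, only.values = TRUE)$values)
all.equal(sd_eig, pc_all$sdev)
```

Since all components are present in pc_all, the proportions here are already relative to the total variance, and truncating to the first k entries does not change them.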

-- Tony Plate


zubin wrote:
> Hello, I don't understand the output of prcomp. When I reduce the number 
> of components, the output still shows a cumulative 100% of the 
> variance explained, which can't be the case when dropping from 8 
> components to 3. 
> 
> How do I get the output in terms of the cumulative % of the total 
> variance, so that when I go from the full solution of 8 components (8 
> variables in the data set) to a reduced number of components, I can 
> evaluate the % of variance explained? Or am I missing something?
> 
> 8 variables in the data set
> 
>  > princ = prcomp(df[,-1],rotate="varimax",scale=TRUE)
>  > summary(princ)
> Importance of components:
>                          PC1   PC2   PC3   PC4   PC5   PC6    PC7    PC8
> Standard deviation     1.381 1.247 1.211 0.994 0.927 0.764 0.6708 0.4366
> Proportion of Variance 0.238 0.194 0.183 0.124 0.107 0.073 0.0562 0.0238
> Cumulative Proportion  0.238 0.433 0.616 0.740 0.847 0.920 0.9762 *1.0000*
> 
>  > princ = prcomp(df[,-1],rotate="varimax",scale=TRUE,tol=.75)
>  > summary(princ)
> 
> Importance of components:
>                          PC1   PC2   PC3
> Standard deviation     1.381 1.247 1.211
> Proportion of Variance 0.387 0.316 0.297
> Cumulative Proportion  0.387 0.703 *1.000*
> 



