[R] prcomp: where do sdev values come from?

Adams, Sky sky_adams at brown.edu
Wed Jun 20 22:14:51 CEST 2012


In the manual page for prcomp(), it says that sdev is "the standard
deviations of the principal components (i.e., the square roots of the
eigenvalues of the covariance/correlation matrix, though the
calculation is actually done with the singular values of the data
matrix)."  However, this is not what I'm finding.  The values appear
to be the standard deviations of a reprojection of the data, as in the
example below.

> test=matrix(data=c(1,1,2,2,0,0,2,1,2,1,0,2,1,2,1,2,2,0),nrow=3,ncol=6)
> test=standardize(test)
> test
           [,1]      [,2]       [,3] [,4]       [,5]      [,6]
[1,] -0.7071068  2.828427  0.8944272    0 -0.7071068  1.414214
[2,] -0.7071068 -1.414214 -1.7888544   -2  1.4142136  1.414214
[3,]  1.4142136 -1.414214  0.8944272    2 -0.7071068 -2.828427

> res1
Standard deviations:
[1] 3.622043e+00 2.877639e+00 1.812987e-16

Rotation:
            PC1         PC2        PC3
[1,]  0.3330598 -0.07347383  0.1418171
[2,] -0.2319540  0.79957999  0.3837549
[3,]  0.2745904  0.41276093 -0.8541541
[4,]  0.5186794  0.23838204  0.2096053
[5,] -0.2170828 -0.32631616 -0.1153013
[6,] -0.6661196  0.14694766 -0.2140375

> res2=svd(test)
>res2
$d
[1] 5.122342e+00 4.069596e+00 1.153778e-15

$u
           [,1]       [,2]      [,3]
[1,] -0.2800491  0.7669675 0.5773503
[2,] -0.5241888 -0.6260134 0.5773503
[3,]  0.8042379 -0.1409541 0.5773503

$v
           [,1]        [,2]       [,3]
[1,]  0.3330598 -0.07347383  0.7608840
[2,] -0.2319540  0.79957999  0.3438537
[3,]  0.2745904  0.41276093 -0.4028185
[4,]  0.5186794  0.23838204  0.1543444
[5,] -0.2170828 -0.32631616  0.3236952
[6,] -0.6661196  0.14694766  0.1093469

Here, the values in $d are not the squares of the standard deviations.

> res3=test %*% res2$v
> res3
          [,1]       [,2]          [,3]
[1,] -1.434507  3.1212479 -7.494005e-16
[2,] -2.685074 -2.5476217  3.996803e-15
[3,]  4.119582 -0.5736263 -2.053913e-15
> apply(res3,2,sd)
[1] 3.622043e+00 2.877639e+00 3.184320e-15

As shown above, taking the standard deviation of the columns of the
reprojection of the eigenvectors on the original "data" gives the same
values as the standard deviations given by prcomp().  Can anyone
reconcile this example with the manual page for prcomp() or explain
how the standard deviations are calculated?

the code for standardize():

standardize<-function(data){
  numSamples = nrow(data);

  # calculate mean for each marker
  mn=apply(data,2,mean);
  numMarkers = ncol(data);
  #standardize every entry
  for(j in 1:numMarkers){
    for(k in 1:numSamples){
      temp = sqrt(mn[j]/2*(1-mn[j]/2));
      data[k,j] = (data[k,j] - mn[j])/temp;
    }
  }
  data;
}

Thank you,
Sky



More information about the R-help mailing list