[R] how to tell if its better to standardize your data matrix first when you do principal

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Nov 22 16:22:02 CET 2009


masterinex wrote:
> 
> 
> Hi guys,
> 
> I'm trying to do principal component analysis in R. There are two ways of doing
> it, I believe.
> One is doing principal component analysis right away; the other is
> standardizing the matrix first using s = scale(m) and then applying principal
> component analysis.
> How do I tell which result is better? What values in particular should I
> look at? I have already found the eigenvalues and eigenvectors and the
> proportion of variance for each eigenvector using both methods.
> 

Generally, it is better to standardize. But in some cases, e.g. when all 
variables are measured in the same units and their variances also reflect 
their importance, it might make sense not to do so.
You should think about the analysis: you cannot know which result is 
'better' unless you have an interpretation for it.
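
To compare the two variants side by side, something like the following 
sketch may help. It uses the built-in mtcars data as a stand-in (an 
assumption, since the original matrix m was not posted) and prcomp() with 
and without scale. = TRUE. Without scaling, a variable with a much larger 
variance tends to dominate the first component, so a larger proportion of 
variance for that component usually reflects the units rather than a 
'better' result, and it is not guaranteed in general.

```r
# Stand-in data (assumption): four columns of the built-in mtcars data set,
# since the poster's matrix m is not shown.
m <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt")])

# PCA on the covariance matrix (centered, not scaled)
pca_raw <- prcomp(m, center = TRUE, scale. = FALSE)

# PCA on the correlation matrix (centered and scaled to unit variance);
# this gives the same components as prcomp(scale(m))
pca_std <- prcomp(m, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
prop_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
prop_std <- pca_std$sdev^2 / sum(pca_std$sdev^2)

# Column variances: large differences here drive the unscaled result
print(apply(m, 2, var))
print(round(rbind(raw = prop_raw, standardized = prop_std), 3))
```

Looking at the loadings (pca_raw$rotation and pca_std$rotation) next to the 
column variances usually shows whether the unscaled first component is 
little more than the variable with the largest variance.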



> I noticed that the proportion of variance for the first principal component
> without standardizing had a larger value. Is there a meaning to it? Isn't this
> always the case?
> Finally, if I am supposed to predict a variable, e.g. weight, should I drop
> that variable from my data matrix when I do principal component
> analysis?


This sounds a bit like homework. If that is the case, please ask your 
teacher rather than this list.
Anyway, it does not make sense to predict weight using a linear 
combination (principal component) that contains weight, does it?
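
A minimal sketch of that idea, again with mtcars as stand-in data (an 
assumption) and its wt column playing the role of 'weight': the response is 
left out of the matrix passed to prcomp(), and is then regressed on the 
component scores. Keeping two components is an arbitrary choice for 
illustration only.

```r
# Stand-in data (assumption): mtcars, with wt as the variable to predict
y <- mtcars$wt
X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "qsec")])  # predictors only, wt excluded

# PCA on the standardized predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Combine the response with the scores on the first two components
pcr_data <- data.frame(wt = y, pca$x[, 1:2])

# Regress the response on the component scores
fit <- lm(wt ~ PC1 + PC2, data = pcr_data)
print(summary(fit)$r.squared)
```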

Uwe Ligges



