[R] Principal Component Analysis - Selecting components? + right choice?

Stéphane Dray dray at biomserv.univ-lyon1.fr
Thu Dec 11 13:30:46 CET 2008

You can have a look at

*S. Dray*. On the number of principal components: A test of 
dimensionality based on measurements of similarity between matrices. 
/Computational Statistics and Data Analysis/, 52:2228-2237, 2008.

which is implemented in the testdim function of the ade4 package.
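
A minimal sketch of how the test might be called, assuming the ade4 package is installed. As far as I know, testdim expects an object created by dudi.pca (a normed PCA), not a prcomp result, so the PCA would have to be rerun with ade4. The doubs data set shipped with ade4 is only a stand-in for the climatic variables, and the seed and nrepet value are arbitrary choices for illustration.

```r
library(ade4)

# The test is permutation-based, so fix the seed to make the run repeatable
set.seed(1)

# Stand-in data: environmental variables measured at river sites (ships with ade4)
data(doubs)

# Normed (centred and scaled) PCA; scannf = FALSE skips the interactive scree plot
pca <- dudi.pca(doubs$env, scannf = FALSE, nf = 3)

# Dimensionality test with 99 permutations
td <- testdim(pca, nrepet = 99)

td$nb      # estimated number of axes to keep
td$nb.cor  # estimate using the sequential Bonferroni procedure
```

With the real data, doubs$env would be replaced by the data frame of the 36 climatic variables (one row per cell).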


Corrado wrote:
> Dear R gurus,
> I have some climatic data for a region of the world. They are monthly averages 
> (1950-2000) of precipitation (12 months), minimum temperature (12 months), and 
> maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and 
> I have around 75,000 cells.
> I need to feed them into a statistical model as co-variates, to use them to 
> predict a response variable.
> The climatic data are obviously correlated: precipitation for January is 
> correlated to precipitation for February and so on .... even precipitation 
> and temperature are heavily correlated. I did some correlation analysis and 
> they are all strongly correlated.
> I thought of running PCA on them, in order to reduce the number of co-variates 
> I feed into the model.
> I ran the PCA using prcomp, quite successfully. Now I need to use a criterion 
> to select the right number of PCs (that is: is it 1, 2, 3, 4?).
> What criteria would you suggest?
> At the moment, I am using a criterion based on a threshold, but that is highly 
> subjective, even if there are some rules of thumb (Jolliffe, Principal 
> Component Analysis, 2nd edition, Springer Verlag, 2002). 
> Could you suggest something more rigorous?
> By the way, do you think I would have been better off by using something 
> different from PCA?
> Best,
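
For comparison, the threshold rule mentioned in the question can be sketched with prcomp as follows. This is only an illustration: the built-in USArrests data stand in for the climatic variables, and the 90% cut-off is an arbitrary assumption, which is exactly the subjective part.

```r
# Stand-in data; scale. = TRUE gives a PCA on the correlation matrix
pc <- prcomp(USArrests, scale. = TRUE)

# Proportion of total variance carried by each component
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
cum_var  <- cumsum(prop_var)

# Smallest number of components whose cumulative share reaches the cut-off
threshold <- 0.90   # arbitrary choice
k <- which(cum_var >= threshold)[1]

round(cum_var, 3)
k

# Scores on the retained components, usable as covariates in a later model
scores <- pc$x[, seq_len(k), drop = FALSE]
```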

Stéphane DRAY (dray at biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57       Fax: 33 4 72 43 13 88
