[R] Principal Component Analysis - Selecting components? + right choice?
dray at biomserv.univ-lyon1.fr
Thu Dec 11 13:30:46 CET 2008
You can have a look at
S. Dray (2008). On the number of principal components: a test of
dimensionality based on measurements of similarity between matrices.
Computational Statistics and Data Analysis, 52:2228-2237.
which is implemented in the testdim function of the ade4 package.
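
For illustration, here is a minimal sketch of how testdim could be called. The data are simulated stand-ins (12 correlated variables built from 2 latent factors), not the climatic data from the question, and the names clim, pca1 and test1 are made up for the example. As far as I know, testdim expects a normed PCA computed with dudi.pca (centred and scaled, which is the default) rather than a prcomp object, so the PCA is rerun with ade4 here.

```r
library(ade4)

set.seed(42)

# Simulated stand-in data (an assumption for this example):
# 200 observations of 12 variables driven by 2 latent factors plus noise.
n <- 200
latent <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
loadings <- matrix(runif(2 * 12, -1, 1), nrow = 2, ncol = 12)
noise <- matrix(rnorm(n * 12, sd = 0.5), nrow = n, ncol = 12)
clim <- as.data.frame(latent %*% loadings + noise)

# Normed PCA (centred and scaled); scannf = FALSE skips the interactive
# screeplot prompt, nf is the number of axes kept in the dudi object.
pca1 <- dudi.pca(clim, center = TRUE, scale = TRUE, scannf = FALSE, nf = 5)

# Permutation test of dimensionality; nrepet is the number of permutations.
test1 <- testdim(pca1, nrepet = 99)

test1$nb      # estimated number of axes to keep
test1$nb.cor  # same, with a sequential Bonferroni correction
```

With 75,000 cells and 36 variables the permutations may take a while, so it could be worth trying a small nrepet first.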
> Dear R gurus,
> I have some climatic data for a region of the world. They are monthly averages
> for 1950-2000 of precipitation (12 months), minimum temperature (12 months),
> and maximum temperature (12 months). I have scaled them to 2 km x 2 km cells,
> and I have around 75,000 cells.
> I need to feed them into a statistical model as co-variates, to use them to
> predict a response variable.
> The climatic data are obviously correlated: precipitation for January is
> correlated to precipitation for February and so on .... even precipitation
> and temperature are heavily correlated. I did some correlation analysis and
> they are all strongly correlated.
> I thought of running PCA on them, in order to reduce the number of co-variates
> I feed into the model.
> I ran the PCA using prcomp, quite successfully. Now I need a criterion
> to select the right number of PCs (that is: is it 1, 2, 3, 4?).
> What criterion would you suggest?
> At the moment, I am using a criterion based on a threshold, but that is highly
> subjective, even if there are some rules of thumb (Jolliffe, Principal
> Component Analysis, 2nd edition, Springer-Verlag, 2002).
> Could you suggest something more rigorous?
> By the way, do you think I would have been better off using something
> different from PCA?
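
For comparison, the threshold approach mentioned in the question can be sketched with prcomp as follows. This assumes the threshold is on the cumulative proportion of variance; the 90% cut-off is an arbitrary value chosen for the example, and the data are simulated (12 variables from 2 latent factors), not the original climatic layers.

```r
set.seed(1)

# Simulated stand-in data (an assumption for this example).
n <- 500
latent <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
loadings <- matrix(runif(2 * 12, -1, 1), nrow = 2, ncol = 12)
noise <- matrix(rnorm(n * 12, sd = 0.3), nrow = n, ncol = 12)
clim <- latent %*% loadings + noise
colnames(clim) <- paste0("v", 1:12)

# PCA on centred and scaled variables (i.e. on the correlation matrix).
pca <- prcomp(clim, center = TRUE, scale. = TRUE)

# Proportion of variance carried by each component, and its running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Keep the smallest number of components reaching the chosen threshold.
threshold <- 0.90  # arbitrary cut-off, which is the subjective part
k <- which(cum_var >= threshold)[1]

# Scores on the first k components, usable as co-variates in a model.
scores <- pca$x[, seq_len(k), drop = FALSE]
```

Changing the threshold changes k, which is why this rule is subjective; a permutation test does not need such a cut-off.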
Stéphane DRAY (dray at biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88