[R] Doubt about CCA and PCA

Jari Oksanen jari.oksanen at oulu.fi
Tue Nov 24 08:15:16 CET 2009


Jombart, Thibaut <t.jombart <at> imperial.ac.uk> writes:

> 
> Dear Francisco, 
> 
> CCA and PCA are quite different methods. CCA regresses your 'response' data
> onto a set of explanatory variables. This needs to invert the matrix of
> covariances of the predictors, which is only possible if n>p, where n is the
> number of observations and p the number of explanatory variables.
> 
> PCA is defined in any case. The ratio between n and p is then relevant only if
> you intend to infer principal axes / components of the population (as opposed
> to using the PA/PC as mere descriptors of the sample). I would recommend
> reading:
> Jolliffe, I. T. Principal Component Analysis. Springer, 2004
> which tackles the latter point very clearly.
> 
>
> Dear R community,
> 
> I'm working with PCA and CCA methods, and I have a theoretical question.
> 
> Why is it necessary to have more temporal values than variables when CCA
> or PCA is going to be used?
> 
> Could you recommend a paper about it?
> 

Francisco,

First assumption: "temporal values" refers to the number of rows
(observations). With that assumption, it is *not* necessary to have more rows
than columns in PCA (more about CCA below). It depends on the implementation:
in R, prcomp() is implemented so that this is not necessary, whereas princomp()
is implemented so that you do need more rows (observations) than columns
(variables). If your data are rank deficient because there are fewer rows than
columns, the number of non-zero eigenvalues will be less than the number of
variables.
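
A minimal sketch of the difference, assuming simulated data: the matrix X, the
sizes n and p, and the 1e-8 cut-off for "non-zero" are made up for illustration
and have nothing to do with your data.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 5; p <- 10                    # fewer rows (observations) than columns (variables)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# prcomp() accepts data with fewer rows than columns
pc <- prcomp(X)
length(pc$sdev)                    # number of components returned
sum(pc$sdev > 1e-8)                # components with non-negligible variance

# princomp() refuses the same data; catch the error instead of stopping
res <- tryCatch(princomp(X), error = function(e) conditionMessage(e))
res                                # the error message, as a character string
```

After centring, n rows can span at most n - 1 dimensions, which is why the
count of non-negligible components stays below n here.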

Then about CCA. First of all, you should tell us what you mean by CCA. It is an
ambiguous acronym that usually refers either to constrained ("canonical")
correspondence analysis or to canonical correlation analysis. The former is
simpler and does not have the constraint you mentioned, but the latter is
computationally more complicated and may need a special implementation for
rank-deficient data. There are further complications, but I won't guess
anything about them before I get more details.
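
To illustrate what rank deficiency means for a method that has to invert a
covariance matrix, here is a small sketch. It again assumes simulated data (X,
n and p are hypothetical) and only inspects the covariance matrix; it does not
run either kind of CCA.

```r
set.seed(2)                        # arbitrary seed, only for reproducibility
n <- 5; p <- 10                    # fewer observations than variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

S <- cov(X)                        # p x p covariance matrix of the variables
dim(S)                             # 10 x 10
qr(S)$rank                         # numerical rank, which is below p here
```

A p x p matrix whose rank is below p is singular and has no ordinary inverse,
so an implementation that relies on that inverse cannot be used as is.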

Cheers, Jari Oksanen



