[R] PCA in Microarrays

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed May 14 17:48:25 CEST 2008


On Wed, 14 May 2008, Jorge Ivan Velez wrote:

> Dear useRs:
> I'm not sure if this is the right place to ask, but I'll try. I've been
> reading about how to perform Principal Component Analysis (PCA) on
> microarray data (see [1]) and there's something I don't understand. It
> concerns performing PCA on data sets in which the number of variables is
> greater than the number of samples. For example, in the paper mentioned
> above, the numbers of variables (genes) and samples (tumors) are 8538 and
> 104, respectively. My understanding is that, in PCA, the number of samples
> (n) must be greater than the number of variables (p), and that the goal is
> to seek k components, with k < p, such that the variance of the new data
> set is maximized. Am I wrong?

Yes, in detail. One of the properties of PCA is to seek projections 
(unit-length linear combinations of the variables) of maximal variance, 
each being uncorrelated with earlier ones.  That is well-defined for n < 
p.  But you will get at most n PCs of non-zero variance (and at most 
n-1 unless you centre externally), and the rest are pretty arbitrary basis 
vectors for the space of combinations that are constant across the samples.
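
A minimal sketch of that counting argument, assuming simulated Gaussian data in place of real expression values (the sizes, the seed, the tolerance and the helper nonzero() are all choices made for this illustration):

```r
set.seed(1)                           # arbitrary seed, for reproducibility only
n <- 20; p <- 200                     # fewer samples (rows) than variables (columns)
X <- matrix(rnorm(n * p), nrow = n)   # simulated data, not real microarray values

fit_c <- prcomp(X)                    # default: columns are centred internally
fit_u <- prcomp(X, center = FALSE)    # no centring at all

# Count components whose standard deviation is not numerically zero.
# The relative tolerance 1e-8 is an arbitrary choice for this sketch.
nonzero <- function(fit) sum(fit$sdev > 1e-8 * fit$sdev[1])

nonzero(fit_c)        # n - 1 = 19 here
nonzero(fit_u)        # n = 20 here
dim(fit_c$rotation)   # p rows, one column per returned component
```

Centring the columns removes one degree of freedom, which is why the centred fit has one fewer component of non-zero variance than the uncentred one.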

> Could somebody please tell me how it is possible to perform PCA when the 
> number of variables is greater than the number of samples and how to do 
> it in R?  I'm really confused.  In R I've tried "prcomp" and "princomp" 
> but they didn't work.

See any good book on multivariate analysis, or your statistical 
consultant.  (See the posting guide as to why this is not the list on 
which to ask that question.)

That you can do this does not make it sensible, but it can be 
interpretable if there is a strong signal associated with a handful of 
genes -- but then so can other methods.
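
As a rough illustration of the "strong signal in a handful of genes" case, the sketch below plants a large mean shift in five columns of otherwise pure-noise data and checks where the first component puts its weight. Everything here is an assumption made for the example: the two-group design, the choice of five genes, the shift of 6 and the seed are hypothetical and not taken from the paper under discussion.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
n <- 20; p <- 200
X <- matrix(rnorm(n * p), nrow = n)   # pure noise: rows = samples, columns = "genes"

signal_genes <- 1:5                   # hypothetical handful of informative genes
group <- rep(c(0, 1), each = n / 2)   # hypothetical two-group design
X[group == 1, signal_genes] <- X[group == 1, signal_genes] + 6   # large mean shift

fit <- prcomp(X)

# Genes with the largest absolute loadings on the first component
top <- order(abs(fit$rotation[, 1]), decreasing = TRUE)[1:5]
sort(top)

# Scores on the first component, split by group
split(round(fit$x[, 1], 1), group)
```

With a shift this large the first component is dominated by the planted genes; with a weak or diffuse signal the leading components of n < p data mostly reflect noise, and a plot like this would not be interpretable.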

And BTW, prcomp() *does* work, e.g.

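X <- matrix(rnorm(20*200), 20)   # 20 samples (rows), 200 variables (columns)
fit <- prcomp(X)                 # runs without error even though n < p
str(fit)                         # components: sdev, rotation, center, scale, x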

so the problem is what you did (and you didn't manage to tell us what that 
was -- see the footer of the message).  ?princomp does tell you to use 
prcomp() in this case.
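
For comparison, a small sketch of how the two functions behave on data of that shape. The tryCatch() wrapper and the variable names are choices made for this illustration; the exact wording of the error may differ between R versions, so it is printed rather than assumed.

```r
X <- matrix(rnorm(20 * 200), 20)   # same shape as above: n = 20 < p = 200

# princomp() stops with an error for n < p; capture the message rather than abort
msg <- tryCatch(
  { princomp(X); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# prcomp() works from the SVD of the centred data and copes with n < p
fit <- prcomp(X)
class(fit)
```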

> I'm using Win XP SP2, Intel Core- 2 Duo 2.4 GHz and R 2.7.0 Patched.
>
>
> Thanks in advance,
>
>
> Jorge Ivan Velez
>
>
>
> [1] Ringnér, M.  What is principal components analysis? Nature Biotechnology
> 26, 303 - 304 (2008),
> http://www.nature.com/nbt/journal/v26/n3/full/nbt0308-303.html

Hmm, that's not a free resource.


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


