[R] Question about PCA with prcomp

Bill.Venables at csiro.au Bill.Venables at csiro.au
Mon Jul 2 23:03:02 CEST 2007

...but with 500 variables and only 20 'entities' (observations) you will
have 481 PCs with dead zero eigenvalues.  How small is 'smaller' and how
many is "a few"?

Everyone who has responded to this seems to accept the idea that PCA is
the way to go here, but that is not clear to me at all.  There is a
2-sample structure in the 20 observations that you have.  If you simply
ignore that in doing your PCA you are making strong assumptions about
sampling that would seem to me unlikely to be met.  If you allow for the
structure and project orthogonal to it then you are probably throwing
the baby out with the bathwater - you want to choose variables which
maximise separation between the 2 samples (and now you are up to 482
zero principal variances, if that matters...).

I think this problem probably needs a bit of a re-think.  Some variant
on singular LDA, for example, may be a more useful way to think about

Bill Venables.  

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ravi Varadhan
Sent: Monday, 2 July 2007 1:29 PM
To: 'Patrick Connolly'
Cc: r-help at stat.math.ethz.ch; 'Mark Difford'
Subject: Re: [R] Question about PCA with prcomp

The PCs that are associated with the smaller eigenvalues. 


Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan at jhmi.edu




-----Original Message-----
From: Patrick Connolly [mailto:p_connolly at ihug.co.nz]
Sent: Monday, July 02, 2007 4:23 PM
To: Ravi Varadhan
Cc: 'Mark Difford'; r-help at stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:

|> Mark,
|> What you are referring to deals with the selection of covariates, 
|> since
|> doesn't do dimensionality reduction in the sense of covariate
|> But what Mark is asking for is to identify how much each data point 
|> contributes to individual PCs.  I don't think that Mark's query makes
|> sense, unless he meant to ask: which individuals have high/low scores

|> on PC1/PC2.  Here are some comments that may be tangentially related 
|> to
|> question:
|> 1.  If one is worried about a few data points contributing heavily to

|> the estimation of PCs, then one can use robust PCA, for example, 
|> using robust covariance matrices.  MASS has some tools for this.
|> 2.  The "biplot" for the first 2 PCs can give some insights 3. PCs, 
|> especially, the last few PCs, can be used to identify "outliers".

What is meant by "last few PCs"?


   ___    Patrick Connolly   
 {~._.~}          		 Great minds discuss ideas    
 _( Y )_  	  	        Middle minds discuss events 
(:_~*~_:) 	       		 Small minds discuss people  
 (_)-(_)  	                           ..... Anon

R-help at stat.math.ethz.ch mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list