[R] Limited number of principal components in PCA

David L Carlson dcarlson at tamu.edu
Mon Aug 1 18:51:50 CEST 2011


Providing the data will help, but the first thing I noticed is that you have more columns (variables) than rows (cases). PCA will return at most (the number of columns) or (the number of rows - 1) components with nonzero variance, whichever is less. With 84 columns and 66 rows, you can get no more than 65 such components. If some variables are exact linear combinations of others, you will get fewer still. Note also that na.action=na.omit drops every row that contains a missing value, so the effective number of rows may be smaller than 66. Either of these could explain the reduction to 54. I would guess the variables are highly correlated and the first eigenvalue is very large.
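
A minimal sketch of the row/column limit, using simulated data rather than the real streamflow matrix. Q_sim, Q_na, the 12 rows given missing values, and the 1e-8 tolerance are all assumptions made for illustration; only the prcomp call mirrors the one in the original post. prcomp may still list a final component whose standard deviation is numerically zero, so the sketch counts only the components that are not.

```r
# Simulated stand-in for the real data: 66 rows (cases), 84 columns (gages).
set.seed(1)
n <- 66
p <- 84
Q_sim <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))

# Same call as in the original post, applied to the simulated data.
pca_full <- prcomp(~ ., data = Q_sim, scale. = TRUE, retx = FALSE,
                   na.action = na.omit)

# Count components whose standard deviation is not numerically zero.
# The 1e-8 relative tolerance is an arbitrary choice for this sketch.
n_nonzero <- sum(pca_full$sdev > 1e-8 * pca_full$sdev[1])
n_nonzero
min(p, n - 1)

# Hypothetical scenario: 12 rows contain a missing value.
# na.omit drops those rows before the PCA is computed.
Q_na <- Q_sim
Q_na[1:12, 1] <- NA
n_complete <- sum(complete.cases(Q_na))

pca_na <- prcomp(~ ., data = Q_na, scale. = TRUE, retx = FALSE,
                 na.action = na.omit)
n_nonzero_na <- sum(pca_na$sdev > 1e-8 * pca_na$sdev[1])
n_complete
n_nonzero_na
```

Running sum(complete.cases(Q)) on the real data would show how many rows actually enter the analysis.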

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Joshua Wiley
Sent: Friday, July 29, 2011 10:20 PM
To: William Armstrong
Cc: r-help at r-project.org
Subject: Re: [R] Limited number of principal components in PCA

Hi Billy,

Can you provide your data?  You could attach it as a text file or
provide it by pasting the output of:

dput(Q)

into an email.  It would help if we could reproduce what you are
doing.  You might also consider a list or forum that is more
statistics oriented than Rhelp, as your questions are more related to
the statistics than the software itself (but still, if you give us
data, you will probably get farther).
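
As a rough illustration of the dput() round trip (Q_small and its gage names are made up here, not taken from the real data):

```r
# Q_small is a hypothetical stand-in for a small subset of the real data.
Q_small <- data.frame(gage_A = c(120.5, 98.2, NA),
                      gage_B = c(310.0, 275.4, 290.1))

# dput() writes an R expression that recreates the object;
# capture.output() collects that text so it can be pasted into an email.
txt <- capture.output(dput(Q_small))
cat(txt, sep = "\n")

# A reader can evaluate the pasted text to rebuild the object.
Q_rebuilt <- eval(parse(text = txt))
all.equal(Q_small, Q_rebuilt)
```

For a large matrix, something like dput(head(Q)) keeps the pasted text short, though the full data are needed to reproduce the component count.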

Cheers,

Josh

On Fri, Jul 29, 2011 at 11:33 AM, William Armstrong
<William.Armstrong at noaa.gov> wrote:
> Hi all,
>
> I am attempting to run PCA on a matrix (nrow=66, ncol=84) using 'prcomp'
> (stats package).  My data (referred to as 'Q' in the code below) are
> separate river streamflow gaging stations (columns) and peak instantaneous
> discharge (rows).  I am attempting to use PCA to identify regions that
> vary together.
>
> I am entering the following command:
>
> test_pca_Q<-prcomp(~.,data=Q,scale.=TRUE,retx=FALSE,na.action=na.omit)
>
> It is outputting 54 'standard deviation' numbers (which are the
> sqrt(eigenvalues) in respect to a certain PC, am I correct?), and 54
> 'rotation' numbers, which are the variable loadings with respect to a given
> PC.
>
> I have two questions:
>
> 1.) Why is it only outputting 54 PCs and standard deviations?  If I have 84
> variables isn't the maximum number of PCs I can create 84 as well?
>
> 2.) Can I now use the 'rotation' values to find clusters of gages that are
> acting together, or is there another step I must take?
>
> Thank you very much for your insight.
>
> Billy
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Limited-number-of-principal-components-in-PCA-tp3704956p3704956.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/



