[R] Principal component analysis PCA

SNN s.nancy1 at yahoo.com
Thu Feb 14 21:27:51 CET 2008


Thanks for the advice.

I tried to compute the covariance of my matrix in R and it ran out of memory.
I am not sure how to write the double loop to create the covariance matrix.
Also, is running prcomp() on the covariance matrix the same as running
prcomp() on the original data (the matrix of SNPs)?
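
On the two questions above, a minimal sketch under stated assumptions. prcomp() expects the data matrix itself, so handing it a covariance matrix would treat that matrix as data; the usual route from a covariance matrix to principal components is eigen(). The loop below is one possible reading of the "double loop": a single loop over blocks of SNPs stands in for an explicit loop over pairs of individuals, and accumulates the individuals-by-individuals matrix one block at a time. The simulated genotypes, the names geno, block_size, G and pc, and the choice to centre each SNP are all assumptions made for illustration, not something taken from the thread.

```r
# Simulated stand-in for the real data: n individuals (rows) by p SNPs
# (columns), coded 0/1/2. The real file is not available here.
set.seed(2)
n <- 15
p <- 1000
geno <- matrix(rbinom(n * p, size = 2, prob = 0.25), nrow = n)

block_size <- 128            # hypothetical; pick whatever fits in memory
G <- matrix(0, n, n)         # running n x n cross-product between individuals

for (start in seq(1, p, by = block_size)) {
  cols <- start:min(start + block_size - 1, p)
  # In a real run this block would be read from disk rather than
  # subset from an in-memory matrix.
  block <- geno[, cols, drop = FALSE]
  block <- sweep(block, 2, colMeans(block))  # centre each SNP in the block
  G <- G + tcrossprod(block)                 # add this block's contribution
}
G <- G / (n - 1)

# Principal components come from the eigen decomposition of G,
# not from prcomp(G).
pc <- eigen(G, symmetric = TRUE)
```

With this scaling, sqrt(pc$values) should agree with the sdev component that prcomp() returns for the full matrix. How a block is actually read from disk depends on the file layout, which the thread does not specify.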

Thanks for your help,




Thomas Lumley wrote:
> 
> On Wed, 13 Feb 2008, Wang, Zhaoming (NIH/NCI) [C] wrote:
> 
>>
>> Try EIGENSTRAT http://www.nature.com/ng/journal/v38/n8/abs/ng1847.html
> 
> The same approach as EIGENSTRAT is pretty straightforward in R.
> 
> You need to create the covariance matrix of people (rather than of SNPs) 
> for the 0/1/2 genotype at each SNP and take the principal components of 
> that matrix.
> 
> In this case the number of individuals is small enough that you should be 
> able to create the covariance matrix directly by matrix operations.  In 
> larger data sets where the entire data matrix doesn't fit in memory, you 
> need some sort of double loop.
> 
>  	-thomas
> 
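
The direct matrix-operations route described above might look roughly like the following sketch. It assumes each SNP is centred (as prcomp(center = TRUE) would do) before the individuals-by-individuals matrix is formed; EIGENSTRAT also rescales each SNP, which is left out here. The simulated data and the names geno, geno_c, G, e, k, scores and ref are illustrative only.

```r
# Simulated stand-in for the genotype data: n individuals (rows) by
# p SNPs (columns), coded 0/1/2.
set.seed(1)
n <- 20
p <- 500
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)

# Centre each SNP (column); no rescaling in this sketch.
geno_c <- scale(geno, center = TRUE, scale = FALSE)

# n x n matrix between individuals. It stays small even when p is very
# large, unlike the p x p covariance matrix between SNPs.
G <- tcrossprod(geno_c) / (n - 1)

e <- eigen(G, symmetric = TRUE)

# Scores of the individuals on the first k components:
# eigenvectors scaled by sqrt((n - 1) * eigenvalue).
k <- 5
scores <- e$vectors[, 1:k] %*% diag(sqrt((n - 1) * e$values[1:k]))

# Reference computed on the full data matrix, for comparison only.
ref <- prcomp(geno, retx = TRUE, center = TRUE)
```

Up to the sign of each column, scores should match ref$x[, 1:k]. The point of working with G is that only an n x n matrix is decomposed, which for 115 individuals is tiny.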
> 
>> Zhaoming
>> -----Original Message-----
>> From: SNN [mailto:s.nancy1 at yahoo.com]
>> Sent: Wednesday, February 13, 2008 9:14 PM
>> To: r-help at r-project.org
>> Subject: [R] Principal component analysis PCA
>>
>>
>> Hi,
>>
>> I am trying to run PCA on a data set of dimension 115 x 300,000. The
>> columns represent the SNPs and the rows represent the individuals. This
>> is what I did:
>>
>> #load the data
>>
>> code <- read.table("code.txt", sep = "\t", header = FALSE, nrows = 300000)
>>
>> # do PCA #
>>
>> pr <- prcomp(code, retx = TRUE, center = TRUE)
>>
>> I am getting the following error message
>>
>> "Error: cannot allocate vector of size 275.6 Mb"
>>
>> I tried to increase the memory size:
>>
>> "memory.size(4000)"
>>
>> but it did not work. Is there a solution for this, or is there other
>> software that can handle large data sets?
>>
>> Thanks
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Principal-component-analysis-PCA-tp15472509p15472509.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
> 

-- 
View this message in context: http://www.nabble.com/Principal-component-analysis-PCA-tp15472509p15488659.html
Sent from the R help mailing list archive at Nabble.com.


