[R] Memory problem on a linux cluster using a large data set [Broadcast]

Martin Morgan mtmorgan at fhcrc.org
Thu Dec 21 18:07:01 CET 2006


Section 8 of the Installation and Administration guide says that on
64-bit architectures the 'size of a block of memory allocated is
limited to 2^32-1 (8 GB) bytes'.

The wording 'a block of memory' here is important, because it sets a
limit on a single allocation rather than on the total memory consumed
by an R session. The original poster's allocation was something like
300,000 SNPs x 1000 individuals x 8 bytes (depending on
representation, I guess) = about 2.4 GB, so there is still some room
for even larger data.
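
As a rough check of that arithmetic, here is a minimal sketch in R. The
variable names are made up for illustration, and it assumes the
genotypes are stored as 8-byte doubles; a more compact representation
would need less. It also compares the element count with 2^31 - 1,
which, as far as I know, is the maximum vector length in current R
versions.

```r
# Dimensions as described in the thread (variable names are hypothetical).
n_snps <- 300000
n_individuals <- 1000
bytes_per_element <- 8   # assumption: values stored as doubles

# Size of a single matrix holding all the data.
n_elements <- n_snps * n_individuals
size_bytes <- n_elements * bytes_per_element
size_gb  <- size_bytes / 1e9    # decimal gigabytes
size_gib <- size_bytes / 2^30   # binary gibibytes

# Largest vector length representable as an R integer (2^31 - 1).
max_elements <- .Machine$integer.max
fits_in_one_vector <- n_elements <= max_elements

cat(sprintf("%.0f elements, %.2f GB (%.2f GiB)\n",
            n_elements, size_gb, size_gib))
cat("Fits in a single vector:", fits_in_one_vector, "\n")
```

This counts only the one matrix. Any copies R makes during the
analysis come on top of that, so the memory used by the whole session
can be several times larger.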

Obviously it's important to think carefully about how the statistical
analysis of such a large volume of data will proceed, and be
interpreted.

Martin

Thomas Lumley <tlumley at u.washington.edu> writes:

> On Thu, 21 Dec 2006, Iris Kolder wrote:
>
>> Thank you all for your help!
>>
>> So with all your suggestions we will try to run it on a computer with a 
>> 64-bit processor. But I've been told that the new R versions all work 
>> on a 32-bit processor. I read in other posts that only the old R 
>> versions were capable of larger data sets and were running under 64-bit 
>> processors. I also read that they are adapting the new R version for 
>> 64-bit processors again, so does anyone know if there is a version 
>> available that we could use?
>
> Huh?  R 2.4.x runs perfectly happily accessing large memory under Linux on 
> 64bit processors (and Solaris, and probably others). I think it even works 
> on Mac OS X now.
>
> For example:
> > x <- rnorm(1e9)
> > gc()
>              used   (Mb) gc trigger   (Mb)   max used   (Mb)
> Ncells     222881   12.0     467875   25.0     350000   18.7
> Vcells 1000115046 7630.3 1000475743 7633.1 1000115558 7630.3
>
>
>          -thomas
>
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
>

-- 
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org


