[R] R on Large Data Sets (again)

Duncan Murdoch murdoch at stats.uwo.ca
Sun Nov 29 14:55:39 CET 2009


On 28/11/2009 6:53 PM, Lars Bishop wrote:
> Dear R users,
> 
> I’ve searched the R site for help on this topic but it is hard to find a
> precise answer for my questions.
> 
> What are the best options for overcoming RAM limitations when using R
> on “large” data sets (such as 2 or 3 million records)?

There are several packages for handling datasets without keeping them in 
RAM:  bigmemory, ff, etc.  You may find that you need to write functions 
to handle your data a block at a time, or you may find they have already 
been written, e.g. biglm.  You can also keep your data in a database and 
just retrieve it a block at a time for processing.
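
As a rough sketch of the block-at-a-time idea using only base R (the 
file, the column name "x", the block size and the helper block_mean() 
are all made up for illustration; packages such as biglm handle this 
far more carefully):

```r
# Hypothetical data: write a small CSV to a temporary file so the example
# is self-contained. In practice this would be your large file on disk.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10000,
                     x  = seq(0.5, by = 0.5, length.out = 10000)),
          path, row.names = FALSE)

# Compute the mean of one column while holding only one block in memory.
# block_mean() is an illustrative helper, not part of any package.
block_mean <- function(path, column, block_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once and strip the quotes write.csv() adds.
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  header <- gsub('"', "", header)

  total <- 0
  n <- 0
  repeat {
    # Each call continues from where the previous read stopped.
    lines <- readLines(con, n = block_size)
    if (length(lines) == 0) break
    block <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(block[[column]])
    n <- n + nrow(block)
  }
  total / n
}

result <- block_mean(path, "x")
```

Only running totals are kept between blocks, so peak memory use depends 
on block_size rather than on the size of the file.  This works for 
statistics that can be accumulated incrementally; anything that needs 
all rows at once requires a different approach.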

> 
> - Is the freely available version of R (as opposed to the one
> provided by REvolution Computing) compatible with a 64-bit Windows machine?
> And if I add enough RAM on win-64, would this virtually
> solve my memory limitation problems?

It is compatible with Win64, but it is a 32 bit application.  It 
benefits from running on 64 bit Windows (because Windows can get out of 
the way and give it most of 4 GB to work in), but not as much as a true 
64 bit application.  So it probably doesn't solve your problem.
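
For a sense of scale, here is a back-of-the-envelope estimate in R.  
The only hard fact in it is that a double takes 8 bytes; the column 
count is an assumption, since the original question does not say how 
wide the data are:

```r
# Rough size of a purely numeric data set stored as doubles.
rows <- 3e6           # "2 or 3 million records", from the question
cols <- 50            # assumed for illustration; not stated in the question
bytes_per_double <- 8

bytes <- rows * cols * bytes_per_double
gib   <- bytes / 2^30  # about 1.1 GiB for these numbers

cat(sprintf("Approximate size: %.2f GiB\n", gib))
```

A single copy of such an object would fit in a 4 GB address space, but 
R functions often make temporary copies of their arguments, so several 
times that amount may be needed during an analysis.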


> - Is a Unix-like platform a better option than win-64? Again, would
> this solve my memory limitation problems?

There are builds available for 64 bit Linux and MacOS (and maybe 
others); they'd likely help more than running 32 bit R in Win64.  I 
don't know how they compare to running Revolution's 64 bit R in Win64.

Duncan Murdoch

> 
> 
> 
> - Any better option?
> Thanks in advance for your help,
> Lars.
