[R] memory limit
macrakis at alum.mit.edu
Wed Nov 26 22:16:41 CET 2008
I routinely compute with a 2,500,000-row dataset with 16 columns,
which takes 410MB of storage; my Windows box has 4GB, which avoids
thrashing. As long as I'm careful not to compute and save multiple
copies of the entire data frame (because 32-bit Windows R is limited
to about 1.5GB address space total, including any intermediate
results), R works impressively well and fast with this dataset for
selections, calculations, cross-tabs, plotting, etc. For example,
simple single-column statistics and cross-tabs take well under 1 sec.;
summary of the whole data frame takes 16 sec. A linear regression between two
numeric columns takes < 20 sec. Plotting of all 2.5M points takes a
while, but that is no surprise (and is usually pointless [sic]
anyway). I have not tried to do any compute-intensive statistical
calculations on the whole data set.
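For anyone who wants to check similar numbers on their own machine, here is a rough sketch of how the size and timings above could be measured. The data frame below is a synthetic stand-in, not my actual data: the column names (x, y, g) and the row count are assumptions, and n is kept small so it runs quickly.

```r
# Synthetic stand-in for the dataset described above (hypothetical columns).
# n is small here so the sketch runs quickly; raise it toward 2.5e6 to
# approximate the size discussed in the post.
set.seed(1)
n <- 1e5
d <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  g = factor(sample(letters[1:5], n, replace = TRUE))
)

# Approximate in-memory size of the data frame, in MB
size_mb <- as.numeric(object.size(d)) / 2^20
cat(sprintf("size: %.1f MB\n", size_mb))

# Time a few of the operations mentioned: a single-column statistic,
# a cross-tab, summary() of everything, and a two-column regression
t_mean <- system.time(m   <- mean(d$x))
t_tab  <- system.time(tab <- table(d$g))
t_sum  <- system.time(s   <- summary(d))
t_lm   <- system.time(fit <- lm(y ~ x, data = d))

# Elapsed wall-clock seconds for each step
print(c(mean    = t_mean[["elapsed"]],
        table   = t_tab[["elapsed"]],
        summary = t_sum[["elapsed"]],
        lm      = t_lm[["elapsed"]]))
```

Timings will of course depend on the hardware and R version, so treat the figures quoted above as specific to my setup.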
The main (but minor) annoyance with it is that it takes about 90 secs
to load into memory using R's native binary "save" format, so I tend
to keep the process lying around rather than re-starting and
re-loading for each analysis. Fortunately, garbage collection is very
effective in reclaiming unused storage as long as I'm careful to
remove unnecessary objects.
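A minimal sketch of the save/remove/reload cycle described above, again using a small hypothetical data frame and a temporary file rather than my real dataset. The use of compress = FALSE is an assumption on my part that may or may not speed up loading, depending on disk and CPU.

```r
# Hypothetical small data frame standing in for the large dataset
d <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

# Write it in R's native binary "save" format. compress = FALSE trades
# disk space for (possibly) faster reading and writing.
f <- tempfile(fileext = ".RData")
save(d, file = f, compress = FALSE)

# Remove the object, then let the garbage collector reclaim its memory
rm(d)
invisible(gc())

# Reload and time it; load() restores the object under its original name
t_load <- system.time(load(f))
cat(sprintf("load took %.2f sec\n", t_load[["elapsed"]]))

# Clean up the temporary file
unlink(f)
```

Calling gc() explicitly is not required, since R collects garbage on its own, but it is a convenient way to see how much memory is in use after removing large objects.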
On Wed, Nov 26, 2008 at 7:42 AM, iwalters <iwalters at cellc.co.za> wrote:
> I'm currently working with very large datasets that consist of 1,000,000+
> rows. Is it at all possible to use R for datasets this size, or should I
> rather consider C++/Java?