[R] Need advice on using R with large datasets

Liaw, Andy andy_liaw at merck.com
Tue Apr 13 16:37:54 CEST 2004


I was under the impression that R has been run on 64-bit Solaris (and other
64-bit Unices) as a 64-bit app for quite a while.  We've been running 64-bit
R on amd64 for a few months (and have had quite a few opportunities to get
R processes using over 8GB of RAM).  Not much of a problem as far as I can
see...

Best,
Andy

> From: Roger D. Peng
> 
> As far as I know, R does compile on AMD Opterons and runs as a
> 64-bit application, so it can store objects larger than 4GB.
> However, I don't think R gets tested very often on 64-bit
> machines with such large objects, so there may be as-yet
> undiscovered bugs.
> 
> -roger
> 
> Sunny Ho wrote:
> 
> > Hello everyone,
> > 
> > I would like to get some advice on using R with some really large
> > datasets.
> > 
> > I'm using R 1.8.1 on RH9 Linux for research involving a lot of
> > numerical data.  The datasets total around 200MB (as shown by
> > memory.size).  During my data manipulation, system memory usage grew
> > to 1.5GB, which caused a lot of swapping on my 1GB PC.  This is just
> > a small-scale experiment; the full-scale one will use data 30 times
> > as large (on a 4GB machine).  I can see that I'll need to deal with
> > memory usage problems very soon.
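
For tracking where the memory goes, object.size() and gc() are the
standard tools; a minimal sketch (the object name x is just a
placeholder):

    # Size of one object, in bytes (x is a hypothetical example object)
    x <- matrix(rnorm(1e6), ncol = 100)
    object.size(x)

    # Force a garbage collection and report R's current memory use
    gc()

    # List workspace objects by size, largest first
    sort(sapply(ls(), function(nm) object.size(get(nm))),
         decreasing = TRUE)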
> > 
> > I notice that R keeps all datasets in memory at all times.  I wonder
> > whether there is any way to instruct R to push some of the
> > less-frequently-used data tables out of main memory, so as to free
> > up memory for those that are actively in use.  It would be even
> > better if R could keep only part of a table in memory, loading that
> > part only when it is needed.  Using save & load could help, but I
> > wonder whether R is intelligent enough to do this by itself, so that
> > I don't need to keep track of memory usage at all times.
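
R will not page objects out on its own, but the save/load idea can be
made fairly painless; a minimal sketch, assuming a large table named
big1 (a hypothetical name):

    # Push a less-frequently-used table out to disk and drop it from RAM
    save(big1, file = "big1.RData")
    rm(big1)
    gc()   # let R return the freed memory

    # ... work with the actively used tables ...

    # Bring the table back only when it is needed again
    load("big1.RData")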
> > 
> > Another thought is to use a 64-bit machine (AMD64).  I found a
> > pre-compiled R for Fedora Linux on AMD64.  Does anyone know whether
> > this version of R runs as a 64-bit application?  If so, will R be
> > able to go beyond the 32-bit 4GB memory limit?
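
One quick check of whether a given R binary is a 64-bit build is the
pointer size reported by .Machine:

    # 8 bytes per pointer indicates a 64-bit build, 4 bytes a 32-bit one
    .Machine$sizeof.pointer
    8 * .Machine$sizeof.pointer   # address width in bits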
> > 
> > Also, from the manual, I see that the RPgSQL package (for
> > PostgreSQL databases) supports a "proxy data frame" feature.  Does
> > anyone have experience with this?  Can a proxy data frame handle
> > memory efficiently for very large datasets?  Say, if I have a 6GB
> > database table defined as a proxy data frame, will R and RPgSQL be
> > able to handle it with just 4GB of memory?
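
For what it's worth, a minimal sketch of how the RPgSQL proxy mechanism
is used, going by the package documentation (the connection details and
table name are hypothetical, and this is untested against tables
anywhere near 6GB):

    library(RPgSQL)

    # Connect to the PostgreSQL server (hypothetical host and database)
    db.connect(host = "localhost", dbname = "mydb")

    # Bind a server-side table to a proxy data frame; rows stay in the
    # database rather than being copied into R's memory up front
    bind.db.proxy("bigtable")

    # Subscripting the proxy fetches only the requested rows into R
    first.rows <- bigtable[1:10, ]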
> > 
> > Any comments will be useful. Many thanks.
> > 
> > Sunny Ho
> > (Hong Kong University of Science & Technology)
> > 
> 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

