[R] Need advice on using R with large datasets

Paul Gilbert pgilbert at bank-banque-canada.ca
Tue Apr 13 20:00:02 CEST 2004


Liaw, Andy wrote:

>I was under the impression that R has been run on 64-bit Solaris (and other
>64-bit Unices) for quite a while (as 64-bit app).  
>
Yes, on Solaris it has worked for quite a while. I don't use it a lot, 
but have one problem that I have been running from time to time for a 
few years.  There are two "issues" that I know about.

1/  Some extra capabilities (like png, I think) also need to be compiled 
as 64-bit apps, and in some cases this is a non-trivial effort (on 
Solaris, for someone like me who does not do that kind of thing often). 
For this reason I have both a 32-bit version for regular use and a 
64-bit version for special problems.

2/  Some R functions make copies of the data sets used and attach them 
to the result. For small data sets that can be very useful. If the 
result is then used as an argument to another function then very quickly 
there are multiple copies. If the data set is large then one is quickly 
making heavy use of swap, and the processing is very slow. This is not 
just a 64-bit problem, but with a 32-bit architecture it is hard to work 
on a data set big enough that this becomes an issue. In some cases 
performance can be improved a lot by hacking the code and not 
attaching the dataset to the result (with some risk that functions 
using the result get broken).
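
A minimal sketch of point 2/, using lm() as one example of a function that attaches a copy of the data to its result (the thread does not say which functions were actually involved). By default lm() stores the model frame in the fitted object, and its model = FALSE argument asks it not to. The data frame d, its size n, and the model itself are made up for illustration.

```r
# Illustrative data only: 'n', 'd', and the model are assumptions for this sketch.
set.seed(1)
n <- 1e5
d <- data.frame(x = rnorm(n))
d$y <- 2 * d$x + rnorm(n)

# Default behaviour: lm() attaches a copy of the model frame to the result.
fit_full <- lm(y ~ x, data = d)

# model = FALSE asks lm() not to attach that copy.
fit_lean <- lm(y ~ x, data = d, model = FALSE)

is.null(fit_full$model)   # FALSE: a copy of the data is stored in the fit
is.null(fit_lean$model)   # TRUE: no copy attached

# Compare the in-memory size of the two results (in bytes).
size_full <- as.numeric(object.size(fit_full))
size_lean <- as.numeric(object.size(fit_lean))
print(c(full = size_full, lean = size_lean))

# The estimates themselves are unaffected.
all.equal(coef(fit_full), coef(fit_lean))
```

The lean fit still carries residuals, fitted values, and the QR decomposition, so the saving is only partial. Code that needs the model frame can usually rebuild it with model.frame(fit_lean), but only while d is still available, which is one form of the breakage risk mentioned above.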

Paul Gilbert

>We've been running 64-bit
>R on amd64 for a few months (and had quite a few opportunities to get the R
>processes using over 8GB of RAM).  Not much problem as far as I can see...
>
>Best,
>Andy
>
>  
>
>>From: Roger D. Peng
>>
>>As far as I know, R does compile on AMD Opterons and runs as a 
>>64-bit application.  So it can store objects larger than 4GB. 
>>However, I don't think R gets tested very often on 64-bit 
>>machines with such large objects so there may be yet undiscovered 
>>bugs.
>>
>>-roger
>>
>>Sunny Ho wrote:
>>
>>>Hello everyone,
>>>
>>>I would like to get some advice on using R with some really large
>>>datasets.
>>>
>>>I'm using R 1.8.1 on RH9 Linux for research involving a lot of
>>>numerical data. The datasets total around 200MB (shown by
>>>memory.size). During my data manipulation, the system memory usage
>>>grew to 1.5GB, and this caused a lot of swapping activity on my 1GB
>>>PC. This is just a small-scale experiment; the full-scale one will
>>>be using data 30 times as large (on a 4GB machine). I can see that
>>>I'll need to deal with memory usage problems very soon.
>>>
>>>I notice that R keeps all datasets in memory at all times. I wonder
>>>whether there is any way to instruct R to push some of the
>>>less-frequently-used data tables out of main memory, so as to free
>>>up memory for those that are actively in use. It would be even
>>>better if R could keep only part of a table in memory, and only when
>>>that part is needed. Using save & load could help, but I just wonder
>>>whether R is intelligent enough to do this by itself, so I don't
>>>need to keep track of memory usage at all times.
>>>
>>>Another thought is to use a 64-bit machine (AMD64). I find there is
>>>a pre-compiled R for Fedora Linux on AMD64. Does anyone know whether
>>>this version of R runs as 64-bit? If so, will R be able to go
>>>beyond the 32-bit 4GB memory limit?
>>>
>>>Also, from the manual, I find that the RPgSQL package (for
>>>PostgreSQL database) supports a feature called "proxy data frame".
>>>Does anyone have experience with this? Can a "proxy data frame"
>>>handle memory efficiently for very large datasets? Say, if I have a
>>>6GB database table defined as a proxy data frame, will R & RPgSQL be
>>>able to handle it with just 4GB of memory?
>>>
>>>Any comments will be useful. Many thanks.
>>>
>>>Sunny Ho
>>>(Hong Kong University of Science & Technology)
>>>
>>>______________________________________________
>>>R-help at stat.math.ethz.ch mailing list
>>>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html