[R] Large data sets with R (binding to hadoop available?)

Roland Rau roland.rproject at gmail.com
Thu Aug 21 21:03:49 CEST 2008


Hi

Avram Aelony wrote:
> 
> Dear R community,
> 
> I find R fantastic and use R whenever I can for my data-analytic needs.  
> Certain data sets, however, are so large that other tools seem to be 
> needed to pre-process the data so that it can be brought into R for 
> further analysis.
> 
> Questions I have for the many expert contributors on this list are:
> 
> 1. How do others handle situations of large data sets (gigabytes, 
> terabytes) for analysis in R?
> 
I usually store such data in an SQLite database and interface with it 
from R via functions from the RSQLite (and DBI) packages.
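
A minimal sketch of what I mean; the file, table, and column names 
here are made up for illustration:

library(DBI)
library(RSQLite)

## open (or create) a database file on disk
con <- dbConnect(SQLite(), dbname = "bigdata.sqlite")

## one-off import; for files too large for memory, one would instead
## read and append chunk-wise with dbWriteTable(..., append = TRUE)
dbWriteTable(con, "measurements", read.csv("measurements.csv"))

## let SQLite do the filtering, so only the needed subset enters R
dat <- dbGetQuery(con, "SELECT * FROM measurements WHERE year >= 2000")

dbDisconnect(con)

The point of this setup is that filtering and aggregation happen 
inside SQLite, and R only ever sees the (much smaller) query result.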

No idea about Question No. 2, though.

Hope this helps,
Roland


P.S. When I am sure that I only need a certain subset of a large data 
set, I still prefer to do some pre-processing in awk (gawk); see the 
sketch below.
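
For example (the file name and the filter condition are just 
placeholders), the awk output can even be streamed straight into R 
through a pipe:

## keep only records whose third field exceeds 100, without
## ever loading the full file into R
dat <- read.table(pipe("gawk '$3 > 100' bigfile.txt"))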
P.P.S. My data sets are in the gigabyte range (not the terabyte 
range). This might be important if your data sets are *really large* 
and you want to use SQLite: http://www.sqlite.org/whentouse.html


