[R] Enormous Datasets

Liaw, Andy andy_liaw at merck.com
Thu Nov 18 22:33:14 CET 2004


It depends on what you want to do with that data in R.  If you want to work
with the whole dataset at once, just storing it in R will require more than
2.6 GB of memory (assuming all entries are numeric and stored as doubles):

> 7e6 * 50 * 8 / 1024^2
[1] 2670.288

That's not impossible, but you'll need a computer with quite a bit more
memory than that, running an OS that lets a single process use that much.  If
that's not feasible, you need to re-think what you want to do with that data
in R (e.g., read in and process a small chunk at a time, or read in a random
sample; see the sketch below).
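
For the chunked approach, something along these lines could work (a minimal
sketch, assuming a comma-delimited file with no header; the file name
"pums.csv", the chunk size, and the all-numeric column classes are
placeholders you'd adjust for your data):

con <- file("pums.csv", open = "r")
chunk.size <- 1e5                        # rows to read per pass
repeat {
    chunk <- tryCatch(read.table(con, sep = ",", nrows = chunk.size,
                                 colClasses = "numeric"),
                      error = function(e) NULL)  # NULL once input is exhausted
    if (is.null(chunk)) break
    ## ... process or summarize 'chunk' here, keeping only the results ...
    if (nrow(chunk) < chunk.size) break          # last, partial chunk
}
close(con)

Because the connection stays open between calls, each read.table() call picks
up where the previous one left off, so only chunk.size rows are held in
memory at any time.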

Andy


> From: Thomas W Volscho
> 
> Dear List,
> I have some projects where I use enormous datasets.  For 
> instance, the 5% PUMS microdata from the Census Bureau.  
> After deleting cases I may have a dataset with 7 million+ 
> rows and 50+ columns.  Will R handle a datafile of this size? 
>  If so, how?
> 
> Thank you in advance,
> Tom Volscho
> 
> ************************************        
> Thomas W. Volscho
> Graduate Student
> Dept. of Sociology U-2068
> University of Connecticut
> Storrs, CT 06269
> Phone: (860) 486-3882
> http://vm.uconn.edu/~twv00001
> 



