[R] How to deal with more than 6GB dataset using R?

Allan Engelhardt allane at cybaea.com
Fri Jul 23 18:39:25 CEST 2010


read.table() is not particularly inefficient IF you specify the colClasses=
parameter.  scan() (with the what= parameter) is probably a little more
efficient.  In either case, save the data using save() once you have it
in the right structure and it will be much faster to read
next time.  (In fact I often exit R at this stage and re-start it with
the .RData file before I start the analysis, to clear out the memory.)
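
For example, something along these lines (a rough sketch -- the file name
"bigdata.csv" and the column types are placeholders; adjust them to your data):

  cc  <- c("integer", "numeric", "numeric", "numeric")   # match your columns
  dat <- read.table("bigdata.csv", header = TRUE, sep = ",",
                    colClasses = cc, comment.char = "")
  ## or, roughly equivalent, with scan() (returns a list, not a data frame):
  ## dat <- scan("bigdata.csv", sep = ",", skip = 1,
  ##             what = list(integer(), numeric(), numeric(), numeric()))
  save(dat, file = "bigdata.RData")
  ## ...then in a fresh session: load("bigdata.RData")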

I did a lot of testing on the types of (large) data structures I 
normally work with and found that options("save.defaults" = 
list(compress="bzip2", compression_level=6, ascii=FALSE)) gave me the 
best trade-off between size and speed.  Your mileage will undoubtedly 
vary, but if you do this a lot it may be worth getting hard data for 
your setup.
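
If you do want to measure it, something like this works (the settings shown
are just the ones above -- treat them as a starting point, not a recommendation):

  options("save.defaults" = list(compress = "bzip2",
                                 compression_level = 6, ascii = FALSE))
  system.time(save(dat, file = "bigdata.RData"))   # time to write
  file.info("bigdata.RData")$size                  # size on disk
  system.time(load("bigdata.RData"))               # time to read back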

Hope this helps a little.

Allan

On 23/07/10 17:10, babyfoxlove1 at sina.com wrote:
>  Hi there,
>
> Sorry to bother those who are not interested in this problem.
>
> I'm dealing with a large data set (a file of more than 6 GB) and running regressions on those data. I was wondering whether there are any more efficient ways to read the data than just using read.table(). BTW, I'm using a 64-bit desktop and a 64-bit version of R, and the desktop has enough memory for my purposes.
> Thanks.
>
>
> --Gin
>


