[R] How to deal with more than 6GB dataset using R?

Duncan Murdoch murdoch.duncan at gmail.com
Fri Jul 23 18:36:05 CEST 2010


On 23/07/2010 12:10 PM, babyfoxlove1 at sina.com wrote:
>  Hi there,
>
> Sorry to bother those who are not interested in this problem.
>
> I'm dealing with a large data set, a file of more than 6 GB, and running regression analyses on those data. I was wondering whether there are any more efficient ways to read the data than just using read.table(). BTW, I'm using a 64-bit desktop and a 64-bit version of R, and the desktop has enough memory for me to use.
> Thanks.
>   

You probably won't get much faster than read.table with all of the 
colClasses specified.  It will be a lot slower if you leave that at the 
default NA setting, because then R needs to figure out the types by 
reading them as character and examining all the values.  If the file is 
very consistently structured (e.g. the same number of characters in 
every value in every row) you might be able to write a C function to 
read it faster, but I'd guess the time spent writing that would be a lot 
more than the time saved.
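
A minimal sketch of what I mean (the file name, column count, and 
column types below are made up, so adjust them to your data):

  ## Tell read.table the type of every column up front, so R does not
  ## have to guess by reading everything as character first.
  dat <- read.table("bigdata.txt", header = TRUE,
                    colClasses = c("integer", "numeric", "numeric", "factor"),
                    nrows = 6e6,          # a rough overestimate is fine
                    comment.char = "")    # turn off comment scanning

Giving nrows (even approximately) and comment.char = "" also helps a 
little, since R can then allocate storage once and skip the comment 
check on every line.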

Duncan Murdoch


