[R] How to deal with more than 6GB dataset using R?

Allan Engelhardt allane at cybaea.com
Fri Jul 23 18:45:08 CEST 2010


On 23/07/10 17:36, Duncan Murdoch wrote:
> On 23/07/2010 12:10 PM, babyfoxlove1 at sina.com wrote:
>> [...]
>
> You probably won't get much faster than read.table with all of the 
> colClasses specified.  It will be a lot slower if you leave that at 
> the default NA setting, because then R needs to figure out the types 
> by reading them as character and examining all the values.  If the 
> file is very consistently structured (e.g. the same number of 
> characters in every value in every row) you might be able to write a C 
> function to read it faster, but I'd guess the time spent writing that 
> would be a lot more than the time saved.

Also, try utils::read.fwf() before you roll your own C code for this 
use case: it reads fixed-width files of the kind Duncan describes.
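
A minimal sketch of what that could look like, combined with Duncan's 
advice to specify colClasses. The file contents, column widths and 
column names here are made up for illustration; substitute the layout 
of your own file. Extra arguments such as colClasses are passed 
through to read.table.

```r
# Hypothetical fixed-width sample file: a 3-character id, a 5-character
# value and a 2-character code on every line (layout is an assumption).
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("001 12.5AB",
             "002  3.0CD",
             "003100.0EF"), fwf_file)

# Give the field widths explicitly, and declare the column types so R
# does not have to guess them from the data.
dat <- utils::read.fwf(fwf_file,
                       widths     = c(3, 5, 2),
                       col.names  = c("id", "value", "code"),
                       colClasses = c("integer", "numeric", "character"))

str(dat)
```

Time it on a slice of the real file first; I have not measured how it 
compares to read.table on your data.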

If you do write C code, consider writing a converter to .RData format, 
which R seems to read quite efficiently.
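
You can get the same effect without any C: parse the text file once in 
R, save() the result, and load() it in later sessions. This is a 
sketch of that plain-R variant, not of a C converter; the data frame 
and file name below are placeholders.

```r
# Placeholder data frame standing in for the dataset after it has been
# parsed once from the text file.
dat <- data.frame(id = 1:3, value = c(12.5, 3, 100))

# Write it out in R's binary .RData format.
rdata_file <- tempfile(fileext = ".RData")
save(dat, file = rdata_file)

# In a later session, load() restores the object under its saved name.
# Loading into a separate environment avoids overwriting anything.
env <- new.env()
loaded_names <- load(rdata_file, envir = env)

identical(env$dat, dat)
```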

Hope this helps.

Allan


