[R] R Memory Usage Concerns

Evan Klitzke evan at eklitzke.org
Tue Sep 15 09:10:33 CEST 2009


On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> As already suggested, you're (much) better off if you specify colClasses, e.g.
>
> tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));
>
> Otherwise, R has to load all the data, make a best guess of the column
> classes, and then coerce (which requires a copy).

Thanks Henrik, I tried this as well as a variant that another user
sent me privately. When I tell R the colClasses, it does a much better
job of allocating memory (ending up with a 96 MB resident set size,
which isn't great but is definitely acceptable).

A couple of notes I made from testing some variants, if anyone else is
interested:
 * giving it an nrows argument doesn't help it allocate less memory
(just a guess, but maybe because it's trying the powers-of-two
allocation strategy in both cases)
 * there's no difference in memory usage between telling it a column
is "numeric" vs "double"
 * when telling it the types in advance, loading the table is much, much faster

Maybe if I gather some more fortitude in the future, I'll poke around
at the internals, since I'm still curious where the extra memory is
going. Is that just the overhead of allocating a full object for each
value (i.e. rather than just a double[] or whatever)?
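
One cheap way to start on that question, as a sketch only: compare
object.size() of a numeric vector with the 8 bytes a raw double would
need. The vector length here is arbitrary.

```r
# One million doubles in a plain numeric vector.
n <- 1e6
x <- runif(n)

# Bytes per value, including the vector's fixed header.
bytes_per_value <- as.numeric(object.size(x)) / n
print(bytes_per_value)

# The same values held as a data frame column.
df <- data.frame(x = x)
print(as.numeric(object.size(df)) / n)
```

If the per-value figure comes out close to 8, the values themselves are
sitting in a contiguous double array, and the extra memory is more
likely temporary copies made during reading than per-value object
overhead.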

-- 
Evan Klitzke <evan at eklitzke.org> :wq



