[R] R Memory Usage Concerns

Thomas Lumley tlumley at u.washington.edu
Tue Sep 15 16:15:12 CEST 2009


On Tue, 15 Sep 2009, Evan Klitzke wrote:

> On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
>> As already suggested, you're (much) better off if you specify colClasses, e.g.
>>
>> tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));
>>
>> Otherwise, R has to load all the data, make a best guess of the column
>> classes, and then coerce (which requires a copy).
>
> Thanks Henrik, I tried this as well as a variant that another user
> sent me privately. When I tell R the colClasses, it does a much better
> job of allocating memory (ending up with 96M of RSS memory, which
> isn't great but is definitely acceptable).
>
> A couple of notes I made from testing some variants, if anyone else is
> interested:
> * giving it an nrows argument doesn't help it allocate less memory
> (just a guess, but maybe because it's trying the powers-of-two
> allocation strategy in both cases)
> * there's no difference in memory usage between telling it a column
> is "numeric" vs "double"

Because they are the same type.
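
A quick way to check this at the console, as a minimal sketch (the variable names x and y are only illustrative):

```r
# "numeric" and "double" are two names for the same floating-point vector type.
x <- numeric(3)   # three zeros, created via the "numeric" constructor
y <- double(3)    # three zeros, created via the "double" constructor

typeof(x)         # internal storage type of x
typeof(y)         # internal storage type of y
identical(x, y)   # the two vectors are indistinguishable

# Same type and length, so the reported sizes match as well.
identical(object.size(x), object.size(y))
```

Since both calls produce the same kind of vector, there is nothing for the memory usage to differ on.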

> * when telling it the types in advance, loading the table is much, much faster

Indeed.
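
For anyone who wants to try the comparison themselves, here is a rough sketch. It assumes a made-up three-column, tab-separated temporary file rather than the actual 20090708.tab from the thread, and the timings will vary by machine and file size:

```r
# Hypothetical stand-in data: one label column and two numeric columns.
path <- tempfile(fileext = ".tab")
n <- 10000
write.table(
  data.frame(id = sample(letters, n, replace = TRUE), a = runif(n), b = runif(n)),
  file = path, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)

# Let read.table guess the column classes.
t_guess <- system.time(guessed <- read.table(path, sep = "\t"))

# Declare the column classes up front.
t_typed <- system.time(
  typed <- read.table(path, sep = "\t",
                      colClasses = c("factor", "numeric", "numeric"))
)

# Elapsed seconds for each variant (machine dependent).
print(c(guess = t_guess[["elapsed"]], typed = t_typed[["elapsed"]]))

# Classes of the columns read with colClasses.
print(sapply(typed, class))

unlink(path)  # remove the temporary file
```

On a file this small the difference may be hard to see; it should become clearer as the number of rows grows.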

> Maybe if I gather some more fortitude in the future, I'll poke around
> at the internals and see where the extra memory is going, since I'm
> still curious where the extra memory is going. Is that just the
> overhead of allocating a full object for each value (i.e. rather than
> just a double[] or whatever)?

No. R doesn't allocate a full object for each value; it allocates just a double[] plus a
constant amount of overhead.  R doesn't have scalar types, so there isn't even such a thing as an object
for a single value, just vectors with a single element.  R will use more memory than the object size of
the data set, because it makes temporary copies of things.

         -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



