[Rd] garbage collection & memory leaks in 'R', it seems...

Peter Dalgaard pdalgd at gmail.com
Sat Jul 17 11:11:40 CEST 2010


Mike Williamson wrote:
> Hello developers,
> 
>     I noticed that if I am running 'R', type "rm(list=objects())" and
> "gc()", 'R' will still be consuming (a lot) more memory than when I then
> close 'R' and re-open it.  In my ignorance, I'm presuming this is something
> in 'R' where it doesn't really do a great job of garbage collection... at
> least not nearly as well as Windows or unix can do garbage collection.
>     Am I right?  If so, is there any better way to "clean up" the memory
> that 'R' is using?  I have a script that runs a fairly large job, and I
> cannot keep it going on its own in a convenient way because of these
> remnants of garbage that pile up and eventually leave so little memory
> remaining that the script crashes.

In a word, no, R is not particularly bad at GC. The internal gc() does a
rather good job of finding unused objects as you can see from its
returned report. Whether that memory is returned to the OS is a matter
of the C-level services (malloc/free) that R's allocation routines use.
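
A minimal sketch of how to read that report, assuming a fresh session
(the object name `x` and its size are only illustrative): the "used"
column for the Vcells row should drop by about the size of the removed
vector, whatever the OS process monitor shows afterwards.

```r
# Allocate a largish numeric vector: 5e6 doubles, about 40 MB.
x <- numeric(5e6)

# gc() returns a matrix with rows "Ncells" and "Vcells";
# the "used" column counts cells currently in use.
before <- gc()

# Drop the only reference and collect again.
rm(x)
after <- gc()

# Vector cells that R itself considers freed (one Vcell is 8 bytes).
freed <- before["Vcells", "used"] - after["Vcells", "used"]
print(freed)
```

This only shows R's own bookkeeping. Whether the process size shrinks
as well depends on the C library and the OS.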

As far as I recall, Windows free() just never returns memory to the OS.
In general, whether it can be done at all depends on which part of the
"heap" you have freed since you have to free from the end of it. (I.e.,
having a tiny object sitting at the end of the heap will force the
entire range to be kept in memory.)

R itself will allocate from freed-up areas of the heap as long as it can
find a space that is big enough. However, there is always a tendency for
memory to fragment so that you eventually have a pattern of many
small objects with not-quite-big-enough holes between them.

These issues affect most languages that do significant amounts of object
allocation and destruction. You should not really compare it to OS-level
memory management because that's a different kettle of fish. In
particular, user programs like R rely on having all objects mapped to
a single linear address space, whereas the OS "just" needs to create a
set of per-process virtual address spaces and has hardware help to do so.


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


