[Rd] Re: [R] Memory Fragmentation in R
nawaaz at inktomi.com
Sat Feb 19 22:21:05 CET 2005
Thanks Brian. I looked at the code (memory.c) after I sent out the first
email and noticed the malloc() call you mention in your reply.
Reading this code suggested a possible scenario in which R would fail
in malloc() even though it had enough free heap space.
I noticed that if R's own accounting says there is enough heap room
(memory.c:1796, VHEAP_FREE() > alloc_size), then the garbage collector is
not run. So malloc() can fail (because the process address space is
exhausted) even though R itself holds enough free space that it could
reclaim. A simple fix is for R to retry after a garbage collection if
malloc() fails.
I hacked memory.c to look in R_GenHeap[LARGE_NODE_CLASS].New when
malloc() fails (in a fashion very similar to ReleaseLargeFreeVectors()).
I did a "best-fit" search, stealing a block from this list and returning
it to allocVector(). This fixed my particular problem: the large vectors
I had allocated in the previous round were still sitting in this list.
Of course, the right thing to do is to check for free vectors of the
right size before calling malloc(), but my way was simpler (I did not
have to worry about how efficient my best-fit was, since memory
allocation was going to fail anyway).
I can look deeper into this and provide more details if needed.
Prof Brian Ripley wrote:
> BTW, I think this is really an R-devel question, and if you want to
> pursue this please use that list. (See the posting guide as to why I
> think so.)
> This looks like fragmentation of the address space: many of us are using
> 64-bit OSes with 2-4Gb of RAM precisely to avoid such fragmentation.
> Notice (memory.c line 1829 in the current sources) that large vectors
> are malloc-ed separately, so this is a malloc failure, and there is not
> a lot R can do about how malloc fragments the (presumably in your case
> as you did not say) 32-bit process address space.
> The message
> 1101.7 Mbytes of heap free (51%)
> is a legacy of an earlier gc() and is not really `free': I believe it
> means something like `may be allocated before garbage collection is
> triggered': see memory.c.
> On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:
>> I have a data set of roughly 700MB which during processing grows up to
>> 2G ( I'm using a 4G linux box). After the work is done I clean up
>> (rm()) and the state is returned to 700MB. Yet I find I cannot run the
>> same routine again as it claims to not be able to allocate memory even
>> though gcinfo() claims there is 1.1G left.
>> At the start of the second time
>>             used  (Mb) gc trigger   (Mb)
>> Ncells  2261001  60.4    3493455   93.3
>> Vcells 98828592 754.1  279952797 2135.9
>> Before Failing
>> Garbage collection 459 = 312+51+96 (level 0) ...
>> 1222596 cons cells free (34%)
>> 1101.7 Mbytes of heap free (51%)
>> Error: cannot allocate vector of size 559481 Kb
>> This looks like a fragmentation problem. Anyone have a handle on this
>> situation? (ie. any work around?) Anyone working on improving R's
>> fragmentation problems?
>> On the other hand, is it possible there is a memory leak? In order to
>> make my functions work on this dataset I tried to eliminate copies by
>> coding with references (basic new.env() tricks). I presume that my
>> cleaning up returned the temporary data (as evidenced by the gc output
>> at the start of the second round of processing). Is it possible that
>> it was not really cleaned up and is sitting around somewhere even
>> though gc() thinks it has been returned?
>> Thanks - any clues to follow up will be very helpful.