[Rd] Re: [R] Memory Fragmentation in R

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Feb 20 00:00:44 CET 2005


I am not the expert here (the author, Luke Tierney, is probably 
listening), but I understood you to have done a gc() immediately before 
your second run: you presented statistics from it.  If so, then I don't 
understand what happened in detail.  Probably Luke does.

That's good general advice: clear out results you no longer need and run 
gc() before starting a memory-intensive task (it also helps when timing 
things, since the cost of collecting previous work is then excluded).
In 32-bit days I did sometimes run gc() at the end of each simulation run, 
just to give malloc the best chance to clean up the allocations.

On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:

> Thanks Brian. I looked at the code (memory.c) after I sent out the first 
> email and noticed the malloc() call that you mention in your reply.
> Looking into this code suggested a possible scenario where R would fail in 
> malloc() even if it had enough free heap address space.
>
> I noticed that if there is enough heap address space (memory.c:1796, 
> VHEAP_FREE() > alloc_size)

I don't think that quite corresponds to your words: it is rather that 
successful allocation would not provoke a gc (unless gc.torture is on).

> then the garbage collector is not run. So malloc 
> could fail (since there is no more address space to use), even though R 
> itself has enough free space it can reclaim. A simple fix is for R to try 
> doing garbage collection if malloc() fails.

I believe running ReleaseLargeFreeVectors would suffice.
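
To make the idea concrete, here is a minimal sketch in C of "release the 
cached free blocks and retry when the allocation fails".  It is not the code 
in memory.c: limited_malloc(), retire(), release_free_blocks() and 
alloc_with_retry() are hypothetical names, and the fixed byte budget is only 
an assumed stand-in for a process running out of address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block (hypothetical layout). */
typedef struct block {
    size_t size;
    struct block *next;
} block;

static size_t budget = 1000;    /* assumed stand-in for the address space */
static size_t in_use = 0;       /* bytes currently held from the allocator */
static block *free_list = NULL; /* dead blocks not yet handed back */

/* Allocate, failing once the budget would be exceeded. */
static void *limited_malloc(size_t size)
{
    assert(size > 0);
    if (in_use + size > budget)
        return NULL;
    block *b = malloc(sizeof(block) + size);
    if (b == NULL)
        return NULL;
    b->size = size;
    b->next = NULL;
    in_use += size;
    return b + 1;
}

/* Hand a block back to the underlying allocator. */
static void limited_free(void *p)
{
    block *b = (block *)p - 1;
    in_use -= b->size;
    free(b);
}

/* Mark a block as dead but keep it cached on the free list. */
static void retire(void *p)
{
    block *b = (block *)p - 1;
    b->next = free_list;
    free_list = b;
}

/* Return every cached block to the allocator in one go. */
static void release_free_blocks(void)
{
    while (free_list != NULL) {
        block *b = free_list;
        free_list = b->next;
        limited_free(b + 1);
    }
}

/* Try once; on failure release the cached blocks and try once more. */
static void *alloc_with_retry(size_t size)
{
    void *p = limited_malloc(size);
    if (p == NULL) {
        release_free_blocks();
        p = limited_malloc(size);
    }
    return p;
}
```

In this sketch a retired 600-byte block makes a second plain 600-byte request 
fail against the 1000-byte budget, whereas alloc_with_retry() succeeds 
because the cached block is released first.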

> I hacked memory.c to look in R_GenHeap[LARGE_NODE_CLASS].New if malloc() 
> fails (in a very similar fashion to ReleaseLargeFreeVectors()).
> I did a "best-fit" stealing from this list and returned it to allocVector(). 
> This seemed to fix my particular problem - the large vectors that I had 
> allocated in the previous round were still sitting in this list.

They should have been released by the gc() you presented the statistics 
from, and they would have been included in those statistics if still in 
use at that point. So, I don't understand why they are still around.

> Of course, the right thing to do is to check if there are any free 
> vectors of the right size before calling malloc() - but it was simpler 
> to do it my way (because I did not have to worry about how efficient my 
> best-fit was; memory allocation was anyway going to fail).

I rather doubt that is better than letting malloc sort this out, as it 
might be able to consolidate blocks if given them all back at once.
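
For comparison, a "best-fit" search over a list of free blocks might look 
roughly like the sketch below.  This is only an illustration of the 
technique, not the patch described above: free_node and take_best_fit() are 
hypothetical names, and a plain singly linked list is assumed rather than 
R's actual node structures.

```c
#include <assert.h>
#include <stddef.h>

/* A free block on a singly linked list (hypothetical layout). */
typedef struct free_node {
    size_t size;
    struct free_node *next;
} free_node;

/*
 * Unlink and return the smallest node whose size is at least `wanted`,
 * or NULL if no node on the list is large enough.
 */
static free_node *take_best_fit(free_node **head, size_t wanted)
{
    assert(head != NULL);
    free_node **best = NULL;

    /* Walk the links so the chosen node can be unlinked in place. */
    for (free_node **link = head; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= wanted &&
            (best == NULL || (*link)->size < (*best)->size))
            best = link;
    }
    if (best == NULL)
        return NULL;

    free_node *node = *best;
    *best = node->next;
    node->next = NULL;
    return node;
}
```

The search reuses one block and leaves the rest cached, so the underlying 
malloc never gets the chance to merge neighbouring blocks, which is the 
reason for the doubt expressed above.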

> I can look deeper into this and provide more details if needed.

I am unclear what you actually did, but it may be that a judicious gc() is 
all that was needed: otherwise the issues should be the same in the first 
and the subsequent run.  That's not to say that we could not do better when 
the trigger gets near the total address space: perhaps we should not let 
it do so (if we could actually determine the size of the address space 
... it is 2GB or 3GB on Windows, for example).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


