[Rd] Moderating consequences of garbage collection when in C

dhinds at sonic.net dhinds at sonic.net
Mon Nov 14 22:12:32 CET 2011


Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 11/14/2011 11:47 AM, dhinds at sonic.net wrote:
> > dhinds at sonic.net wrote:
> >> Martin Morgan<mtmorgan at fhcrc.org>  wrote:
> >
> > I had done some Google searches on this issue, since it seemed like it
> > should not be too uncommon, but the only other hit I could come up
> > with was a thread from 2006:
> >
> > https://stat.ethz.ch/pipermail/r-devel/2006-November/043446.html
> >
> > In any case, one issue with your suggested workaround is that it
> > requires knowing how much additional storage is needed, which may be
> > an expensive operation to determine.  I've just tried implementing a
> > different approach, which is to define two new functions to either
> > disable or enable GC.  The function to disable GC first invokes
> > R_gc_full() to shrink the heap as much as possible, then sets a flag.
> > Then in R_gc_internal(), I first check that flag, and if it is set, I
> > call AdjustHeapSize(size_needed) and exit immediately.

> I think this is a better approach; mine seriously understated the 
> complexity of figuring out required size.
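
To make the flag idea concrete, here is a toy model in plain C.  It is
not R's memory.c: ToyHeap and all of the toy_* names are made up for
illustration, the "collection" simply treats everything as garbage, and
sizes are in arbitrary units.  It only shows the control flow: collect
once on disable, then grow the heap by size_needed instead of
collecting for as long as the flag is set.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are R's real internals. */
typedef struct {
    size_t heap_size;      /* current heap capacity, arbitrary units */
    size_t in_use;         /* units currently allocated */
    int gc_disabled;       /* flag set by toy_gc_disable() */
    unsigned collections;  /* number of collections that actually ran */
} ToyHeap;

/* Stand-in for a full collection: in this toy, everything allocated
   is treated as garbage. */
void toy_collect(ToyHeap *h) {
    h->in_use = 0;
    h->collections++;
}

/* Collect once up front, then switch collection off. */
void toy_gc_disable(ToyHeap *h) {
    toy_collect(h);
    h->gc_disabled = 1;
}

void toy_gc_enable(ToyHeap *h) {
    h->gc_disabled = 0;
}

/* Called when an allocation of size_needed units does not fit. */
void toy_gc_internal(ToyHeap *h, size_t size_needed) {
    if (h->gc_disabled) {
        h->heap_size += size_needed;  /* grow instead of collecting */
        return;
    }
    toy_collect(h);
    if (h->heap_size - h->in_use < size_needed)
        h->heap_size = h->in_use + size_needed;
}

void toy_alloc(ToyHeap *h, size_t n) {
    if (h->heap_size - h->in_use < n)
        toy_gc_internal(h, n);
    assert(h->heap_size - h->in_use >= n);  /* request must fit now */
    h->in_use += n;
}
```

Starting from a 10-unit heap with the flag set, three 4-unit
allocations leave the toy heap at 14 units with no collection beyond
the one done in toy_gc_disable().  After toy_gc_enable(), the next
allocation that does not fit triggers a normal collection again.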

> > These calls could be used to bracket any code section that expects to
> > make lots of calls to R's memory allocator.  The downside is that
> > every path out of such a code section (including error handling)
> > must take care to unset the GC-disabled flag.  I think I would want
> > to hear from someone on the R team about whether they think this is
> > a good idea.
> >

> Another place where this comes up is during package load, especially for 
> packages with many S4 instances.

Do you know if this is all happening inside a C function that could
handle disabling and enabling GC?  Or would it require doing this at
the R level?  For testing, I am turning GC on and off at the R level
but I am thinking about where we would need to check for failures to
re-enable GC.  I suppose one approach would be to provide an R wrapper
that would evaluate an expression with GC disabled using tryCatch to
guarantee that it would exit with GC enabled.
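
The guarantee I have in mind, sketched in plain C rather than R so it
can be tried on its own.  Everything here is hypothetical: toy_gc_off
stands in for the GC-disabled flag, and toy_error() stands in for an R
error, modelled as a long jump out of the running code.  The wrapper
restores the flag on both the normal and the error path.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy model only: these names are not part of R. */
int toy_gc_off = 0;                 /* stand-in for the GC-disabled flag */
static jmp_buf *toy_handler = NULL; /* innermost active error handler */

/* Stand-in for an R error: jump back to the innermost wrapper. */
void toy_error(void) {
    assert(toy_handler != NULL);  /* only valid inside with_gc_disabled() */
    longjmp(*toy_handler, 1);
}

/* Run fn(arg) with the flag set.  Returns 0 if fn returned normally and
   1 if it raised toy_error().  The previous flag value is restored on
   both paths, so nested calls also behave. */
int with_gc_disabled(void (*fn)(void *), void *arg) {
    jmp_buf here;
    jmp_buf *outer = toy_handler;   /* set before setjmp, not changed after */
    int was_off = toy_gc_off;
    int failed = 0;

    toy_gc_off = 1;
    toy_handler = &here;
    if (setjmp(here) == 0)
        fn(arg);                    /* normal path */
    else
        failed = 1;                 /* reached via toy_error() */

    toy_handler = outer;
    toy_gc_off = was_off;
    return failed;
}

/* Example workloads: each records whether the flag was set while it ran. */
void work_ok(void *seen) {
    *(int *)seen = toy_gc_off;
}

void work_fails(void *seen) {
    *(int *)seen = toy_gc_off;
    toy_error();
}
```

Calling with_gc_disabled(work_fails, &seen) returns 1 with the flag
cleared again, which is the behaviour an R-level wrapper built on
tryCatch would need to provide.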

>    > system.time(as.character(1:10000000))
>       user  system elapsed
>    61.908   0.297  62.303

I get 6 seconds for this with GC disabled.

> There's a hierarchy of CHARSXP / STRSXP, so maybe that could be 
> exploited in the mark phase?

I haven't explored whether GC could be made smarter so that this isn't
as big a hit.  I don't really understand the GC process.

-- Dave