[R] Garbage collection problem

Duncan Murdoch murdoch.duncan at gmail.com
Fri Jan 4 01:41:40 CET 2013


On 13-01-03 7:01 PM, Peter Langfelder wrote:
> Hello all,
>
> I am running into a problem with garbage collection not being able to
> free up all memory. Unfortunately I am unable to provide a minimal
> self-contained example, although I can provide a self contained
> example if anyone feels like wading through some 600 lines of code. I
> would love to isolate the relevant parts from the code but whenever I
> try to run a simpler example, the problem does not appear.
>
> I run an algorithm that repeats the same calculation (on sampled, i.e.
> different data) in each iteration. Each iteration uses relatively
> large intermediate objects and calculations but returns a smaller
> result; these results are then collated and returned from the main
> function (call it myFnc). The problem is that memory used by the
> intermediate calculations (it is difficult to say whether it's objects
> or memory needed for apply calls) does not seem to be freed up even
> after doing explicit garbage collection using gc() within the loop.
>
> Thus, a call of something like
>
> result = myFnc(arguments)
>
> results in some memory that does not seem to be allocated to any
> visible objects and yet is not freed up by gc(): after an actual call
> to the offending function, gc() tells me that Vcells use 538.6 Mb,
> but the sum of object.size() over all objects listed by ls(all.names
> = TRUE) is only 183.3 Mb.
>
>
> The thing is that if I remove 'result' using rm(result) and do gc()
> again, the memory used decreases by a lot: gc() now reports 110.3 Mb
> used in Vcells, which roughly corresponds to the sum of the sizes of
> all objects returned by ls() (after removing 'result'), now 108.7 Mb.
> So used memory went down by something like 428 Mb, but the
> object.size of 'result' is only about 75 Mb.
>
> Thus, it seems that the memory used by internal operations in myFnc,
> which should be freed upon completion of the function call, cannot be
> released by garbage collection until the result of the call is also
> removed.
>
> Like I said, I tried to replicate this behaviour on simple examples
> but could not.
>
> My question is, is this behaviour to be expected in complicated code,
> or is it a bug that should be reported? Is there any way around it?
>
> Thanks in advance for any insights or pointers.
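
In code, the check described above amounts to roughly the following.
This is only a sketch: myFnc() and 'arguments' are placeholders for the
unposted function, and the figures in the comments are the ones
reported above.

result <- myFnc(arguments)
gc()                              # Vcells: ~538.6 Mb in use

## sum object.size() over everything visible in the workspace
objs  <- ls(all.names = TRUE)
sizes <- sapply(objs, function(nm) object.size(get(nm)))
sum(sizes) / 2^20                 # only ~183.3 Mb accounted for

rm(result)
gc()                              # Vcells: ~110.3 Mb -- a drop of ~428 Mb,
                                  # although object.size(result) was ~75 Mb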

I doubt if it is a bug.  Remember the warning from ?object.size:

"Exactly which parts of the memory allocation should be attributed to 
which object is not clear-cut. This function merely provides a rough 
indication: it should be reasonably accurate for atomic vectors, but 
does not detect if elements of a list are shared, for example. (Sharing 
amongst elements of a character vector is taken into account, but not 
that between character vectors in a single object.)

The calculation is of the size of the object, and excludes the space 
needed to store its name in the symbol table.

Associated space (e.g. the environment of a function and what the 
pointer in a EXTPTRSXP points to) is not included in the calculation."

For a simple example:

 > x <- 1:1000000
 > object.size(x)
4000024 bytes
 > e <- new.env()
 > object.size(e)
28 bytes
 > e$x <- x
 > object.size(e)
28 bytes

At the end, e is an environment holding an object of 4 million bytes, 
but object.size() still reports only 28 bytes for it.  You'll get 
environments whenever you return functions from other functions (e.g. 
what approxfun() does; see the sketch after the lm() output below), or 
when you create formulas, e.g.

 > f <- function() { x <- 1:1000000
+  y <- rnorm(1000000)
+  y ~ x
+ }

 > fla <- f()
 > object.size(fla)
372 bytes

Now fla is the formula, but the data vectors x and y are part of its 
environment, so you can use it in fits:

 > summary(lm(fla))

Call:
lm(formula = fla)

Residuals:
     Min      1Q  Median      3Q     Max
-4.8357 -0.6748  0.0002  0.6736  4.4961

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.632e-03  1.998e-03  -1.317    0.188
x            3.302e-09  3.461e-09   0.954    0.340

Residual standard error: 0.9992 on 999998 degrees of freedom
Multiple R-squared: 9.098e-07,	Adjusted R-squared: -9.016e-08
F-statistic: 0.9098 on 1 and 999998 DF,  p-value: 0.3402
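
To see where the space actually is, you can look inside the formula's
environment.  The sketch below is not from the original post; g and
'big' are just illustrative names, while fla, x and y are the objects
created above.

ls(environment(fla))              # "x" "y": the vectors live here
object.size(environment(fla)$x)   # ~4 Mb
object.size(environment(fla)$y)   # ~8 Mb; none of this shows up in
                                  # object.size(fla), yet gc() cannot
                                  # free it while fla is reachable

## The same thing happens with a function returned from another
## function (the approxfun() case mentioned above): the closure's
## environment keeps the captured data alive.
g <- local({
  big <- rnorm(1e6)               # captured in the enclosing environment
  function(i) big[i]
})
object.size(g)                    # small, like the formula
object.size(environment(g)$big)   # ~8 Mb, retained until rm(g); gc()

If 'result' in the original post contains formulas, model fits or
functions created inside myFnc(), their environments would keep the
large intermediates reachable in just this way, which would explain why
gc() only recovers the space once 'result' itself is removed.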


Duncan Murdoch



