[R] memory management

William Dunlap wdunlap at tibco.com
Tue Feb 28 21:19:06 CET 2012


Look into environments that may be stored
with your data.  object.size(obj) does not
report on the size of the environment(s)
associated with obj.  E.g.,

  > f <- function(n) {
  +    d <- data.frame(y=rnorm(n), x1=rnorm(n), x2=rnorm(n))
  +    terms(data=d, y~.)
  + }
  > z <- f(1e6)
  > object.size(z)
  1760 bytes
  > eapply(environment(z), object.size)
  $d
  24000520 bytes

  $n
  32 bytes

That happens because formula objects (like function
objects) contain a reference to the environment in
which they were created, and that environment will not
be destroyed until the last reference to it is gone.
You might be able to write code using, e.g., the codetools
package to walk through your objects looking for all
distinct environments that they reference (directly
and indirectly, via ancestors of environments directly
referenced).  Then you can add up the sizes of things
in those environments.
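
A rough sketch of that walk, using base R only rather than
codetools.  The name env_bytes and its stopping rules are my
own assumptions, not an existing API: it follows function
closures, attributes (which is where a formula keeps its
.Environment), list elements, and parent environments, and it
stops at the global, base, empty, namespace, and attached
package environments.  Objects reachable from more than one
environment are counted once per environment, so treat the
total as an estimate.

```r
# Hypothetical helper: total object.size() of everything stored in
# the distinct environments reachable from 'obj'.
env_bytes <- function(obj) {
  seen <- list()   # distinct environments found so far
  is_seen <- function(e) {
    any(vapply(seen, function(s) identical(s, e), logical(1)))
  }
  # Environments we do not descend into (an assumption of this sketch).
  skip <- function(e) {
    identical(e, emptyenv()) || identical(e, globalenv()) ||
      identical(e, baseenv()) || isNamespace(e) ||
      !is.null(attr(e, "name"))   # attached packages carry a "name"
  }
  # Values stored in an environment; unreadable bindings become NULL.
  contents <- function(e) {
    nms <- setdiff(ls(e, all.names = TRUE), "...")
    lapply(nms, function(nm) {
      tryCatch(get(nm, envir = e), error = function(err) NULL)
    })
  }
  visit <- function(x) {
    if (is.environment(x)) {
      if (skip(x) || is_seen(x)) return(invisible(NULL))
      seen[[length(seen) + 1L]] <<- x
      visit(parent.env(x))                # ancestors
      for (v in contents(x)) visit(v)     # things stored inside
      return(invisible(NULL))
    }
    if (is.function(x)) visit(environment(x))   # closures
    for (v in attributes(x)) visit(v)           # e.g. .Environment
    if (is.list(x)) for (v in x) visit(v)       # list elements
    invisible(NULL)
  }
  visit(obj)
  per_env <- vapply(seen, function(e) {
    sum(vapply(contents(e),
               function(o) as.numeric(object.size(o)), numeric(1)))
  }, numeric(1))
  sum(per_env)
}

f <- function(n) {
  d <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
  terms(data = d, y ~ .)
}
z <- f(1e5)
object.size(z)   # small: the terms object by itself
env_bytes(z)     # much larger: includes d, kept alive by f's frame
```

Note that get() forces any unevaluated promises it meets, so
running this can itself change memory use slightly.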

Another possible reason for your problem is that by using ls()
instead of ls(all.names=TRUE) you are not looking at objects
whose names start with a dot.
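
A small illustration of the difference, done in a scratch
environment so that it does not depend on what is in your
workspace.  The names e, .hidden, visible, and env_total are
made up for this example.

```r
e <- new.env()
assign(".hidden", rnorm(1e5), envir = e)   # name starts with a dot
assign("visible", 1:10, envir = e)

ls(e)                     # lists only "visible"
ls(e, all.names = TRUE)   # lists ".hidden" as well

# Hypothetical helper: sum of object.size() over an environment.
env_total <- function(envir, all.names) {
  nms <- ls(envir, all.names = all.names)
  sum(vapply(nms, function(nm) {
    as.numeric(object.size(get(nm, envir = envir)))
  }, numeric(1)))
}

env_total(e, all.names = FALSE)   # misses the large hidden vector
env_total(e, all.names = TRUE)    # includes it
```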

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sam Steingold
> Sent: Tuesday, February 28, 2012 11:58 AM
> To: r-help at r-project.org; Bert Gunter
> Subject: Re: [R] memory management
> 
> My basic worry is that the GC does not work properly,
> i.e., the unreachable data is never collected.
> 
> > * Bert Gunter <thagre.oregba at trar.pbz> [2012-02-27 14:35:14 -0800]:
> >
> > This appears to be the sort of query that (with apologies to other R
> > gurus) only Brian Ripley or Luke Tierney could figure out. R generally
> > passes by value into function calls (but not *always*), so often
> > multiple copies of objects are made during the course of calls. I
> > would speculate that this is what might be going on below -- maybe
> > even that's what you meant.
> >
> > Just a guess on my part, of course, so treat accordingly.
> >
> > -- Bert
> >
> > On Mon, Feb 27, 2012 at 1:03 PM, Sam Steingold <sds at gnu.org> wrote:
> >> It appears that the intermediate data in functions is never GCed even
> >> after the return from the function call.
> >> R's RSS is 4 Gb (after a gc()) and
> >>
> >> sum(unlist(lapply(lapply(ls(),get),object.size)))
> >> [1] 1009496520
> >>
> >> (less than 1 GB)
> >>
> >> how do I figure out where the 3GB of uncollected garbage is hiding?
> 
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
> http://www.childpsy.net/ http://jihadwatch.org http://memri.org
> http://palestinefacts.org http://truepeace.org http://iris.org.il
> I may be getting older, but I refuse to grow up!
> 


