[Rd] 'methods' and environments.

Henrik Bengtsson hb at maths.lth.se
Mon Jun 2 18:36:00 MEST 2003


> -----Original Message-----
> From: r-devel-bounces at stat.math.ethz.ch 
> [mailto:r-devel-bounces at stat.math.ethz.ch] On Behalf Of Luke Tierney
> Sent: den 2 juni 2003 17:10
> To: John Chambers
> Cc: r-devel at stat.math.ethz.ch; Laurent Gautier
> Subject: Re: [Rd] 'methods' and environments.
> 
> 
> On Mon, 2 Jun 2003, John Chambers wrote:
> 
> > Laurent Gautier wrote:
> > > 
> > > Hi,
> > > 
> > > I have quite some trouble with the package methods. 
> "Environments" 
> > > in R are a convenient way to emulate pointers (and avoid 
> copies of 
> > > large objects, or of large collections of objects). So 
> far, so good,
> > > but the package methods is becoming more (and more)
> > > problematic to work with. Up to version R-1.7.0,
> > > slots that were environments were still references
> > > to an environment, but I discovered in a recent
> > > R-patched that this is not the case any longer:
> > > environments as slots are now copied (increasing
> > > the memory consumption by more than three fold in my case).
> > > The (excessive) duplication (as a simple example
> > > shown below demonstrates it) is now enforced
> > > (as environments are copied too) !!!
> > > 
> > > > m <- matrix(0, 600^2, 50)
> > > ## RSS of the R process is about 150MB
> > > > rm(m); gc()
> > >          used (Mb) gc trigger  (Mb)
> > > Ncells 364813  9.8     667722  17.9
> > > Vcells  85605  0.7   14858185 113.4
> > > ## RSS is now about 15 MB
> > > > library(methods)
> > > > setClass("A", representation(a="matrix"))
> > > [1] "A"
> > > > a <- new("A", a=matrix(0, 600^2, 50))
> > > ## The RSS will peak to 705 MB !!!!!!
> > > 
> > > Are there any plans to make "methods" usable with
> > > large datasets ?
> > 
> > The memory growth seems real, but its connection to 
> "environments as 
> > slots" is unclear.
> > 
> > The only recent change that sounds relevant is the modification to 
> > ensure that methods are evaluated in an environment that 
> reflects the 
> > lexical scope of the method's definition.  That does create a new 
> > environment for each call to a generic function, but has 
> nothing to do 
> > with slots being environments.
> 
> That was (just) prior to 1.7.0.
> 
> > 
> > It's possible there is some sort of "memory leak" or extra copying 
> > there, but I'm not familiar enough with the details of that code to 
> > say for sure.
> > 
> > Notice that the following workaround has no bad effects on memory 
> > (suggesting that the extra environment in evaluating 
> generics may in 
> > fact be relevant):
> > 
> > R> setClass("A", representation(a="matrix"))
> > [1] "A"
> > R> aa <- matrix(600^2, 50)
> > R> a1 <- new("A")
> > R> a1@a <- aa
> > R> gc()
> >          used (Mb) gc trigger (Mb)
> > Ncells 370247  9.9     531268 14.2
> > Vcells  87522  0.7     786432  6.0
> > 
> 
> You have managed to store Laurent's 140MB matrix in less than 1MB! :-)
> 
> If you use matrix(0, 600^2, 50) you get essentially the same 
> pattern as Laurent did.
> 
> > The general solution for dealing with large objects is likely to 
> > involve some extensions to R to allow "reference" objects, 
> for which 
> > the programmer is responsible for any copying.
> > 
> > Environments themselves are not quite adequate for this 
> purpose, since 
> > different "references" to the same environment cannot have 
> different 
> > attributes.
> 
> Wrapping them in lists is the easiest way to deal with this.

Yes, wrapping them up in a list is good. You cannot use environments
directly, for several reasons. Try to do it, then quit R and save the
workspace, and then restart R to reload the workspace, and you will see
the problem (at least this was the case for R v1.6.2).
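
A minimal sketch of the list-wrapping idea, assuming a hypothetical
helper newRef() and a hypothetical element name ".env" (neither is part
of R or of the 'methods' package). It only illustrates that copies of
the wrapper share one environment while each copy keeps its own
attributes:

```r
# Hypothetical helper (not part of R): wrap an environment in a list so
# that each "reference" can carry its own attributes.
newRef <- function(value) {
  env <- new.env()
  assign("value", value, envir = env)
  list(.env = env)
}

r1 <- newRef(matrix(0, 10, 5))
r2 <- r1                        # copies the list, not the environment
attr(r2, "label") <- "second"   # attribute is set on r2 only

# A change made through r2 is visible through r1, because both lists
# point to the same environment.
assign("value", matrix(1, 10, 5), envir = r2$.env)

sameEnv   <- identical(r1$.env, r2$.env)
seenViaR1 <- get("value", envir = r1$.env)[1, 1]
labelOnR1 <- attr(r1, "label")
```

Here sameEnv should be TRUE and labelOnR1 should be NULL, which is the
per-reference attribute behaviour that a bare environment cannot give.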

Another comment: A while ago I compared storing environments in lists,
i.e. ref$.env or ref[[".env"]], with storing them as attributes, i.e.
attr(ref, ".env"), and found that it is faster to retrieve an
environment variable if it is stored as an attribute. This might be
useful to know if you are going to access your "referenced" data many
times.
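
A rough way to repeat that comparison is sketched below. The names
refList and refAttr and the loop count n are assumptions made for
illustration, and the timings will depend on the R version and the
machine, so no particular result is implied:

```r
# One environment holding the "referenced" data.
env <- new.env()
assign("x", 42, envir = env)

# Variant 1: environment stored as a list element.
refList <- list(.env = env)

# Variant 2: environment stored as an attribute.
refAttr <- list()
attr(refAttr, ".env") <- env

n <- 100000  # illustrative number of repeated accesses

tList <- system.time(
  for (i in seq_len(n)) get("x", envir = refList$.env)
)
tAttr <- system.time(
  for (i in seq_len(n)) get("x", envir = attr(refAttr, ".env"))
)

# Compare the user, system and elapsed times of the two variants.
print(rbind(list = tList[1:3], attr = tAttr[1:3]))
```

Both variants return the same value; only the time spent reaching the
environment differs.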

Best wishes

Henrik Bengtsson

Dept. of Mathematical Statistics @ Centre for Mathematical Sciences
Lund Institute of Technology/Lund University, Sweden 
(Sweden +2h UTC, Melbourne +10 UTC, Calif. -7h UTC)
+46 708 909208 (cell), +46 46 320 820 (home), 
+1 (508) 464 6644 (global fax),
+46 46 2229611 (off), +46 46 2224623 (dept. fax)
h b @ m a t h s . l t h . s e, http://www.maths.lth.se/~hb/


