[Rd] 'methods' and environments.

Luke Tierney luke at stat.uiowa.edu
Mon Jun 2 11:10:14 MEST 2003


On Mon, 2 Jun 2003, John Chambers wrote:

> Laurent Gautier wrote:
> > 
> > Hi,
> > 
> > I have quite some trouble with the package methods.
> > "Environments" in R are a convenient way to emulate
> > pointers (and avoid copies of large objects, or of
> > large collections of objects). So far, so good,
> > but the package methods is becoming more (and more)
> > problematic to work with. Up to version R-1.7.0,
> > slots that were environments were still references
> > to an environment, but I discovered in a recent
> > R-patched that this is not the case any longer:
> > environments as slots are now copied (increasing
> > the memory consumption by more than three fold in my case).
> > The (excessive) duplication (as a simple example
> > shown below demonstrates it) is now enforced
> > (as environments are copied too) !!!
> > 
> > > m <- matrix(0, 600^2, 50)
> > ## RSS of the R process is about 150MB
> > > rm(m); gc()
> >          used (Mb) gc trigger  (Mb)
> > Ncells 364813  9.8     667722  17.9
> > Vcells  85605  0.7   14858185 113.4
> > ## RSS is now about 15 MB
> > > library(methods)
> > > setClass("A", representation(a="matrix"))
> > [1] "A"
> > > a <- new("A", a=matrix(0, 600^2, 50))
> > ## The RSS will peak to 705 MB !!!!!!
> > 
> > Are there any plans to make "methods" usable with
> > large datasets ?
> 
> The memory growth seems real, but its connection to "environments as
> slots" is unclear.
> 
> The only recent change that sounds relevant is the modification to
> ensure that methods are evaluated in an environment that reflects the
> lexical scope of the method's definition.  That does create a new
> environment for each call to a generic function, but has nothing to do
> with slots being environments.

That was (just) prior to 1.7.0.

> 
> It's possible there is some sort of "memory leak" or extra copying
> there, but I'm not familiar enough with the details of that code to say
> for sure.
> 
> Notice that the following workaround has no bad effects on memory
> (suggesting that the extra environment in evaluating generics may in
> fact be relevant):
> 
> R> setClass("A", representation(a="matrix"))
> [1] "A"
> R> aa <- matrix(600^2, 50)
> R> a1 <- new("A")
> R> a1@a <- aa
> R> gc()
>          used (Mb) gc trigger (Mb)
> Ncells 370247  9.9     531268 14.2
> Vcells  87522  0.7     786432  6.0
> 

You have managed to store Laurent's 140MB matrix in less than 1MB! :-)

If you use matrix(0, 600^2, 50) you get essentially the same pattern
as Laurent did.
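
To make the difference between the two calls concrete, here is a small
sketch. The variable names (small, expected_bytes) are made up for
illustration, and the size is computed arithmetically instead of
allocating the large matrix, assuming 8 bytes per double:

```r
# matrix(600^2, 50) matches arguments positionally as data = 600^2, nrow = 50,
# so it builds a 50 x 1 matrix filled with the single value 360000.
small <- matrix(600^2, 50)
dim(small)                      # 50 1

# matrix(0, 600^2, 50) asks for 600^2 rows and 50 columns of doubles.
# Estimate its data size without allocating it (8 bytes per double assumed).
expected_bytes <- 600^2 * 50 * 8
expected_bytes / 2^20           # about 137 (MiB)
```

That estimate is in line with the roughly 140MB figure above, which is
why the gc() output in the workaround looks so small.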

> The general solution for dealing with large objects is likely to involve
> some extensions to R to allow "reference" objects, for which the
> programmer is responsible for any copying.
> 
> Environments themselves are not quite adequate for this purpose, since
> different "references" to the same environment cannot have different
> attributes.

Wrapping them in lists is the easiest way to deal with this.
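
A minimal sketch of what I mean, with made-up names (env, ref1, ref2,
and the "label" attribute are only for illustration): each list is an
ordinary R value that can carry its own attributes, while the
environment inside is shared and not copied.

```r
# One shared environment holding the data.
env <- new.env()
assign("data", matrix(0, 3, 2), envir = env)

# Two "references": separate lists with their own attributes,
# both pointing at the same environment.
ref1 <- structure(list(env = env), label = "first")
ref2 <- structure(list(env = env), label = "second")

# A change made through one reference is visible through the other.
assign("data", matrix(1, 3, 2), envir = ref1$env)
identical(ref1$env, ref2$env)            # TRUE
get("data", envir = ref2$env)[1, 1]      # 1

# The attributes stay distinct per wrapper.
attr(ref1, "label")                      # "first"
attr(ref2, "label")                      # "second"
```

Copying the list copies only the wrapper; whoever writes the code is
still responsible for any copying of what is inside the environment.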

luke

-- 
Luke Tierney
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


