[Rd] 'methods' and environments.
luke at stat.uiowa.edu
Mon Jun 2 11:10:14 MEST 2003
On Mon, 2 Jun 2003, John Chambers wrote:
> Laurent Gautier wrote:
> > Hi,
> > I have quite some trouble with the package methods.
> > "Environments" in R are a convenient way to emulate
> > pointers (and avoid copies of large objects, or of
> > large collections of objects). So far, so good,
> > but the package methods is becoming more (and more)
> > problematic to work with. Up to version R-1.7.0,
> > slots that were environments were still references
> > to an environment, but I discovered in a recent
> > R-patched that this is not the case any longer:
> > environments as slots are now copied (increasing
> > the memory consumption more than threefold in my case).
> > The (excessive) duplication (which the simple example
> > below demonstrates) is now enforced
> > (as environments are copied too) !!!
> > > m <- matrix(0, 600^2, 50)
> > ## RSS of the R process is about 150MB
> > > rm(m); gc()
> >          used (Mb) gc trigger  (Mb)
> > Ncells 364813  9.8     667722  17.9
> > Vcells  85605  0.7   14858185 113.4
> > ## RSS is now about 15 MB
> > > library(methods)
> > > setClass("A", representation(a="matrix"))
> > [1] "A"
> > > a <- new("A", a=matrix(0, 600^2, 50))
> > ## The RSS will peak to 705 MB !!!!!!
> > Are there any plans to make "methods" usable with
> > large datasets ?
> The memory growth seems real, but its connection to "environments as
> slots" is unclear.
> The only recent change that sounds relevant is the modification to
> ensure that methods are evaluated in an environment that reflects the
> lexical scope of the method's definition. That does create a new
> environment for each call to a generic function, but has nothing to do
> with slots being environments.
That was (just) prior to 1.7.0.
> It's possible there is some sort of "memory leak" or extra copying
> there, but I'm not familiar enough with the details of that code to say
> for sure.
> Notice that the following workaround has no bad effects on memory
> (suggesting that the extra environment in evaluating generics may in
> fact be relevant):
> R> setClass("A", representation(a="matrix"))
> [1] "A"
> R> aa <- matrix(600^2, 50)
> R> a1 <- new("A")
> R> a1@a <- aa
> R> gc()
>          used (Mb) gc trigger (Mb)
> Ncells 370247  9.9     531268 14.2
> Vcells  87522  0.7     786432  6.0
You have managed to store Laurent's 140MB matrix in less than 1MB! :-)
If you use matrix(0, 600^2, 50) you get essentially the same pattern
as Laurent did.
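
To make the difference concrete, here is a minimal sketch. The variable
names (small, intended_bytes) are only illustrative, and the size is worked
out arithmetically, assuming 8 bytes per double, rather than by allocating
the full matrix:

```r
# With only two positional arguments, 600^2 is taken as the data and
# 50 as nrow, so the result is a 50 x 1 matrix filled with 360000.
small <- matrix(600^2, 50)
dim(small)                          # 50 rows, 1 column

# The intended call, matrix(0, 600^2, 50), has 600^2 rows and 50 columns
# of doubles. Assuming 8 bytes per double, the data alone needs:
intended_bytes <- 600^2 * 50 * 8
intended_bytes / 2^20               # roughly 137 MB
```
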
> The general solution for dealing with large objects is likely to involve
> some extensions to R to allow "reference" objects, for which the
> programmer is responsible for any copying.
> Environments themselves are not quite adequate for this purpose, since
> different "references" to the same environment cannot have different
Wrapping them in lists is the easiest way to deal with this.
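
A minimal sketch of the wrapping idea, assuming the concern in the quoted
text is that two references to one environment cannot carry different
attributes (the quoted sentence is cut off, so that reading is an
assumption). The names shared, ref1, ref2 and the "label" attribute are
made up for illustration:

```r
# One environment holds the (potentially large) data.
shared <- new.env()
assign("data", 1:10, envir = shared)

# Each reference is a list wrapping the same environment. The lists are
# distinct objects, so each can carry its own attributes.
ref1 <- list(env = shared)
ref2 <- list(env = shared)
attr(ref1, "label") <- "first"
attr(ref2, "label") <- "second"

# A change made through one reference is visible through the other,
# because both lists point at the same environment.
assign("data", 11:20, envir = ref1$env)
get("data", envir = ref2$env)       # 11:20
identical(ref1$env, ref2$env)       # TRUE
```

Copying ref1 or ref2 copies only the small list, not the contents of the
environment.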
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
241 Schaeffer Hall                  email:  luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu