[Rd] 'methods' and environments.

John Chambers jmc at research.bell-labs.com
Mon Jun 2 14:30:33 MEST 2003


Luke Tierney wrote:
> 
> On Mon, 2 Jun 2003, John Chambers wrote:
> 
> > Laurent Gautier wrote:
> > >
> > > Hi,
> > >
> > > I have quite some trouble with the package methods.
> > > "Environments" in R are a convenient way to emulate
> > > pointers (and avoid copies of large objects, or of
> > > large collections of objects). So far, so good,
> > > but the package methods is becoming more and more
> > > problematic to work with. Up to R-1.7.0, slots that were
> > > environments were still references to an environment, but
> > > I discovered in a recent R-patched that this is no longer
> > > the case: environments used as slots are now copied
> > > (increasing memory consumption more than threefold in my
> > > case). The excessive duplication is now enforced, since
> > > environments are copied too, as the simple example below
> > > demonstrates:
> > >
> > > > m <- matrix(0, 600^2, 50)
> > > ## RSS of the R process is about 150MB
> > > > rm(m); gc()
> > >          used (Mb) gc trigger  (Mb)
> > > Ncells 364813  9.8     667722  17.9
> > > Vcells  85605  0.7   14858185 113.4
> > > ## RSS is now about 15 MB
> > > > library(methods)
> > > > setClass("A", representation(a="matrix"))
> > > [1] "A"
> > > > a <- new("A", a=matrix(0, 600^2, 50))
> > > ## The RSS will peak to 705 MB !!!!!!
> > >
> > > Are there any plans to make "methods" usable with
> > > large datasets ?
> >
> > The memory growth seems real, but its connection to "environments as
> > slots" is unclear.
> >
> > The only recent change that sounds relevant is the modification to
> > ensure that methods are evaluated in an environment that reflects the
> > lexical scope of the method's definition.  That does create a new
> > environment for each call to a generic function, but has nothing to do
> > with slots being environments.
> 
> That was (just) prior to 1.7.0.
> 
> >
> > It's possible there is some sort of "memory leak" or extra copying
> > there, but I'm not familiar enough with the details of that code to say
> > for sure.
> >
> > Notice that the following workaround has no bad effects on memory
> > (suggesting that the extra environment in evaluating generics may in
> > fact be relevant):
> >
> > R> setClass("A", representation(a="matrix"))
> > [1] "A"
> > R> aa <- matrix(600^2, 50)
> > R> a1 <- new("A")
> > R> a1@a <- aa
> > R> gc()
> >          used (Mb) gc trigger (Mb)
> > Ncells 370247  9.9     531268 14.2
> > Vcells  87522  0.7     786432  6.0
> >
> 
> You have managed to store Laurent's 140MB matrix in less than 1MB! :-)
> 
> If you use matrix(0, 600^2, 50) you get essentially the same pattern
> as Laurent did.

Correct.  Oh, well.  Here's the less optimistic version:

R> setClass("A", representation(a="matrix"))
[1] "A"
R> aa = new("A")
R> aa@a <- matrix(0, 600^2, 50)
R> gc()
           used  (Mb) gc trigger  (Mb)
Ncells   368189   9.9     667722  17.9
Vcells 36086939 275.4   54357610 414.8


A little exploration in gdb didn't show much that was surprising.  Yes,
the code copies the matrix to assign it as a slot, but nothing showed up
that was obviously much different from a similar computation that didn't
use classes & methods.

For example, a "stripped-down" analogue to assigning a slot is to assign
an attribute.  To compare with the above (both from a new R session):

R> tt = list(a=1,b=2)
R> attr(tt, "a") <- matrix(0, 600^2, 50)
R> gc()
           used  (Mb) gc trigger  (Mb)
Ncells   367539   9.9     667722  17.9
Vcells 36086917 275.4   54357588 414.8

The indication is that the two computations are roughly identical, as one
would hope.  In either case, the behavior seems to be somewhere in
between that of the minimalist assignment of the matrix and the
computations for new("A",...).  Which is what one would expect, if there
is some additional copying going on somewhere in dispatching or
evaluating the method for initialize().
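As an aside for readers revisiting this thread: tracemem() did not exist in R at the time, but in a current session it gives a direct way to see where such duplications happen. A sketch, assuming an R build with memory profiling enabled (the default for CRAN binary builds); the class here is a small stand-in for the one in the thread:

```r
library(methods)

# tracemem() prints a line every time R duplicates the traced object,
# so copies made during dispatch or initialize() become visible.
setClass("A", representation(a = "matrix"))

m <- matrix(0, 10, 10)   # small stand-in for the 600^2 x 50 matrix
invisible(tracemem(m))   # start reporting duplications of m
a <- new("A", a = m)     # any copy taken while building the object
                         # appears as tracemem output
untracemem(m)            # stop tracing
```
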

But this hardly seems to justify a diatribe, and it doesn't point to a
likely high-leverage fix.

Without some more specific guidance or ideas, a lot of time could be
spent on this without much chance of profit.
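For context, the reference behavior Laurent is relying on takes only a few lines to see; a minimal sketch (object names are illustrative):

```r
# Environments have reference semantics: assigning one to a new name
# does not copy it, so both bindings name the same storage.
e <- new.env()
e$m <- matrix(0, 3, 3)   # small stand-in for the 600^2 x 50 matrix
f <- e                   # no copy: f and e are the same environment
f$m[1, 1] <- 42          # modify through one binding...
e$m[1, 1]                # ...and the change is visible through the other
```

This is exactly why environments are attractive as slot values for large data, and why having them copied defeats the purpose.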

> 
> > The general solution for dealing with large objects is likely to involve
> > some extensions to R to allow "reference" objects, for which the
> > programmer is responsible for any copying.
> >
> > Environments themselves are not quite adequate for this purpose, since
> > different "references" to the same environment cannot have different
> > attributes.
> 
> Wrapping them in lists is the easiest way to deal with this.

Yes, that's what the current OOP package in Omegahat does.  But it's not
a long-term solution, because now you have a list object, which is not
what you intended.
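A minimal sketch of that list-wrapping idiom (the field name `env` is illustrative, not what the Omegahat OOP package actually uses): the attributes live on each wrapper, while the environment inside is shared by reference.

```r
# Wrap a shared environment in a list so each reference can carry its
# own attributes, while the underlying data is never duplicated.
e <- new.env()
e$data <- 1:10

ref1 <- list(env = e)
ref2 <- list(env = e)
attr(ref1, "label") <- "first view"    # attributes go on the wrapper...
attr(ref2, "label") <- "second view"

ref1$env$data <- 99                    # ...but the environment is shared,
ref2$env$data                          # so the update is seen via ref2 too
```

The cost John notes is real, though: `ref1` now answers `is.list()` rather than looking like the object you meant to define.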

John

> 
> luke
> 
> --
> Luke Tierney
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

-- 
John M. Chambers                  jmc at bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-2681
700 Mountain Avenue, Room 2C-282  fax:    (908)582-3340
Murray Hill, NJ  07974            web: http://www.cs.bell-labs.com/~jmc


