[Rd] Return function from function with minimal environment

Gabor Grothendieck ggrothendieck at gmail.com
Tue Apr 4 17:53:09 CEST 2006


On 4/4/06, Henrik Bengtsson <hb at maths.lth.se> wrote:
> On 4/4/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > On 4/4/06, Henrik Bengtsson <hb at maths.lth.se> wrote:
> > > On 4/4/06, Thomas Lumley <tlumley at u.washington.edu> wrote:
> > > > On Tue, 4 Apr 2006, Henrik Bengtsson wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > this relates to the question "How to set a former environment?" asked
> > > > > > yesterday.  What is the best way to return a function with a
> > > > > minimal environment from a function? Here is a dummy example:
> > > > >
> > > > > foo <- function(huge) {
> > > > >  scale <- mean(huge)
> > > > >  function(x) { scale * x }
> > > > > }
> > > > >
> > > > > fcn <- foo(1:10e5)
> > > > >
> > > > > The problem with this approach is that the environment of 'fcn' does
> > > > > not only hold 'scale' but also the memory consuming object 'huge',
> > > > > i.e.
> > > > >
> > > > > env <- environment(fcn)
> > > > > ll(envir=env)  # ll() from R.oo
> > > > > #   member data.class dimension object.size
> > > > > # 1   huge    numeric   1000000     4000028
> > > > > # 2  scale    numeric         1          36
> > > > >
> > > > > save(env, file="temp.RData")
> > > > > file.info("temp.RData")$size
> > > > > # [1] 2007624
> > > > >
> > > > > I generate quite a few of these and my 'huge' objects are of order
> > > > > 100Mb, and I want to keep memory usage as well as file sizes to a
> > > > > > minimum.  What I do now is to remove variables from the local
> > > > > environment of 'foo' before returning, i.e.
> > > > >
> > > > > foo2 <- function(huge) {
> > > > >  scale <- mean(huge)
> > > > >  rm(huge)
> > > > >  function(x) { scale * x }
> > > > > }
> > > > >
> > > > > fcn <- foo2(1:10e5)
> > > > > env <- environment(fcn)
> > > > > ll(envir=env)
> > > > > #   member data.class dimension object.size
> > > > > # 1  scale    numeric         1          36
> > > > >
> > > > > save(env, file="temp.RData")
> > > > > file.info("temp.RData")$size
> > > > > # [1] 156
> > > > >
> > > > > > Since my "foo" functions are complicated and contain many local
> > > > > variables, it becomes tedious to identify and remove all of them, so
> > > > > instead I try:
> > > > >
> > > > > foo3 <- function(huge) {
> > > > >  scale <- mean(huge);
> > > > >  env <- new.env();
> > > > >  assign("scale", scale, envir=env);
> > > > >  bar <- function(x) { scale * x };
> > > > >  environment(bar) <- env;
> > > > >  bar;
> > > > > }
> > > > >
> > > > > fcn <- foo3(1:10e5)
> > > > >
> > > > > But,
> > > > >
> > > > > env <- environment(fcn)
> > > > > save(env, file="temp.RData");
> > > > > file.info("temp.RData")$size
> > > > > # [1] 2007720
> > > > >
> > > > > When I try to set the parent environment of 'env' to emptyenv(), it
> > > > > does not work, e.g.
> > > > >
> > > > > fcn(2)
> > > > > # Error in fcn(2) : attempt to apply non-function
> > > > >
> > > > > but with the new.env(parent=baseenv()) it works fine. The "base"
> > > > > environment has the empty environment as a parent.  So, I try to do
> > > > > the same myself, i.e. new.env(parent=new.env(parent=emptyenv())), but
> > > > > once again I get
> > > >
> > > > I don't think you want to remove baseenv() from the environment. If you
> > > > do, no functions from baseenv will be visible inside fcn. These include
> > > > "{" and "*", which are necessary for your function. I think the error
> > > > message comes from being unable to find "{".
> > >
> > > Thank you, this makes sense. Modifying Roger Peng's example
> > > illustrates what you say:
> > >
> > > foo <- function(huge) {
> > >        scale <- mean(huge)
> > >        g <- function(x) x
> > >        environment(g) <- emptyenv()
> > >        g
> > > }
> > >
> > > fcn <- foo(1:10e5)
> > > fcn(2)
> > > # [1] 2
> > >
> > > But as soon as you add "something" to the g(), it is missing;
> > >
> > > foo <- function(huge) {
> > >        scale <- mean(huge)
> > >        g <- function(x) { x }
> > >        environment(g) <- emptyenv()
> > >        g
> > > }
> > >
> > > fcn <- foo(1:10e5)
> > > fcn(2)
> > > # Error in fcn(2) : attempt to apply non-function
> > >
> > > ...and I did not know that "{" and "(" are primitive functions.  Interesting.
> > >
> > > I conclude that 'env <- new.env(parent=baseenv())' is better than
> > > 'env <- new.env()' in my case.
> >
> > Is there any reason to use
> >
> >    env <- new.env(parent=baseenv())
> >
> > instead of just
> >
> >    env <- baseenv() ?
> >
> > The extra environment being created seems to serve no purpose.
>
> I need to do this, because I do not want to assign 'scale' to the base
> environment:
>
> foo <- function(huge) {
>  scale <- mean(huge)
>  env <- new.env(parent=baseenv())
>  # cf. env <- baseenv()
>  assign("scale", scale, envir=env)
>  bar <- function(x) { scale * x }
>  environment(bar) <- env
>  bar
> }
>

OK. I think the example changed over the course of the discussion, and
scale was not part of the later examples.

At any rate the version with scale could be reduced to one line using evalq:

foo <- function(huge)
   evalq(function(x) { scale * x }, list(scale = mean(huge)), baseenv())
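
As a rough check of the one-liner, the sketch below uses only base R
(ls() and parent.env() rather than ll() from R.oo) to look at what the
returned function carries with it. It assumes that evalq() turns the
list into a fresh environment whose enclosure is baseenv(), which is
how eval() documents its handling of a list 'envir'; the names fcn and
env are just the ones used earlier in the thread.

```r
# The evalq() version: 'huge' is only touched while building the list,
# so it should not end up in the closure's environment.
foo <- function(huge)
   evalq(function(x) { scale * x }, list(scale = mean(huge)), baseenv())

fcn <- foo(1:10e5)
env <- environment(fcn)

# The closure's environment should hold 'scale' and nothing else.
print(ls(env))

# Its parent should be the base environment, so "{" and "*" are found.
print(identical(parent.env(env), baseenv()))

# The function itself still works: scale * x with scale = mean(1:10e5).
print(fcn(2))
```

If that holds, saving env should give a file about as small as in the
foo2 case, since only 'scale' is reachable from it.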


