[Rd] how to control the environment of a formula

Fri Apr 19 16:05:08 CEST 2013

On 13-04-19 8:41 AM, Therneau, Terry M., Ph.D. wrote:
>   I went through the same problem and discovery process 2 years ago with the survival package.  With pspline() terms, the return object from coxph includes a simple 6-line function for enhanced printout, which by default carried along another 30 irrelevant things, some of which were huge.
> I personally think that setting environment(f) <- .GlobalEnv is the clearest and simplest solution.
> Note that R does not save the environment of functions defined at the top level; the prior line says to treat your function as "one of those".  It works very well as long as your function is an actual function, i.e., it depends only on its input arguments.
>
> \begin{opinion}
>    S started out as a pure functional language.  That is, a function depends ONLY on its arguments.   Many of the strengths of S/R flow directly from the simplicity and rigor that this gives.
> There is an adage in programming, going back to at least the earliest Fortran compilers,  that all successful languages have a way to break their own rules;  and S indeed had some hidden workarounds.  Formalizing these non-functional back doors as R has done with environments is a good thing.
>
> However, the back doors should be used only with extreme reluctance.  I cringe at each new "how to be sneaky" discussion on the mailing lists.  The 'solution' is rarely worth the long term price.
>   \end{opinion}

Hmmm, it seems to me that your first paragraph contradicts your opinion.
If you set the environment of a formula to .GlobalEnv then suddenly
the way that formula acts depends on all sorts of things that weren't
there when it was created.

Attaching the environment to a formula at the time of its creation means that
the names within it refer to data that is currently in scope.  That's 
generally a good thing.  It means that code will act the same when you 
run it at the top level or in a function.
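
One way to see this is to compare the environment stored on a formula
with the frame it was created in.  This is only a minimal sketch;
make_formula is a hypothetical helper invented for illustration.

```r
# Hypothetical helper: returns a formula together with the frame
# (the function's local environment) in which it was created.
make_formula <- function() {
    x <- 1:10
    y <- x^2
    list(formula = y ~ x, frame = environment())
}

res <- make_formula()

# The formula carries the function's frame, so this is TRUE ...
identical(environment(res$formula), res$frame)

# ... and the local x is still reachable through the formula.
get("x", envir = environment(res$formula))

# A formula created outside any function records the environment it
# was evaluated in (the global environment when typed at the prompt).
top <- y ~ x
identical(environment(top), environment())
```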

For example, consider this:

f <- function() {
    x <- 1:10
    x2 <- x^2
    y <- rnorm(10, mean=x2)
    formula <- y ~ x + x2
    formula
}

fit <- lm(f())
update(fit, . ~ . - x)

This code works fine, all because the formula keeps the environment 
where it was created.  If I modify it like this:

f <- function() {
    x <- 1:10
    x2 <- x^2
    y <- rnorm(10, mean=x2)
    formula <- y ~ x + x2
    environment(formula) <- .GlobalEnv
    formula
}

fit <- lm(f())
update(fit, . ~ . - x)

then I really have no idea what it will produce, because it depends on 
global variables y, x and x2, not the local ones created in the 
function.  If I'm lucky, I'll get an "object not found" error; if I'm 
not lucky, it'll just go find some other variables and use those.
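
To make the unlucky case concrete, here is a sketch that assumes
unrelated variables named y, x and x2 already exist in the global
environment.  The function name g and the global values are made up
for illustration.

```r
# Unrelated globals that happen to share the names used inside g().
# assign() is used so the sketch behaves the same wherever it is run.
assign("x",  c(1, 2, 3, 4, 5, 6), envir = .GlobalEnv)
assign("x2", c(2, 1, 4, 3, 6, 5), envir = .GlobalEnv)
assign("y",  c(1, 3, 2, 5, 4, 6), envir = .GlobalEnv)

g <- function() {
    x <- 1:10
    x2 <- x^2
    y <- rnorm(10, mean = x2)
    formula <- y ~ x + x2
    environment(formula) <- .GlobalEnv   # discard the local frame
    formula
}

fit <- lm(g())

# lm() raises no error here.  The number of observations shows which
# data were used: the 6 global values, not the 10 local ones.
nobs(fit)
```

Removing the three global variables before calling lm(g()) should
turn the silent misfit into the "object not found" error instead.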

Duncan Murdoch