[Rd] robust updating methods

Ben Bolker bbolker at gmail.com
Sun Mar 22 17:45:53 CET 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

WARNING: this is long.  Sorry I couldn't find a way to compress it.

Is there a reasonable way to design an update method so that it's
robust to a variety of reasonable use cases of generating calls or
data inside or outside a function?  Is it even possible?  Should I
just tell users "don't do that"?

* `update.default()` uses `eval(call, parent.frame())`; this fails
  when the call depends on objects that were defined in a different
  environment (e.g., when the data are generated and the model
  initially fitted within a function scope)

* an alternative is to store the original environment in which the
  fitting is done in the environment of the formula and use
  `eval(call, env=environment(formula(object)))`; this fails if the
  user tries to update the model originally fitted outside a function
  with data modified within a function ...

* I think I've got a hack that works below, which first tries in the
  environment of the formula and falls back to the parent frame if
  that fails, but I wonder if I'm missing something much simpler ..

  Thoughts?  My understanding of environments and frames is still,
after all these years, not what it should be ...

I've thought of some other workarounds, none entirely satisfactory:

* force evaluation of all elements in the original call
    * printing components of the call can get ugly (can save the
original call before evaluating)
    * large objects in the call get duplicated
* don't use `eval(call)` for updates; instead try to store everything
internally
    * this works OK but has the same drawback of potentially storing
large extra copies
    * we could try to use the model frame (which is stored already),
but there are issues with this (the basis of a whole separate rant)
because the model frame stores something in between predictor
variables and input variables. For example

   d <- data.frame(y=1:10,x=runif(10))
   names(model.frame(lm(y~log(x),data=d)))
   ## "y" "log(x)"

So if we wanted to do something like update to "y ~ sqrt(x)",
it wouldn't work ...

==================
update.envformula <- function(object,...) {
    extras <- match.call(expand.dots = FALSE)$...
    call <- getCall(object)
    for (i in names(extras)) {
        existing <- !is.na(match(names(extras), names(call)))
        for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
        if (any(!existing)) {
            call <- c(as.list(call), extras[!existing])
            call <- as.call(call)
        }
    }
    eval(call, env=environment(formula(object)))
    ## enclos=parent.frame() doesn't help
}

update.both <- function(object,...) {
    extras <- match.call(expand.dots = FALSE)$...
    call <- getCall(object)
    for (i in names(extras)) {
        existing <- !is.na(match(names(extras), names(call)))
        for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
        if (any(!existing)) {
            call <- c(as.list(call), extras[!existing])
            call <- as.call(call)
        }
    }
    pf <- parent.frame()  ## save parent frame in case we need it
    tryCatch(eval(call, env=environment(formula(object))),
             error=function(e) {
                 eval(call, pf)
             })
}

### TEST CASES

set.seed(101)
d <- data.frame(x=1:10,y=rnorm(10))
m1 <- lm(y~x,data=d)

##' define data within function, return fitted model
f1 <- function() {
    d2 <- d
    lm(y~x,data=d2)
    return(lm(y~x,data=d2))
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere
f2 <- function(m) {
    d2 <- d; d2[1] <- d2[1]+0 ## force copy
    update.default(m,data=d2)
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere (use envformula)
f3 <- function(m) {
    d2 <- d; d2[1] <- d2[1]+0 ## force copy
    update.envformula(m,data=d2)
}

##' hack: first try the formula, then the parent frame
##' if that doesn't work for any reason
f4 <- function(m) {
    d2 <- d; d2[1] <- d2[1]+0 ## force copy
    update.both(m,data=d2)
}

## Case 1: fit within function
m2 <- f1()
try(update.default(m2))       ## default: object 'd2' not found
m3A <- update.envformula(m2)  ## envformula: works
m3B <- update.both(m2)        ## works

## Case 2: update within function
m4A <- f2(m1)  ## default: works
try(f3(m1))    ## envformula: object 'd2' not found
m4B <- f4(m1)  ## works

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G
3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS
uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL
K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC
aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX
Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY=
=S7zP
-----END PGP SIGNATURE-----



More information about the R-devel mailing list