[Rd] robust updating methods

Ben Bolker bbolker at gmail.com
Wed Mar 25 00:55:43 CET 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
> Dear Ben,
> 
> Last week I was struggling with incorporating lme4 into a package.
> I traced the problem and made a reproducible example ( 
> https://github.com/ThierryO/testlme4).  It looks very simular to
> the problem you describe.
> 
> The 'tests' directory contains the reproducible examples. confint()
> of a model as returned by a function fails. It even fails when I
> try to calculate the confint() inside the same function as the
> glmer() call (see the fit_model_ci function).
> 
> Best regards,
> 
> Thierry


Ugh.  I can get this to work if I also try searching up the call
stack, as follows (within update.merMod).  This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be done.


    if (evaluate) {
        ff <- environment(formula(object))
        pf <- parent.frame()  ## save parent frame in case we need it
        sf <- sys.frames()[[1]]
        tryCatch(eval(call, env=ff),
                 error=function(e) {
                     tryCatch(eval(call, env=sf),
                              error=function(e) {
                                  eval(call, pf)
                              })
                 })
    } else call

  Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():


## https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){
    model <- mfun(
        formula = formula,
        data = dataset,
        family = "poisson"
    )
    ci <- confint(model)
    return(list(model = model, confint = ci))
}

library("lme4")
set.seed(101)
dd <- data.frame(f=factor(rep(1:10,each=100)),
                 y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)



> 
> 
> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> Research Institute for Nature and Forest team Biometrie &
> Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat
> 25 1070 Anderlecht Belgium
> 
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may
> be able to say what the experiment died of. ~ Sir Ronald Aylmer
> Fisher The plural of anecdote is not data. ~ Roger Brinner The
> combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given
> body of data. ~ John Tukey
> 
> 2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
> 
> WARNING: this is long.  Sorry I couldn't find a way to compress
> it.
> 
> Is there a reasonable way to design an update method so that it's 
> robust to a variety of reasonable use cases of generating calls or 
> data inside or outside a function?  Is it even possible?  Should I 
> just tell users "don't do that"?
> 
> * `update.default()` uses `eval(call, parent.frame())`; this fails 
> when the call depends on objects that were defined in a different 
> environment (e.g., when the data are generated and the model 
> initially fitted within a function scope)
> 
> * an alternative is to store the original environment in which the 
> fitting is done in the environment of the formula and use 
> `eval(call, env=environment(formula(object)))`; this fails if the 
> user tries to update the model originally fitted outside a
> function with data modified within a function ...
> 
> * I think I've got a hack that works below, which first tries in
> the environment of the formula and falls back to the parent frame
> if that fails, but I wonder if I'm missing something much simpler
> ..
> 
> Thoughts?  My understanding of environments and frames is still, 
> after all these years, not what it should be ...
> 
> I've thought of some other workarounds, none entirely
> satisfactory:
> 
> * force evaluation of all elements in the original call * printing
> components of the call can get ugly (can save the original call
> before evaluating) * large objects in the call get duplicated *
> don't use `eval(call)` for updates; instead try to store
> everything internally * this works OK but has the same drawback of
> potentially storing large extra copies * we could try to use the
> model frame (which is stored already), but there are issues with
> this (the basis of a whole separate rant) because the model frame
> stores something in between predictor variables and input
> variables. For example
> 
> d <- data.frame(y=1:10,x=runif(10)) 
> names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)"
> 
> So if we wanted to do something like update to "y ~ sqrt(x)", it
> wouldn't work ...
> 
> ================== update.envformula <- function(object,...) { 
> extras <- match.call(expand.dots = FALSE)$... call <-
> getCall(object) for (i in names(extras)) { existing <-
> !is.na(match(names(extras), names(call))) for (a in
> names(extras)[existing]) call[[a]] <- extras[[a]] if
> (any(!existing)) { call <- c(as.list(call), extras[!existing]) call
> <- as.call(call) } } eval(call, env=environment(formula(object))) 
> ## enclos=parent.frame() doesn't help }
> 
> update.both <- function(object,...) { extras <-
> match.call(expand.dots = FALSE)$... call <- getCall(object) for (i
> in names(extras)) { existing <- !is.na(match(names(extras),
> names(call))) for (a in names(extras)[existing]) call[[a]] <-
> extras[[a]] if (any(!existing)) { call <- c(as.list(call),
> extras[!existing]) call <- as.call(call) } } pf <- parent.frame()
> ## save parent frame in case we need it tryCatch(eval(call,
> env=environment(formula(object))), error=function(e) { eval(call,
> pf) }) }
> 
> ### TEST CASES
> 
> set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <-
> lm(y~x,data=d)
> 
> ##' define data within function, return fitted model f1 <-
> function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) } ##'
> define (and modify) data within function, try to update ##' model
> fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ##
> force copy update.default(m,data=d2) } ##' define (and modify) data
> within function, try to update ##' model fitted elsewhere (use
> envformula) f3 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force
> copy update.envformula(m,data=d2) }
> 
> ##' hack: first try the formula, then the parent frame ##' if that
> doesn't work for any reason f4 <- function(m) { d2 <- d; d2[1] <-
> d2[1]+0 ## force copy update.both(m,data=d2) }
> 
> ## Case 1: fit within function m2 <- f1() try(update.default(m2))
> ## default: object 'd2' not found m3A <- update.envformula(m2)  ##
> envformula: works m3B <- update.both(m2)        ## works
> 
> ## Case 2: update within function m4A <- f2(m1)  ## default: works 
> try(f3(m1))    ## envformula: object 'd2' not found m4B <- f4(m1)
> ## works
> 
>> 
>> ______________________________________________ 
>> R-devel at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJVEfl/AAoJEOCV5YRblxUHWdgH/AqLAhDqKV8aRg6jnX9rO96D
nwzqv0ClMIxVr2dzD4eSQTL2caWZnXVkws+lg9N7bc4BaWplcYxLNRBw5M8zHOPJ
E7JlhG3EecvmeAEt9OY0/q6I0D6vdoEjcH7wzzuyLLIqllu9OskxURi/azMs0XRo
tiN+oG5aOKsMYsEGjtiWySRDzhJh2TM40A1HHjAViqpxZcqilAZ6RiNEFe1t1JY0
IvDI8yesSuHnKtgAiqk9ivGw4BCCGoBSIHB3GrJIi11j06iYKw0ugVHIlKYO8cqf
AYTvEX2sSxsJgKWYTiG/1dr/kiFTntTDji03zRLVUdPKIZATJMczv+KB+0bpoVY=
=Z34K
-----END PGP SIGNATURE-----



More information about the R-devel mailing list