[Rd] update.default: fall back on model.frame in case that the data frame is not in the parent environment

Thaler, Thorn, LAUSANNE, Applied Mathematics Thorn.Thaler at rdls.nestle.com
Tue Aug 2 13:48:00 CEST 2011

Dear all,

Consider the following code:

mm <- function(datf) {
  lm(y ~ x, data = datf)
}
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)))

l <- mm(mydatf)

If I now want to update l without providing the data argument, I get an error

> update(l, . ~ .)
Error in inherits(x, "data.frame") : object 'datf' not found

and I have to provide the data argument explicitly, in one of two ways:
update(l, . ~ ., data = mydatf)
update(l, . ~ ., data = model.frame(l))

The first workaround is additionally error-prone (what if I rename
mydatf earlier in the file? In the best case I just get an error
because mydatf is not defined), and both options are semantically
questionable: I do not want to _update_ the data argument of the lm
object; it should remain untouched.
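
To illustrate where the error comes from (a small sketch that only
re-uses the objects defined above): the fitted object stores its call
with the data given as the bare symbol datf, which existed only inside
mm(), while the model frame kept in the object holds just the variables
used in the fit.

```r
mm <- function(datf) {
  lm(y ~ x, data = datf)
}
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)))
l <- mm(mydatf)

l$call                      # the stored call refers to 'datf', not 'mydatf'
exists("datf")              # FALSE unless a 'datf' happens to exist at top level
res <- try(update(l, . ~ .), silent = TRUE)
inherits(res, "try-error")  # TRUE in that case: 'datf' cannot be found
names(model.frame(l))       # only the model variables, here "y" and "x"
```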

So my suggestion is that update falls back on the data stored in the
model frame if the data argument of the lm call cannot be resolved in
the parent.frame of update. This can be achieved easily by adding just
four lines to update.default:

update.default <- function (object, formula., ..., evaluate = TRUE) {
    call <- object$call
    if (is.null(call)) 
        stop("need an object with call component")
    extras <- match.call(expand.dots = FALSE)$...
    if (!missing(formula.)) 
        call$formula <- update.formula(formula(object), formula.)
    if (length(extras)) {
        existing <- !is.na(match(names(extras), names(call)))
        for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
        if (any(!existing)) {
            call <- c(as.list(call), extras[!existing])
            call <- as.call(call)
        }
    }
    ## added: fall back on the model frame if the data object is not found
    if (!is.null(call$data)) {
        if (!exists(as.character(call$data), envir = parent.frame()))
            call$data <- model.frame(object)
    }
    if (evaluate) 
        eval(call, parent.frame())
    else call
}

This is just a quick and dirty hack which works fine here (with the
ugly drawback that the printed output of lm now shows the lengthy
explicit data.frame instead of a name), but I'm sure there are some
experts out there who could take it from here and polish the idea.
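
One way around the lengthy printed call might be to leave update.default
alone and instead bind the model frame to the missing name in a
temporary environment before evaluating the updated call. The following
is only a sketch under that assumption: update_fallback is a
hypothetical helper name, not an existing function, and it only handles
the case where the data argument is a plain symbol.

```r
## Hypothetical helper (not part of R): update a fit whose 'data' symbol
## can no longer be found, without putting the data frame into the call.
update_fallback <- function(object, formula., ..., envir = parent.frame()) {
  ## let update() build the modified call, but do not evaluate it yet
  cl <- update(object, formula., ..., evaluate = FALSE)
  if (is.name(cl$data) && !exists(as.character(cl$data), envir = envir)) {
    ## bind the stored model frame to the missing name in a child environment
    env <- new.env(parent = envir)
    assign(as.character(cl$data), model.frame(object), envir = env)
    return(eval(cl, env))
  }
  eval(cl, envir)
}

mm <- function(datf) {
  lm(y ~ x, data = datf)
}
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)))
l <- mm(mydatf)

l2 <- update_fallback(l, . ~ .)
l2$call   # still refers to the symbol 'datf' rather than a full data.frame
```

Since the model frame only holds the variables used in the original fit,
this sketch (like the four-line change above) cannot add a term for a
variable that was not part of the model, and the returned object again
needs the helper for any further update.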

I don't see any problems with this proposal regarding old code, but
if I'm wrong and there are reasons not to touch update.default in
the way I am proposing, please let me know. Any other feedback is highly
appreciated too.

Thanks for sharing your thoughts with me.


