[Rd] update.default: fall back on model.frame in case that the data frame is not in the parent environment

Duncan Murdoch murdoch.duncan at gmail.com
Tue Aug 2 15:41:17 CEST 2011


It looks to me as though your proposal would allow update to remove 
variables, but would give erroneous results when adding them.  For example:

mm <- function(datf) {
   lm(y ~ x, data = datf)
}
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)), z = 
rnorm(20))

l <- mm(mydatf)
update(l, . ~ . + z)   # This fails, z is not found

z <- rnorm(20)
update(l, . ~ . + z)   # This finds the wrong z, without a warning

I'd rather get the "datf not found" error than wrong results.

Duncan Murdoch

On 02/08/2011 7:48 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote:
> Dear all,
>
> Suppose the following code:
>
> --------------8<--------------
> mm<- function(datf) {
>    lm(y ~ x, data = datf)
> }
> mydatf<- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)))
>
> l<- mm(mydatf)
> -------------->8--------------
>
> If I want to update l now without providing the data argument an error
> occurs:
>
> --------------8<--------------
> >  update(l, . ~ .)
> Error in inherits(x, "data.frame") : object 'datf' not found
> -------------->8--------------
>
> and I've to provide the data argument explicitly:
> --------------8<--------------
> update(l, . ~ ., data = mydatf)
> update(l, . ~ ., data = model.frame(l))
> -------------->8--------------
>
> While the first work-around is additionally error prone (what if I
> change the name of mydatf earlier in the file? In the best case I just
> get an error if mydatf is not defined), both options are kind of
> semantically questionable (I do not want to _update_ the data argument
> of the lm object it should remain untouched).
>
> So my suggestion would be that update falls back on the data stored in
> model.frame in case that the data argument in the lm call cannot be
> resolved in the parent.frame of update, which can be easily achieved by
> adding just four lines to update.default:
>
> --------------8<--------------
> update.default<- function (object, formula., ..., evaluate = TRUE) {
>      call<- object$call
>      if (is.null(call))
>          stop("need an object with call component")
>      extras<- match.call(expand.dots = FALSE)$...
>      if (!missing(formula.))
>          call$formula<- update.formula(formula(object), formula.)
>      if (length(extras)) {
>          existing<- !is.na(match(names(extras), names(call)))
>          for (a in names(extras)[existing]) call[[a]]<- extras[[a]]
>          if (any(!existing)) {
>              call<- c(as.list(call), extras[!existing])
>              call<- as.call(call)
>          }
>      }
>      if (!is.null(call$data)) {
>          if (!exists(as.character(call$data), envir = parent.frame()))
>              call$data<- model.frame(object)
>      }
>      if (evaluate)
>          eval(call, parent.frame())
>      else call
> }
> -------------->8--------------
>
> This is just a quick dirty hack which works fine here (with an ugly
> drawback that in the standard output of lm I now see the lengthy
> explicit data.frame statement) but I'm sure there are some cracks out
> there who could take it over from here and beautify this idea.
>
> I don't see any problems with this proposition regarding old code, but
> if I'm wrong and there are some reasons not to touch update.default in
> the way I was proposing please let me know. Any other feedback is highly
> appreciated too.
>
> Thanks for sharing your thoughts with me.
>
> KR,
>
> -Thorn
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list