[Rd] model frames and update()

Achim Zeileis Achim.Zeileis at uibk.ac.at
Thu Apr 23 22:28:59 CEST 2015


On Thu, 23 Apr 2015, Therneau, Terry M., Ph.D. wrote:

> This issue has arisen within my anova.coxph routine, but is as easily 
> illustrated with glm.
>
> testdata <- data.frame(y= 1:5,
>                       n= c(8,10,6,20,14),
>                       sex = c(0,1,0,1,1),
>                       age = c(30,20,35,25,40))
>
> fit <- glm(cbind(y,n) ~ age + sex, binomial, data=testdata, model=TRUE)
> saveit <- fit$model
>
> update(fit, .~. - age, data=saveit)
> Error in cbind(y, n) : object 'y' not found
>
> One would hope that a saved model frame is precisely the thing that 
> would work best. The issue of course is that "cbind(y, n)" is the name 
> of the first variable in saveit, and it is not being properly quoted 
> somewhere down the line.  The same issue can occur on the right hand 
> side.  "Save the model frame in case you need to refit something next 
> month" is does not appear to be a safe approach to reproducable 
> research.

If you want to re-run the same call (which is what the default update 
method does), then you need to save the original data not the 
pre-processed model frame. However, the model frame still has all the 
information you need - but has already evaluated all transformations 
(cbind, log, etc.).

> fit2 <- glm(y ~ sex + log(age), poisson, testdata)
> save2 <- fit2$model
> update(fit2, . ~ . - sex, data=save2)  # fails
> glm(y ~ log(age), poisson, save2)      # fails
>
> I can work around this in my anova, but I wanted to not rebuild the 
> frame if it is already present.  It looks like model.matrix plus attr(x, 
> 'assign') time -- a bit harder to read, but that looks like what 
> anova.glm is doing. Is there a way to make update work?

If you have the model.frame() plus an update formula you could internally 
extract the response, model matrix and weights, e.g.,

## original fit
fit <- glm(cbind(y,n) ~ log(age) + sex, family = binomial,
   data = testdata)

## original model frame
mf <- fit$model

## update formula
up <- . ~ . - sex

## new terms
mt <- update(terms(mf), up)

## response y, matrix x, weights w (NULL here)
y <- model.response(mf)
x <- model.matrix(mt, mf)
w <- model.weights(mf)
## offset could be added similarly

And then you can call glm.fit() or coxph.fit() if you can get the family 
from the original fit.

One should still check that the update formula does not change the 
response (to something which may or may not exist in the model frame). 
Possibly, one could try to get certain control arguments from the original 
glm fit (but probably not 'start').

Maybe this helps... (But I can understand that the issues of data frame 
vs. model frame can be quite a nuisance when programming model utilities 
:))

> The current code, BTW, starts by building its own frame using results of 
> terms.inner, which solves the above issue nicely and update() works as 
> expected.  But it isn't robust to scoping issues.  (As pointed out 
> yesterday by a user: lapply of a function that contained coxph followed 
> by anova gives a variable not found error.)  It also ignores saved model 
> frames; thus the rewrite.
>
> Terry T
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



More information about the R-devel mailing list