[Rd] model.matrix and subset

Therneau, Terry M., Ph.D. therneau at mayo.edu
Mon Mar 21 17:43:07 CET 2022

I've found the following unexpected behaviour from the model.matrix function, namely that 
the "subset" argument carries forward when I would not expect it to.
Here is an example using lm:


# Data set modified from the lm help file
test <- data.frame(weight = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14,
                              4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69),
                   group = gl(2, 10, 20, labels = c("Ctl","Trt")),
                   zed = rep(1:2, 10))

fit <- lm( weight ~ group, test, subset= (zed==1))

data2 <- data.frame( weight= 1:6,  group= rep(c("Ctl", "Trt"), 3))
model.matrix (fit, data=data2)

  Error in eval(substitute(subset), data, env) : object 'zed' not found
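
The error is consistent with the fitted object keeping the original call, subset expression included, and model.frame re-evaluating that call with only the data argument swapped. A minimal sketch of that check follows; it assumes the 20 weights are the ctl and trt vectors from the lm help file, and the names stored_subset and res are only illustrative.

```r
# Assumption: weights are the ctl and trt vectors from the ?lm example
test <- data.frame(weight = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14,
                              4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69),
                   group = gl(2, 10, 20, labels = c("Ctl","Trt")),
                   zed = rep(1:2, 10))
fit <- lm(weight ~ group, test, subset = (zed == 1))
data2 <- data.frame(weight = 1:6, group = rep(c("Ctl", "Trt"), 3))

# The stored call still carries the unevaluated subset expression
stored_subset <- fit$call$subset
print(stored_subset)

# model.frame() on the fit reuses that call with the new data,
# so 'zed' is looked up in data2 and the call fails
res <- try(model.frame(fit, data = data2), silent = TRUE)
print(inherits(res, "try-error"))
```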

This arises out of a user's bug report for survival::concordance, which has methods for 
formula, lm, glm, and coxph.  I have been using  model.frame and model.matrix to create 
the new response and linear predictor when a 'newdata' argument is used.    The above 
issue makes it fail for all of lm, glm, and coxph when the initial model includes a subset.

I think that the user is correct:  if someone asks for model.matrix(fit, data=new) they 
almost certainly want the model matrix for exactly that data.  But it leaves me in a bit 
of a quandary.  I don't want to write private model.matrix methods for glm and lm, and if 
I fix the coxph methods then they will disagree with the standard ones.
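
For reference, one route that avoids the stored call is to build the frame directly from the model's terms, much as predict.lm does for newdata. This is only a sketch under stated assumptions, not a proposed fix: the weights are taken to be the ctl and trt vectors from the lm help file, the names tt, mf, and mm are illustrative, and it covers only the design matrix, not the new response.

```r
# Assumption: weights are the ctl and trt vectors from the ?lm example
test <- data.frame(weight = c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14,
                              4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69),
                   group = gl(2, 10, 20, labels = c("Ctl","Trt")),
                   zed = rep(1:2, 10))
fit <- lm(weight ~ group, test, subset = (zed == 1))
data2 <- data.frame(weight = 1:6, group = rep(c("Ctl", "Trt"), 3))

# Work from the terms object rather than the fit, so the original
# call (and its subset) never gets re-evaluated
tt <- delete.response(terms(fit))

# xlev makes the character 'group' column a factor with the fitted levels
mf <- model.frame(tt, data2, xlev = fit$xlevels)

# Reuse the contrasts from the original fit
mm <- model.matrix(tt, mf, contrasts.arg = fit$contrasts)
print(dim(mm))
```

The trade-off is that this bypasses the lm method of model.matrix altogether, so it does not address whether that method itself should drop the subset when new data is supplied.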



Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
therneau at mayo.edu

"TERR-ree THUR-noh"

