[Rd] model.matrix and subset

peter dalgaard pd@|gd @end|ng |rom gm@||@com
Tue Mar 22 15:04:54 CET 2022


Hmm...

AFAICT, predict.lm does effectively this:

Terms <- delete.response(terms(fit))
m <- model.frame(Terms, data2)
model.matrix(Terms, m)

except for some embellishments that I can't quite grasp at this point.

I expect that this is to circumvent similar issues.

- pd

> On 21 Mar 2022, at 17:43 , Therneau, Terry M., Ph.D. via R-devel <r-devel using r-project.org> wrote:
> 
> I've found the following unexpected behaviour from the model.matrix function, namely that 
> the "subset" argument carries forward when I would not expect it to.
> Here is an example using lm:
> 
> --------------------
> 
> # Data set modified from the lm help file
> test <- data.frame(weight= c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14,
> 4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69),
>                    group = gl(2, 10, 20, labels = c("Ctl","Trt")),
>                    zed = rep (1:2, 10))
> 
> fit <- lm( weight ~ group, test, subset= (zed==1))
> 
> data2 <- data.frame( weight= 1:6,  group= rep(c("Ctl", "Trt"), 3))
> model.matrix (fit, data=data2)
> 
>  Error in eval(substitute(subset), data, env) : object 'zed' not found
> --------------------
> 
> This arises out a user's bug report for survival::concordance; which has methods for 
> formula, lm, glm, and coxph.  I have been using  model.frame and model.matrix to create 
> the new response and linear predictor when a 'newdata' argument is used.    The above 
> issue makes it fail for all of lm, glm, and coxph when the initial model includes a subset.
> 
> I think that the user is correct:  if someone asks for model.matrix(fit, data=new) they 
> almost certainly want the model matrix for exactly that data.  But it leaves me in a bit 
> of a quandry.   I don't want to write private model.matrix methods for glm and lm, and if 
> I fix the coxph methods then they will disagree with the standard ones.
> 
> Thoughts?
> 
> Terry
> 
> -- 
> Terry M Therneau, PhD
> Department of Quantitative Health Sciences
> Mayo Clinic
> therneau using mayo.edu
> 
> "TERR-ree THUR-noh"
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



More information about the R-devel mailing list