[R] Workings of model.frame.default and [.

Frank E Harrell Jr fharrell at virginia.edu
Thu Jul 25 04:27:12 CEST 2002

I sent this note last Friday just before the weekend and didn't get any replies.  I'm sending it again in the hope that someone will offer some insight.  -Frank

This is related to my earlier question, to which I received very helpful replies.  When I provide a subsetting method that automatically drops unused levels of a factor variable, I run into trouble with model.frame.default.  I know that model.frame.default has its own mechanism for dropping unused levels, but my preference is to handle this at a more basic level using [.factor, and not to specify drop.unused.levels=TRUE to model.frame.default.  That way, subsetting operations that are not carried out by model.frame also work the way I want, especially [.data.frame when I attach or otherwise reference a subset of a data frame.
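
To make the two mechanisms concrete, here is a small sketch of the stock behaviour, with no local [.factor installed.  The data frame d and its columns y and g are made up for illustration only.

```r
# Toy data (hypothetical): a response and a three-level factor
d <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                g = factor(c("a", "a", "b", "b", "c", "c")))

# Plain subsetting through [.data.frame keeps the unused level "c"
sub <- d[d$g != "c", ]
levels(sub$g)                    # "a" "b" "c"

# Dropping at the [.factor level has to be requested explicitly
levels(sub$g[, drop = TRUE])     # "a" "b"

# model.frame's own mechanism: off by default, on request
mf1 <- model.frame(y ~ g, data = d, subset = g != "c")
levels(mf1$g)                    # "a" "b" "c"
mf2 <- model.frame(y ~ g, data = d, subset = g != "c",
                   drop.unused.levels = TRUE)
levels(mf2$g)                    # "a" "b"
```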

Inside model.frame.default, a 'variables' list is constructed; for factor variables it carries all of the original levels.  Then .Internal(model.frame()) is invoked, which in turn invokes my local [.factor, and that drops unused levels.  The result is a disparity between the levels in 'variables' and the levels returned by [.data.frame (which calls [.factor).  Because of this disparity, model.frame returns an invalid factor variable in which the levels are shifted and some real levels at the end have zero frequencies.  (I am leaving drop.unused.levels=FALSE when running model.frame.)
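
A local method of the kind described might look roughly like the sketch below.  It is an illustration only, modelled on the base method with the default for drop flipped to TRUE, and is not necessarily the exact code in use.  The factor g is made up.

```r
# Illustrative local method (an assumption, not the exact code in use):
# like the base `[.factor`, except that drop defaults to TRUE.
`[.factor` <- function(x, ..., drop = TRUE) {
  y <- NextMethod("[")                         # subset the integer codes
  attr(y, "contrasts") <- attr(x, "contrasts")
  attr(y, "levels") <- attr(x, "levels")       # restore the full level set
  class(y) <- oldClass(x)
  if (drop) factor(y) else y                   # factor() discards unused levels
}

g <- factor(c("a", "a", "b", "b", "c", "c"))
levels(g[g != "c"])                  # "a" "b"
levels(g[g != "c", drop = FALSE])    # "a" "b" "c"
```

Because the method is defined in the global environment, code that subsets factors without passing drop explicitly also picks it up, which is how it comes to be called during model.frame.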

Is model.frame doing this by intentional design?  If not, can it be fixed?  It seems to me that, to be general, .Internal(model.frame()) should not depend on the levels staying unchanged when [.data.frame is executed.  If model.frame really needs to operate this way, does anyone see a workaround?

Thanks again, and I'll put in one more plug for [.factor to be modified so that if a system option 'drop.unused.levels' is TRUE (i.e., NOT by default) drop=TRUE is assumed unless drop=FALSE is explicitly stated by the user.  Then I can dispose of my local [.factor once and for all.
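
The behaviour I am asking for can be sketched without touching [.factor itself.  The helper subset_factor below is hypothetical, and 'drop.unused.levels' is only the option name proposed above, not an option that R currently defines.

```r
# Hypothetical helper: drop follows the proposed option unless the
# caller states drop explicitly. With the option unset, drop is FALSE.
subset_factor <- function(x, i,
                          drop = getOption("drop.unused.levels", FALSE)) {
  x[i, drop = drop]
}

f <- factor(c("a", "b", "c", "a"))

levels(subset_factor(f, f != "c"))                 # option unset: levels kept

old <- options(drop.unused.levels = TRUE)          # opt in
levels(subset_factor(f, f != "c"))                 # unused level "c" dropped
levels(subset_factor(f, f != "c", drop = FALSE))   # explicit drop=FALSE wins
options(old)                                       # restore previous setting
```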

Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat