[Rd] model.frame and parent environment

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jun 16 23:30:40 CEST 2014

On 16/06/2014 19:35, Therneau, Terry M., Ph.D. wrote:
> Someone has reported a problem with predict.coxph that I can't seem to
> solve.  The underlying issue is with model.frame.coxph; the same issue
> is also found in lm so I'll use that for the example.
> --------------------------
>  > test <- data.frame(y = 1:10 + runif(10), x=1:10)
>  > myfun <- function(formula, nd) {
>      fit <- lm(formula, data=nd, model=FALSE)
>      model.frame(fit)
>      }
>  > myfun(test)
> Error in is.data.frame(data): object "nd" not found

You have specified formula = test and given no value for nd.  Is that 
really what you intended?  It is undocumented that passing a data frame 
as the formula works for lm().

> --------------------
> 1. The key line, in both model.frame.coxph and model.frame.lm is
>      eval(fcall, env, parent.frame())
> and it appears (at least to me) that the parent.frame() part of this is
> effectively ignored when fcall is itself a reference to model.frame.
> I'd like to understand this better.

Way back (ca R 1.2.0) an advocate of lexical scoping changed 
model.frame.lm to refer to an environment not a data frame for 'env'. 
That pretty fundamental change means that your sort of example is not a 
recommended way to do this: you are mixing scoping models.
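
A minimal sketch of why the third argument has no effect here: as documented in ?eval, 'enclos' is consulted only when 'envir' is a list, pairlist or data frame, not when it is an environment. The names e, expr and f are made up for this illustration and do not appear in model.frame.lm.

```r
# Illustrative names only: e, expr and f are not from the original post.
e <- new.env(parent = emptyenv())   # an environment that cannot see 'z'
expr <- quote(z)

f <- function() {
  z <- 42   # visible in f's frame only
  # envir is an environment, so enclos is ignored and 'z' is not found
  as_env <- tryCatch(eval(expr, e, environment()),
                     error = function(err) "not found")
  # envir is a list, so enclos is searched for names missing from the list
  as_list <- eval(expr, list(w = 1), environment())
  list(as_env = as_env, as_list = as_list)
}
res <- f()
```

In the same way, once 'env' in model.frame.lm is the environment of the formula, the parent.frame() passed as the third argument plays no part in the lookup.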

> 2. The modeling functions coxph and survreg in the survival package default to
> model=FALSE, originally in mimicry of lm and glm; I don't know when R
> changed the default to model=TRUE for lm and glm.  One possible response

I am not sure R ever did: model = TRUE was the default 16 years ago at 
the beginning of the CVS/SVN archive.

> to my question would be advice to change my routine's defaults too.  I'm
> somewhat reluctant since I work with a few very large data sets, but
> would entertain that discussion as well.   I'd still like to understand
> how model.frame could be made to work under the current regimen.

For smaller problems using model = TRUE is the most robust solution.  As 
the components of the model frame can be changed after fitting, there is 
no way to guarantee to recreate the model frame, so to be sure you need 
to store it.
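
As a sketch of that robust route, assuming the example data from the original question: with the default model = TRUE, model.frame() returns the stored fit$model and does not re-evaluate the call at all. The name myfun_stored is hypothetical.

```r
set.seed(1)  # only to make the illustration reproducible
test <- data.frame(y = 1:10 + runif(10), x = 1:10)

# Hypothetical variant of myfun that keeps the model frame in the fit.
myfun_stored <- function(formula, nd) {
  fit <- lm(formula, data = nd)   # model = TRUE is the default for lm()
  model.frame(fit)                # returns the stored fit$model
}

mf <- myfun_stored(y ~ x, test)
```

The cost is the memory needed to keep a copy of the model frame in every fit, which is the concern raised above for very large data sets.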

If you call myfun(y ~ x, test), model.frame will look for 'nd' in the 
global environment, the environment of the formula.  One way to get that 
to work more often is something like

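myfun <- function(formula, nd) {
      qnd <- substitute(nd)   # the expression the caller supplied for 'nd'
      fit <- lm(formula, data=nd, model=FALSE)
      fit$call$data <- qnd    # record that expression in the stored call
      model.frame(fit)
      }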

so it looks for the expression that was supplied as 'nd' (here 'test') 
instead of for 'nd' itself.
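
A sketch of how this behaves in practice, including one case where it still fails, which is presumably why it only works "more often". The names myfun2, f_global, wrapper and local_data are hypothetical and used only for this illustration.

```r
set.seed(1)  # only to make the illustration reproducible
test <- data.frame(y = 1:10 + runif(10), x = 1:10)

# Hypothetical complete version of the workaround.
myfun2 <- function(formula, nd) {
  qnd <- substitute(nd)          # unevaluated expression given for 'nd'
  fit <- lm(formula, data = nd, model = FALSE)
  fit$call$data <- qnd           # stored call now refers to that expression
  model.frame(fit)               # evaluated in the formula's environment
}

# Works: 'test' is visible from the environment where y ~ x was created.
ok <- myfun2(y ~ x, test)

# Still fails: the formula is created outside wrapper(), but the data
# exist only inside wrapper()'s frame.
f_global <- y ~ x
wrapper <- function(f) {
  local_data <- data.frame(y = 1:5 + runif(5), x = 1:5)
  myfun2(f, local_data)
}
still_fails <- tryCatch(wrapper(f_global),
                        error = function(e) conditionMessage(e))
```

The second call fails because 'local_data' is looked up in the environment of the formula, where no such object exists.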

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
