[Rd] Avoiding scoping problems with model fit objects

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Nov 20 09:34:42 MET 2003


A week or so ago we had a query as to why an example not unlike

  library(nnet)   # multinom() is in package nnet
  foo <- c(1,1,0,0,1,1)
  rep <- 1:6
  m <- multinom(foo ~ rep)
  summary(m)

failed.  There was little special about multinom here, as

  m <- lm(foo ~ rep, model=FALSE)
  model.matrix(m)

also failed.  In tracking this down, a couple of lessons have emerged.

There is a useful paper on `non-standard evaluation' by Thomas Lumley on
http://developer.r-project.org, but we need to dig a bit deeper.  Since
about R 1.2.0, the environment of the formula of a model fit has been one
of the places searched for the data used in that model fit.
Unfortunately, it seems to have been assumed that object$call$formula
would give that environment.  It does give the formula, but only as the
unevaluated expression from the call, which carries no environment;
object$terms usually does give the environment (and in some cases
object$formula does too).  (Note that there is a danger lurking here:  if
no environment is set, environment(foo) will give NULL, and that means the
base package/namespace.)
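The difference between the call and the terms can be checked directly.  A
minimal sketch, assuming a current R with the stats package; the helper
fit_locally() is hypothetical and exists only so that the formula is created
inside a function rather than in the workspace:

```r
# Hypothetical helper: fit inside a function, so the data live in its frame.
fit_locally <- function() {
  foo <- c(1, 1, 0, 0, 1, 1)
  x <- 1:6
  lm(foo ~ x, model = FALSE)
}
fit <- fit_locally()

# The call keeps the formula only as an unevaluated expression.
f_call <- fit$call$formula
is.call(f_call)                  # TRUE
is.null(environment(f_call))     # TRUE: no environment attached

# The terms object keeps the environment the formula was created in,
# here the evaluation frame of fit_locally(), which still holds the data.
env <- environment(fit$terms)
exists("foo", envir = env, inherits = FALSE)   # TRUE
```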

The final port of call for recreating the data is the parent environment.
In this case model.frame() is called from model.matrix.default, so the
search for `rep' starts in the frame of model.matrix.default.  As that
function is in the base namespace, the search looks in the namespace before
the user's workspace, and so finds the function base::rep rather than the
user's vector `rep'.
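The lookup order can be seen without any model code.  A minimal sketch,
assuming current R, where the base namespace is asNamespace("base") and its
enclosure is the global environment; both helper functions are hypothetical
stand-ins, one for a function living in base (as model.matrix.default did
then) and one for user code:

```r
# The user's variable shadows base::rep, but only in the workspace.
rep <- 1:6

# Hypothetical stand-in for a function whose enclosure is the base namespace.
lookup_from_base <- function() get("rep", envir = environment())
environment(lookup_from_base) <- asNamespace("base")

# Hypothetical stand-in for a function defined in the workspace.
lookup_from_workspace <- function() get("rep", envir = environment())

# Search from base: frame, then base namespace, so base::rep is found first.
is.function(lookup_from_base())   # TRUE
# Search from the workspace: frame, then global env, so the vector is found.
lookup_from_workspace()           # returns 1:6
```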

What one really wants to do is to look in the environment of the original 
model fit.  We could keep a reference to that, but

- its contents might have changed and
- it would get saved with the object, probably bloating the saved session.
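Both drawbacks can be demonstrated with lm() in current R.  A minimal
sketch; the helper fit_with_extra() and its unused object `big' are
hypothetical, standing in for a fitting environment that holds more than
the model needs:

```r
# Hypothetical helper: the fitting frame also holds an unrelated large object.
fit_with_extra <- function() {
  y <- c(1, 1, 0, 0, 1, 1)
  x <- 1:6
  big <- numeric(1e6)            # about 8 MB, not used by the model
  lm(y ~ x, model = FALSE)
}
fit <- fit_with_extra()
env <- environment(fit$terms)    # the frame of fit_with_extra()

# Drawback 1: recreating the data uses whatever the environment holds now.
assign("y", c(0, 0, 1, 1, 0, 0), envir = env)
model.frame(fit)$y               # 0 0 1 1 0 0, not the values that were fitted

# Drawback 2: serializing the fit includes its formula environment,
# so 'big' is written out as well.
length(serialize(fit, NULL))     # more than 8 million bytes
```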

There is a better way: save the model frame in the model object.  That is
why the lm example above has a non-default argument: model=FALSE stops lm
from saving the frame, and so exposes the problem.  So:
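A minimal sketch of why the saved frame helps, assuming current R's lm();
the helper fit_then_forget() is hypothetical and deletes the data after
fitting, to stand in for a workspace that has changed since the fit:

```r
# Hypothetical helper: fit, then remove the data from the fitting frame.
fit_then_forget <- function(model) {
  foo <- c(1, 1, 0, 0, 1, 1)
  x <- 1:6
  fit <- lm(foo ~ x, model = model)
  rm(foo, x)     # the data no longer exist where the formula was created
  fit
}
fit_saved <- fit_then_forget(model = TRUE)
fit_bare  <- fit_then_forget(model = FALSE)

dim(model.matrix(fit_saved))                 # 6 2, built from fit_saved$model
res <- try(model.matrix(fit_bare), silent = TRUE)
inherits(res, "try-error")                   # TRUE: 'foo' cannot be found
```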

Lesson 1

Supply a model= argument in your model-fitting functions and consider
making model=TRUE the default.  (I have added this in a few places in
R-devel and in my own packages, including to multinom.)

Also ensure that all the useful information is in the model frame: not
just the variables needed in the formula, but also, e.g., subset and weights.
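A minimal sketch of Lesson 1, assuming current R and following the familiar
lm() idiom of turning match.call() into a model.frame() call; myfit is a
hypothetical name, and lm.fit()/lm.wfit() are used only to produce
coefficients.  Because subset and weights are passed through model.frame(),
the saved frame reflects the subset and carries a "(weights)" column:

```r
# Hypothetical model-fitting function following the usual lm() idiom.
myfit <- function(formula, data, subset, weights, na.action, model = TRUE) {
  cl <- match.call()
  # Build a call to model.frame() from the arguments the user supplied,
  # so that subset, weights and na.action are handled by model.frame itself.
  mf <- cl[c(1L, match(c("formula", "data", "subset", "weights", "na.action"),
                       names(cl), 0L))]
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())

  mt <- attr(mf, "terms")
  y <- model.response(mf, "numeric")
  w <- model.weights(mf)              # NULL when no weights were given
  X <- model.matrix(mt, mf)
  fit <- if (is.null(w)) lm.fit(X, y) else lm.wfit(X, y, w)

  fit$call <- cl
  fit$terms <- mt
  if (model) fit$model <- mf          # keep the frame, including "(weights)"
  class(fit) <- "myfit"
  fit
}

d <- data.frame(x = 1:6, y = c(1, 1, 0, 0, 1, 1), w = c(1, 2, 1, 2, 1, 2))
fit <- myfit(y ~ x, data = d, subset = x > 1, weights = w)
dim(fit$model)    # 5 3: subset applied, weights stored as "(weights)"
```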

Lesson 2

If you have a model.frame method in your package(s), please review it in
the light of the version of model.frame.lm in R-devel.  You need to ensure
that

- a saved model frame is used if appropriate,
- the original environment(formula) is found correctly, and
- arguments such as data and subset are not ignored.
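A minimal sketch of a model.frame method meeting these three requirements,
modelled loosely on how model.frame.lm in current R handles them; myfit and
model.frame.myfit are hypothetical, and the code in R itself differs in
detail:

```r
# Hypothetical, stripped-down fitting function that stores the call, the
# terms and, if model = TRUE, the model frame.
myfit <- function(formula, data, subset, weights, model = TRUE) {
  cl <- match.call()
  mf <- cl
  mf$model <- NULL
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())
  structure(list(call = cl, terms = attr(mf, "terms"),
                 model = if (model) mf), class = "myfit")
}

# Hypothetical model.frame method for "myfit" objects.
model.frame.myfit <- function(formula, ...) {
  dots <- list(...)
  nargs <- dots[match(c("data", "na.action", "subset"), names(dots), 0L)]
  # Use the saved frame when there is one and nothing overrides it.
  if (length(nargs) == 0L && !is.null(formula$model))
    return(formula$model)
  # Otherwise rebuild the model.frame() call, keeping data, subset, weights.
  fcall <- formula$call
  keep <- match(c("formula", "data", "subset", "weights", "na.action"),
                names(fcall), 0L)
  fcall <- fcall[c(1L, keep)]
  fcall[[1L]] <- quote(stats::model.frame)
  fcall$formula <- formula$terms
  if (length(nargs)) fcall[names(nargs)] <- nargs
  # Evaluate where the original formula was created, falling back to the
  # caller only if the terms carry no environment.
  env <- environment(formula$terms)
  if (is.null(env)) env <- parent.frame()
  eval(fcall, env)
}
```

Called from the workspace on a fit made inside a function, the method still
finds that function's local `rep' rather than base::rep.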

I have added code to model.frame.default which may make most of the
simpler model.frame methods unnecessary.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


