[R] extract data from lm object and then use again?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Sep 22 19:03:43 CEST 2006


On Fri, 22 Sep 2006, Thomas Lumley wrote:

> On Fri, 22 Sep 2006, Thilo Kellermann wrote:
>
>> Hi,
>>
>> the data of the model fit is stored in lm$model and should work....
>>
>
> Not reliably. In the first place, you should use the accessor function
> model.frame(model) rather than model$model, which works even if the model
> was fitted with model=FALSE.
>
> But even then,
>   glm(formula(model), data=model.frame(model))
> will not work reliably.
> Consider
>   model <- lm(log(Volume)~log(Height)+log(Girth),data=trees)
>
> The model frame has variables called, e.g., "log(Volume)" rather than
> "Volume".
>
> When you need the source data frame you need to do something like
>    eval(model$call$data, environment(formula(model)))
> and even this might not work, e.g. if the model had no data
> argument.
>
> However, if the model had no data argument then the variables
> must be available in environment(formula(model)), in which case any data
> frame of the right size will do.

To be picky: they must have been available at the time of fitting.  You or
some other command could very easily have changed them since.  That's
actually why we store the model frame by default: there is no other 100%
reliable way to get at the data used in the fitting (as distinct from the
data originally supplied).
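
A minimal sketch of that point, assuming a model fitted without a data argument to variables in the workspace. The names x, y, x_used and fit are made up for illustration. Looking the variable up again after it has been modified gives the changed values, while the stored model frame still holds what was used in the fit:

```r
set.seed(1)
x <- rnorm(10)
y <- 2 * x + rnorm(10)

fit <- lm(y ~ x)     # no data argument: x and y come from the workspace
x_used <- x          # keep a copy of what the fit actually saw

x <- x * 100         # the variable is changed after fitting

# Looking x up in the formula's environment now finds the modified values ...
x_now <- get("x", envir = environment(formula(fit)))
print(isTRUE(all.equal(x_now, x_used)))

# ... whereas the stored model frame still has the values used in the fit.
x_mf <- as.numeric(model.frame(fit)$x)
print(isTRUE(all.equal(x_mf, x_used)))
```

This relies on the default model=TRUE, so that the model frame is kept in the fitted object.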

> If there are no missing observations or the model was fitted with
> na.action="na.exclude" then a fairly reliable approach is to use
>   eval(model$call$data, environment(formula(model)))
> if it is not NULL and to fall back to model.frame(model). This is what
> termplot() does.
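
The fallback described above could be sketched roughly as follows. The helper name source_data is hypothetical (it is not part of R), and this is only an illustration of the idea, not the actual termplot() code. It uses the trees example from earlier in the thread, plus a second fit without a data argument (fit2, h and v are made-up names):

```r
# Model from the example quoted above, fitted to the built-in 'trees' data.
model <- lm(log(Volume) ~ log(Height) + log(Girth), data = trees)

# The model frame holds the transformed variables, named by their expressions.
mf <- model.frame(model)
print(names(mf))

# Hypothetical helper: prefer the data argument of the stored call and
# fall back to the model frame when the call had no data argument.
source_data <- function(model) {
  d <- eval(model$call$data, environment(formula(model)))
  if (is.null(d)) model.frame(model) else d
}

# With a data argument we get the original columns back.
src <- source_data(model)
print(names(src))

# Refitting from the recovered source data reproduces the coefficients.
refit <- glm(formula(model), data = src)
print(all.equal(coef(refit), coef(model)))

# Without a data argument the helper falls back to the model frame.
h <- trees$Height
v <- trees$Volume
fit2 <- lm(v ~ h)
print(names(source_data(fit2)))
```

As noted above, the rows of the recovered data line up with the fit only if there were no missing observations or na.exclude was used, and the lookup of the data argument assumes the data frame has not been changed since fitting.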

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


