[Rd] model.frame(), model.matrix(), and derived predictor variables
Ben Bolker
bbolker at gmail.com
Thu Aug 29 15:21:52 CEST 2013
On 13-08-28 05:43 PM, Gabriel Becker wrote:
> Ben,
>
> It works for me ...
>> x = rpois(100, 5) + 1
>> y = rnorm(100, x)
>> d = data.frame(x,y)
>> m <- lm(y~log(x),d)
>> update(m,data=model.frame(m))
>
> Call:
> lm(formula = y ~ log(x), data = model.frame(m))
>
> Coefficients:
> (Intercept)       log(x)
>      -4.010        5.817
>
>
That's because x and y are still lying around in your global
environment. If you rm(x); rm(y) then it won't work any more. And it
wouldn't have worked if you had constructed your data frame directly as
d = data.frame(x=rpois(100,5)+1)
d = transform(d,y=rnorm(100,x))
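
For concreteness, here is a minimal sketch of that failure. It assumes a
fresh session with no 'x' or 'y' in the global environment; make_data() is
a hypothetical helper name, used only so that the variables exist solely
as columns of the data frame.

```r
# Hypothetical helper: build the data inside a function so that no stray
# 'x' or 'y' is left behind in the global environment.
make_data <- function(n = 100) {
  d <- data.frame(x = rpois(n, 5) + 1)
  transform(d, y = rnorm(length(x), x))
}

set.seed(1)
d <- make_data()
m <- lm(y ~ log(x), data = d)

mf <- model.frame(m)
names(mf)   # "y" "log(x)": the frame stores log(x), not x itself

# Refitting against the model frame has to evaluate log(x) again, but
# there is no column called 'x' in the frame and no 'x' anywhere else.
res <- try(update(m, data = mf), silent = TRUE)
inherits(res, "try-error")
```

If 'x' happens to exist in the workspace, the same update() call silently
picks it up instead, which is why the example above appeared to work.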
>
> You can also re-fit using the model.matrix directly. In your example,
> the model frame can be passed directly to lm.fit /lm.wfit.
Yes, if I want to refit the same model. But if I want to do
something else with the model (e.g. try fitting vs. x instead of log(x),
or some other function of x) then it doesn't work.
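
A rough sketch of the 'augmented' model frame idea from my original
message, to show what I mean. augment_frame() is a made-up name, not an
existing function, and it assumes the original data are still available
at the time the frame is built and carry their default row names.

```r
# Hypothetical helper: append to the model frame any input variables that
# appear in the formula but are not already columns of the frame.
augment_frame <- function(model, data) {
  mf <- model.frame(model)
  missing_vars <- setdiff(all.vars(formula(model)), names(mf))
  # Index by row name, since model.frame() may have dropped cases with NAs
  extra <- data[rownames(mf), missing_vars, drop = FALSE]
  cbind(mf, extra)
}

set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 * log(d$x) + rnorm(10, sd = 0.1)
m <- lm(y ~ log(x), data = d)

# With the plain model frame, switching the predictor to x fails
# (assuming there is no 'x' in the workspace).
plain <- try(update(m, . ~ x, data = model.frame(m)), silent = TRUE)

# With the augmented frame, both the original and a modified formula work.
amf <- augment_frame(m, d)
m2 <- update(m, data = amf)          # same model, y ~ log(x)
m3 <- update(m, . ~ x, data = amf)   # different function of x
```

This still needs the original data once, so it only moves the problem to
model-fitting time rather than removing it.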
cheers
Ben
>
>
> ~G
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.1
>
>
>
>
> On Sat, Aug 24, 2013 at 7:40 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>
> Bump: just trying one more time to see if anyone had thoughts on this
> (so far it's just <crickets> ...)
>
>
> -------- Original Message --------
> Subject: model.frame(), model.matrix(), and derived predictor variables
> Date: Sat, 17 Aug 2013 12:19:58 -0400
> From: Ben Bolker <bbolker at gmail.com>
> To: R-devel at stat.math.ethz.ch
>
>
> Dear r-developers:
>
> I am struggling with some fundamental aspects of model.frame().
>
> Conceptually, I think of a flow from data -> model.frame() ->
> model.matrix; the data contain _input variables_, while model.matrix
> contains _predictor variables_: data have been transformed, splines and
> polynomials have been expanded into their corresponding
> multi-dimensional bases, and factors have been expanded into appropriate
> sets of dummy variables depending on their contrasts.
> I originally thought of model.frame() as containing input variables as
> well (but with only the variables needed by the model, and with cases
> containing NAs handled according to the relevant na.action setting), but
> that's not quite true. While factors are retained as-is, splines and
> polynomials and parameter transformations are evaluated. For example
>
> d <- data.frame(x=1:10,y=1:10)
> model.frame(y~log(x),d)
>
> produces a model frame with columns 'y', 'log(x)' (not 'y', 'x').
>
> This makes it hard (impossible?) to use the model frame to re-evaluate
> the existing formula in a model, e.g.
>
> m <- lm(y~log(x),d)
> update(m,data=model.frame(m))
> ## Error in eval(expr, envir, enclos) : object 'x' not found
>
> It seems to me that this is a reasonable thing to want to do
> (i.e. use the model frame as a stored copy of the data that
> can be used for additional model operations); otherwise, I
> either need to carry along an additional copy of the data in
> a slot, or hope that the model is still living in an environment
> where it can find a copy of the original data.
>
> Does anyone have any insights into the original design choices,
> or suggestions about how they have handled this within their own
> code? Do you just add an additional data slot to the model? I've
> considered trying to write some kind of 'augmented' model frame, that
> would contain the equivalent of
> setdiff(all.vars(formula),names(model.frame(m))) [i.e. all input variables
> that appeared in the formula but not in the model frame ...].
>
> thanks
> Ben Bolker
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis