[R] extract minimal variables from model

Jacob Wegelin jacobwegelin at fastmail.fm
Fri Jan 6 18:03:39 CET 2017


Given any regression model, created for instance by lm, lme, lmer, or rqs, such as

z1<-lm(weight~poly(Time,2), data=ChickWeight)

I would like a general way to obtain only those variables used for the model.  In the current example, this "minimal data frame" would consist of the "weight" and "Time" variables and none of the other columns of ChickWeight.

(Motivation: Sometimes the data frame contains thousands of variables which are not used in the current regression, and I do not want to keep copying and propagating them.)

The "model" component of the regression object doesn't serve this purpose:

> head(z1$model)
   weight poly(Time, 2).1 poly(Time, 2).2
1     42    -0.066020938     0.072002235
2     51    -0.053701293     0.031099018
3     59    -0.041381647    -0.001334588
4     64    -0.029062001    -0.025298582
5     76    -0.016742356    -0.040792965
6     93    -0.004422710    -0.047817737
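
A short sketch of why the "model" component does not help, assuming only base R and the ChickWeight data from the datasets package (the name fit is illustrative). The model frame stores one column per evaluated term, so poly(Time, 2) appears as a single matrix-valued column and the raw Time column is not kept:

```r
# Refit the example model; 'fit' is just an illustrative name.
fit <- lm(weight ~ poly(Time, 2), data = ChickWeight)

# Column names of the model frame: the response plus one column per term.
frame_names <- names(fit$model)

# The polynomial term is stored as a matrix with one column per degree,
# which is why head() prints it as "poly(Time, 2).1" and "poly(Time, 2).2".
poly_is_matrix <- is.matrix(fit$model[[2]])
poly_ncol <- ncol(fit$model[[2]])

# The untransformed predictor is not among the model-frame columns.
has_raw_time <- "Time" %in% frame_names
```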

The following awkward workaround seems to work when variable names contain only "word characters" as defined by regular expressions (letters, digits, and underscores):

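minimalvariablesfrommodel20161120 <- function(object, originaldata){
# Return the names of the columns of originaldata that appear in the model formula.
stopifnot(!missing(originaldata))
stopifnot(!missing(object))
intersect(
 	# split the deparsed formula on non-word characters to get candidate names
 	unique(unlist(strsplit(format(object$call$formula), split="\\W", perl=TRUE)))
 	, names(originaldata)
 	)
}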

> minimalvariablesfrommodel20161120(z1, ChickWeight)
[1] "weight" "Time" 
>

But if a variable has a space in its name, my workaround fails:

> ChickWeight$"dog tail"<-ChickWeight$Time
> z1<-lm(weight~poly(`dog tail`,2), data=ChickWeight)
> head(z1$model)
   weight poly(`dog tail`, 2).1 poly(`dog tail`, 2).2
1     42          -0.066020938           0.072002235
2     51          -0.053701293           0.031099018
3     59          -0.041381647          -0.001334588
4     64          -0.029062001          -0.025298582
5     76          -0.016742356          -0.040792965
6     93          -0.004422710          -0.047817737
> minimalvariablesfrommodel20161120(z1, ChickWeight)
[1] "weight"
>
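
The failure can be traced to the split step. The sketch below, which needs only base R, repeats that step on the same formula (the names f, tokens, has_pieces and has_whole are illustrative). The space inside the backticks is itself a non-word character, so the name is cut into two pieces and the intact column name never reaches intersect():

```r
# The formula from the failing call, as an unevaluated call, which is
# what object$call$formula holds inside the workaround.
f <- quote(weight ~ poly(`dog tail`, 2))

# Same tokenisation as the workaround: split on every non-word character.
tokens <- unique(unlist(strsplit(format(f), split = "\\W", perl = TRUE)))

# The two halves of the name show up as separate tokens ...
has_pieces <- all(c("dog", "tail") %in% tokens)

# ... but the full column name does not, so it cannot match names(data).
has_whole <- "dog tail" %in% tokens
```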

Is there a more elegant, and hence more reliable, approach?
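
One possible approach, offered as a sketch rather than a definitive answer: base R's all.vars() walks the parsed formula instead of its text, so backticked names come back whole. The helper names minimal_vars and minimal_data are hypothetical and not part of any package, and the sketch assumes the fitted object has a formula() method, as lm fits do.

```r
# Hypothetical helpers, not from any package.
# Names of the columns of 'data' that the model formula refers to.
minimal_vars <- function(object, data) {
  vars <- all.vars(formula(object))   # variable names from the parsed formula
  intersect(vars, names(data))        # drop anything that is not a column
}

# The "minimal data frame": only the columns used by the model.
minimal_data <- function(object, data) {
  data[, minimal_vars(object, data), drop = FALSE]
}

# Plain data.frame copy of ChickWeight, with the awkward column name added.
cw <- as.data.frame(ChickWeight)
cw$"dog tail" <- cw$Time
fit <- lm(weight ~ poly(`dog tail`, 2), data = cw)

used <- minimal_vars(fit, cw)     # "weight" "dog tail"
small <- minimal_data(fit, cw)    # data frame with just those two columns
```

The intersect() step is kept because all.vars() returns every symbol in the formula, including any that refer to objects outside the data frame. For model classes whose formula() method returns only part of the specification, the result may omit variables that appear only elsewhere, such as in a separate random-effects formula, so this should be checked for each class of interest.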

Thanks

Jacob A. Wegelin
Assistant Professor
C. Kenneth and Dianne Wright Center for Clinical and Translational Research
Department of Biostatistics
Virginia Commonwealth University
830 E. Main St., Seventh Floor
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. 
URL: http://www.people.vcu.edu/~jwegelin


