[R] Names of variables needed in newdata for predict.glm

Marc Schwartz marc_schwartz at me.com
Thu Mar 8 16:04:48 CET 2018


Hi Bendix,

If the 'model' argument to glm() is TRUE (the default), you can get the structure of the model frame that was used to fit the model, by using:

> str(mx$data)
'data.frame':	200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...


> str(mi$data)
'data.frame':	200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...


The first column in the data frame will be the response variable.

In both cases, the offset variable 'Y' is included, whether the offset was part of the formula or specified as a separate argument.

You can then process the results as you need from there, such as:

> sapply(mx$data, class)
        D         x         f         Y 
"integer" "numeric"  "factor" "numeric" 


Regards,

Marc Schwartz




> On Mar 8, 2018, at 12:26 AM, Marc Girondot via R-help <r-help at r-project.org> wrote:
> 
> Hi,
> 
> Some try:
> > names(mi$xlevels)
> [1] "f"
> > all.vars(mi$formula)
> [1] "D" "x" "f" "Y"
> > names(mx$xlevels)
> [1] "f"
> > all.vars(mx$formula)
> [1] "D" "x" "f"
> 
> When offset is indicated out of the formula, it does not work...
> 
> Marc
> 
> Le 07/03/2018 à 06:20, Bendix Carstensen a écrit :
>> I would like to extract the names, modes [numeric/factor] and levels
>> of variables needed in a data frame supplied as newdata= argument to
>> predict.glm()
>> 
>> Here is a small example illustrating my troubles; what I want from
>> (both of) the glm objects is the vector c("x","f","Y") and an
>> indication that f is a factor:
>> 
>> library( splines )
>> dd <- data.frame( D = sample(0:1,200,rep=T),
>>                   x = abs(rnorm(200)),
>>                   f = factor(sample(letters[1:4],200,rep=T)),
>>                   Y = runif(200,0.5,10) )
>> mx <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) , offset=log(Y) , family=poisson, data=dd)
>> mi <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) + offset(log(Y)), family=poisson, data=dd)
>> 
>> attr(mx$terms,"dataClasses")
>> attr(mi$terms,"dataClasses")
>> mi$xlevels
>> mx$xlevels
>> 
>> ...so far not quite there.
>> 
>> Regards,
>> 
>> Bendix Carstensen
>> 
>> Senior Statistician
>> Steno Diabetes Center
>> Clinical Epidemiology
>> Niels Steensens Vej 2-4
>> DK-2820 Gentofte, Denmark
>> b at bxc.dk
>> bendix.carstensen at regionh.dk
>> http://BendixCarstensen.com


	[[alternative HTML version deleted]]



More information about the R-help mailing list