[R] How to intepret a factor response model?

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Wed May 4 10:21:11 CEST 2005


On 04-May-05 Maciej Bliziñski wrote:
> Hello,
> 
> I'd like to create a model with a factor-type response variable.
> This is an example:
> 
>> mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
>> rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
>> rnorm(150) + 5))
>> summary(mydata)
>  factor_var     real_var        
>  one  :100   Min.   :-2.742877  
>  three:100   1st Qu.:-0.009493  
>  two  :100   Median : 2.361669  
>              Mean   : 2.490411  
>              3rd Qu.: 4.822394  
>              Max.   : 6.924588  
>> mymodel = glm(factor_var ~ real_var, family = 'binomial', data =
>> mydata)
>> summary(mymodel)
> 
> Call:
> glm(formula = factor_var ~ real_var, family = "binomial", data =
> mydata)
> 
> Deviance Residuals: 
>     Min       1Q   Median       3Q      Max  
> -1.7442  -0.6774   0.1849   0.3133   2.1187  
> 
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  -0.6798     0.1882  -3.613 0.000303 ***
> real_var      0.8971     0.1066   8.417  < 2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 381.91  on 299  degrees of freedom
> Residual deviance: 213.31  on 298  degrees of freedom
> AIC: 217.31
> 
> Number of Fisher Scoring iterations: 6

Have you noticed that you get identical results with

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
   rep('two',100), rep('three', 100))),
   real.var = c(rnorm(150), rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)

and

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
   rep('two',200))),real.var = c(rnorm(150),rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)

(I've left out the "summary(mydata)" since these do naturally
differ, and I've replaced "factor_var" with "factor.var" and
"real_var" with "real.var" because of potential complications
with "_"; also "mymodel =" to "mymodel <-").

So I think the interpretation of the results from your first
model is that, because of the "family='binomial'", glm is
treating "factor.var='one'" as binomial response "0", say,
and "factor.var='two'" or "factor.var='three'" as binomial
response "1".

You're trying to fit a multinomial response, but you've
specified a binomial family to 'glm'. 'glm' does not have
a multinomial response family.

You could try 'multinom' from package 'nnet' which fits
loglinear models to factor responses with more than 2 levels.

E.g.

  library(nnet)
  mymodel <- multinom(factor.var ~ real.var,data=mydata)
   ### weights:  9 (4 variable)
   ##  initial  value 329.583687 
   ##  iter  10 value 209.780666
   ##  final  value 209.779951 
   ##  converged
  summary(mymodel)
   ## Re-fitting to get Hessian
   ## Call:
   ## multinom(formula = factor.var ~ real.var, data = mydata)
   ##  Coefficients:
   ##        (Intercept)  real.var
   ##  three  -3.4262565 1.3838231
   ##  two    -0.6754253 0.7116955
   ##
   ## Std. Errors:
   ##   (Intercept)  real.var
   ## three   0.5028541 0.1480138
   ## two     0.1846827 0.1068821
   ##
   ## Residual Deviance: 419.5599 
   ## AIC: 427.5599 
   ##
   ## Correlation of Coefficients:
   ##             three:(Intercept) three:real.var two:(Intercept)
   ## three:real.var  -0.7286258                                      
   ## two:(Intercept)  0.1986995        -0.1261034                    
   ## two:real.var    -0.1411377         0.7012481     -0.3285741

This output does suggest a fairly clear interpretation!

Hoping this helps,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-May-05                                       Time: 09:18:03
------------------------------ XFMail ------------------------------




More information about the R-help mailing list