[R] OT: (quasi-?) separation in a logistic GLM

lincoln miseno77 at hotmail.com
Wed Sep 28 10:14:14 CEST 2011


I know that this is a quite old post but I am dealing with a similar warning
message and, also after reading all the posts here, I am still in doubt with
what I should do with my analysis.

I have a dataset where the binary response variable is sex, and the
predictors are several variables (they are percentage). I know that one of
this variables, "hcp", predicts quite well the sex of the individuals given
that the ca. 85% of females have hcp>0 while ca. 85% of females have hcp=0.
I don't believe that it does matter, but all the values are very low (the
maximum value of a predictor doesn't exceed 0.5) and most of predictors show
a very left-skewed distribution (log-normal). 
In any case, I believe that it could be said that quasi-separation is
occurring.

When I run the saturated model I get the warning message:

*********************
>glm.sat1<-glm(sex~hwp+hcp+hnp+twp+d1lp*d2dp+hwp:hcp+hwp:hnp+hwp:twp+hcp:hnp+hcp:twp+hnp:twp,data=dati,family="binomial")
Mensajes de aviso perdidos
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(glm.sat1)

Call:
glm(formula = sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + 
    hwp:hnp + hwp:twp + hcp:hnp + hcp:twp + hnp:twp, family = "binomial", 
    data = dati)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.6838  -0.1439   0.2699   0.5694   3.8421  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)     8.940      4.046   2.210 0.027125 *  
hwp            -4.897      4.550  -1.076 0.281793    
hcp          -414.796    113.780  -3.646 0.000267 ***
hnp            -3.689      6.692  -0.551 0.581487    
twp             5.450      2.566   2.124 0.033657 *  
d1lp          -22.436     13.837  -1.621 0.104926    
d2dp          -21.076      9.266  -2.274 0.022940 *  
d1lp:d2dp      65.102     35.394   1.839 0.065864 .  
hwp:hcp       313.514    823.782   0.381 0.703516    
hwp:hnp       -46.572     79.467  -0.586 0.557834    
hwp:twp         8.252     21.503   0.384 0.701156    
hcp:hnp     -1721.320   1925.766  -0.894 0.371409    
hcp:twp       -94.153    459.710  -0.205 0.837721    
hnp:twp        16.598     38.548   0.431 0.666769    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 281.98  on 412  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 309.98

Number of Fisher Scoring iterations: 8

**********************  

The estimates of hcp are very high and its Std.Error are also quite high. If
I go further with the analyses and use the "step" procedure I finally get a
candidate minimal adequate model:

> step.1<-step(glm.sat1)
Start:  AIC=309.98
sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + hwp:hnp + 
    hwp:twp + hcp:hnp + hcp:twp + hnp:twp

            Df Deviance    AIC
- hcp:twp    1   282.02 308.02
- hwp:hcp    1   282.11 308.11
- hwp:twp    1   282.13 308.13
- hnp:twp    1   282.17 308.17
- hwp:hnp    1   282.32 308.32
- hcp:hnp    1   282.90 308.90
<none>           281.98 309.98
- d1lp:d2dp  1   285.43 311.43

Step:  AIC=308.02
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hcp + 
    hwp:hnp + hwp:twp + hcp:hnp + hnp:twp

            Df Deviance    AIC
- hwp:hcp    1   282.13 306.13
- hwp:twp    1   282.14 306.14
- hnp:twp    1   282.18 306.18
- hwp:hnp    1   282.43 306.43
- hcp:hnp    1   283.07 307.07
<none>           282.02 308.02
- d1lp:d2dp  1   285.43 309.43

Step:  AIC=306.13
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + 
    hwp:twp + hcp:hnp + hnp:twp

            Df Deviance    AIC
- hwp:twp    1   282.24 304.24
- hnp:twp    1   282.27 304.27
- hwp:hnp    1   282.53 304.53
- hcp:hnp    1   283.25 305.25
<none>           282.13 306.13
- d1lp:d2dp  1   285.46 307.46

Step:  AIC=304.24
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + 
    hcp:hnp + hnp:twp

            Df Deviance    AIC
- hnp:twp    1   282.35 302.35
- hwp:hnp    1   282.83 302.83
- hcp:hnp    1   283.36 303.36
<none>           282.24 304.24
- d1lp:d2dp  1   285.56 305.56

Step:  AIC=302.35
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + 
    hcp:hnp

            Df Deviance    AIC
- hwp:hnp    1   283.06 301.06
- hcp:hnp    1   283.37 301.37
<none>           282.35 302.35
- d1lp:d2dp  1   285.71 303.71
- twp        1   306.17 324.17

Step:  AIC=301.06
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hcp:hnp

            Df Deviance    AIC
- hcp:hnp    1   284.36 300.36
<none>           283.06 301.06
- d1lp:d2dp  1   286.26 302.26
- hwp        1   286.96 302.96
- twp        1   307.98 323.98

Step:  AIC=300.36
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp

            Df Deviance    AIC
<none>           284.36 300.36
- hnp        1   287.07 301.07
- d1lp:d2dp  1   287.80 301.80
- hwp        1   288.08 302.08
- twp        1   308.39 322.39
- hcp        1   484.12 498.12
Hubo 40 avisos (use warnings() para verlos)

> summary(step.1)

Call:
glm(formula = sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp, 
    family = "binomial", data = dati)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7062  -0.1652   0.2669   0.5909   3.5082  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)    8.442      3.925   2.151   0.0315 *  
hwp           -4.692      2.413  -1.945   0.0518 .  
hcp         -464.909     59.109  -7.865 3.68e-15 ***
hnp           -5.673      3.317  -1.710   0.0872 .  
twp            6.193      1.428   4.335 1.46e-05 ***
d1lp         -21.332     13.487  -1.582   0.1137    
d2dp         -19.983      9.042  -2.210   0.0271 *  
d1lp:d2dp     63.218     34.530   1.831   0.0671 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 284.36  on 418  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 300.36

Number of Fisher Scoring iterations: 8
> library(profileModel)
> pp<-profileModel(step.1,quantile=qchisq(0.95,1),objective="ordinaryDeviance")
Preliminary iteration ........ Done

Profiling for parameter (Intercept) ... Done
Profiling for parameter hwp ... Done
Profiling for parameter hcp ... Done
Profiling for parameter hnp ... Done
Profiling for parameter twp ... Done
Profiling for parameter d1lp ... Done
Profiling for parameter d2dp ... Done
Profiling for parameter d1lp:d2dp ... Done

******************************

Plotting the profileModel I get this:

> dev.new()
> plot(pp)

http://r.789695.n4.nabble.com/file/n3850331/Rplot.jpg 


The hcp plot is far by being quadratic as I understood it should be.
I have tried to use the brglm procedure (on saturated model and glm derived
minimal adequate model) and I got almost the very same results.

Any suggestion/commentary on this?
I would appreciate any help,

Simone


--
View this message in context: http://r.789695.n4.nabble.com/OT-quasi-separation-in-a-logistic-GLM-tp875726p3850331.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list