[R] Hosmer-Lemeshow 'goodness of fit'

JimeBoHe jimena.bohorquez at gmail.com
Thu Mar 29 20:34:19 CEST 2012

I am a new user in R, so I am sorry if this is a basic question but I am kind
of lost here and I really would appreciatte your help.

I have behavioral information of sea lions and I've done a binomial
Generalized Linear model using Rcmdr, to understand what variables are
affecting the place where Male's fights are being made (water or land). One
of my possible models includes Agression Type, number of females around the
fight and temperature in the moment.

> GLM.2 <- glm(Place ~ AgType + Females + Temp, family=binomial(cloglog), 
+   data=June)

> summary(GLM.2)

glm(formula = Place ~ AgType + Females + Temp, family = binomial(cloglog), 
    data = June)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4448  -0.7248   0.4220   0.6741   1.8798  

            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.43071    0.82164   4.175 2.97e-05 ***
AgType      -1.23463    0.37166  -3.322 0.000894 ***
Females      0.05812    0.01320   4.404 1.06e-05 ***
Temp        -0.08614    0.02453  -3.512 0.000444 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 175.79  on 141  degrees of freedom
Residual deviance: 125.33  on 138  degrees of freedom
AIC: 133.33

Number of Fisher Scoring iterations: 8

When I ran hosmerlem test it seems that I have a good fit to the data. 

> hosmerlem(Place, fitted(GLM.2))
      X^2        Df                  P(>Chi) 
7.5123428 8.0000000 0.4824925 

However if I run only AgType, or Agtype + Temp, when running Hosmerlem test;
it gives me an error that says: * ERROR:  'breaks' are not unique*.

I've been having the same error when I try to run other models or when I use
different link functions like logit.

So... right now my questions are:

1. What is this error means? as far as I have seen in the Hosmerlem test
code that I am using, it seems to depend on the yhat probs, but to be honest
I don't really understand what is this "yhat". The code I am using is:

hosmerlem <-
function (y, yhat, g = 10) 
    cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 
        1, 1/g)), include.lowest = T)
    obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
    expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
    chisq <- sum((obs - expect)^2/expect)
    P <- 1 - pchisq(chisq, g - 2)
    c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)

2. I've seen in older posts that Cessie - van Houwelingen - Copas - Hosmer
unweighted sum of squares test for global goodness of fit test; seems to be
a better alternative. However I am probably doing something wrong because I
can't get it work. 

I installed and loaded rms package and I ran the following command:

> resid(GLM.2, 'gof')

As far as I understand 'gof' is another package with this description:
Implementation of model-checking technique based on cumulative residuals.
So, for the command I also installed and loaded 'gof' package

However I get this: *ERROR: 'arg' should be one of "deviance", "pearson",
"working", "response", "partial"*

If I use any of the arguments that it suggests, it gives me a list of
residuals, but I doesn't show the Z or the P.

So, if someone could help me understand what I am doing wrong I really would
appreciate it. 

View this message in context: http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-tp3508127p4516441.html
Sent from the R help mailing list archive at Nabble.com.

More information about the R-help mailing list