[R] Hosmer-Lemeshow 'goodness of fit'

Fri Mar 30 14:23:24 CEST 2012

At 19:34 29/03/2012, JimeBoHe wrote:
>I am a new user in R, so I am sorry if this is a basic question but I am kind
>of lost here and I really would appreciatte your help.

Dear Jimena

Comments in-line

>I have behavioral information of sea lions and I've done a binomial
>Generalized Linear model using Rcmdr, to understand what variables are
>affecting the place where Male's fights are being made (water or land). One
>of my possible models includes Agression Type, number of females around the
>fight and temperature in the moment.
>
>*******************
> > GLM.2 <- glm(Place ~ AgType + Females + Temp, family=binomial(cloglog),
>+   data=June)
>
> > summary(GLM.2)
>
>Call:
>glm(formula = Place ~ AgType + Females + Temp, family = binomial(cloglog),
>     data = June)
>
>Deviance Residuals:
>     Min       1Q   Median       3Q      Max
>-2.4448  -0.7248   0.4220   0.6741   1.8798
>
>Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
>(Intercept)  3.43071    0.82164   4.175 2.97e-05 ***
>AgType      -1.23463    0.37166  -3.322 0.000894 ***
>Females      0.05812    0.01320   4.404 1.06e-05 ***
>Temp        -0.08614    0.02453  -3.512 0.000444 ***
>---
>Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 175.79  on 141  degrees of freedom
>Residual deviance: 125.33  on 138  degrees of freedom
>AIC: 133.33
>
>Number of Fisher Scoring iterations: 8
>****************************************
>
>
>When I ran hosmerlem test it seems that I have a good fit to the data.
>
>************************
> > hosmerlem(Place, fitted(GLM.2))
>       X^2        Df                  P(>Chi)
>7.5123428 8.0000000 0.4824925
>*****************************
>
>However if I run only AgType, or Agtype + Temp, when running Hosmerlem test;
>it gives me an error that says: * ERROR:  'breaks' are not unique*.
>
>I've been having the same error when I try to run other models or when I use
>different link functions like logit.
>
>
>So... right now my questions are:
>
>1. What is this error means? as far as I have seen in the Hosmerlem test
>code that I am using, it seems to depend on the yhat probs, but to be honest
>I don't really understand what is this "yhat". The code I am using is:

yhat is the predicted value of y (the outcome variable). You do not 
provide us with a reproducible example so this is only a guess but I 
suspect if you look at the values of yhat for the failing models you 
will find you have many replicated values and so at least two values 
of the breakpoints are identical. But, as I say, this is at best a guess.

>hosmerlem <-
>function (y, yhat, g = 10)
>{
>     cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0,
>         1, 1/g)), include.lowest = T)
>     obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
>     expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
>     chisq <- sum((obs - expect)^2/expect)
>     P <- 1 - pchisq(chisq, g - 2)
>     c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
>}
>
>
>2. I've seen in older posts that Cessie - van Houwelingen - Copas - Hosmer
>unweighted sum of squares test for global goodness of fit test; seems to be
>a better alternative. However I am probably doing something wrong because I
>can't get it work.
>
>I installed and loaded rms package and I ran the following command:
>
> > resid(GLM.2, 'gof')
>
>As far as I understand 'gof' is another package with this description:
>Implementation of model-checking technique based on cumulative residuals.
>So, for the command I also installed and loaded 'gof' package

You seem to be so far away from success here that I think remote 
advice is not going to help.

>However I get this: *ERROR: 'arg' should be one of "deviance", "pearson",
>"working", "response", "partial"*
>
>If I use any of the arguments that it suggests, it gives me a list of
>residuals, but I doesn't show the Z or the P.

That is because it is doing what it is documented to. I am not 
familiar with the gof function nor do I see it in the index for rms 
(as you implied) so it is a bit hard to know what you are trying to do here.

>So, if someone could help me understand what I am doing wrong I really would
>appreciate it.
>
>
>
>
>--
>View this message in context: 
>http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-tp3508127p4516441.html
>Sent from the R help mailing list archive at Nabble.com.

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html