[R] Hosmer-Lemeshow test

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Sep 16 13:08:02 CEST 2008


saggak wrote:
> Dear R-help,
> 
> I am working on a credit scorecard model and am using logistic regression to estimate its coefficients.
> 
> I want to use the Hosmer-Lemeshow test.
> 
> To learn how to do this in R, I referred to the following URL:
> 
>     http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt
> 
> The related data set 'diseaseoutbreak' is available at the following URL:
> 
>     http://www.stat.sc.edu/~hitchcock/diseaseoutbreakdata.txt
> 
> The R code given there is:
> 
> ####
> # A function to do the Hosmer-Lemeshow test in R.
> # R Function is due to Peter D. M. Macdonald, McMaster University.
> # 
> hosmerlem <-
> function (y, yhat, g = 10) 
> {
>     cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 
>         1, 1/g)), include.lowest = TRUE)
>     obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
>     expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
>     chisq <- sum((obs - expect)^2/expect)
>     P <- 1 - pchisq(chisq, g - 2)
>     c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
> }
> #
> ######
> 
> # Doing the Hosmer-Lemeshow test
> # (after copying the above function into R):
> 
> hosmerlem(disease, fitted(disease.logit))
> 
> However, when I ran these commands in R, I got the following error:
> 
> Error in model.frame.default(formula = cbind(1 - y, y) ~ cutyhat) : 
>   invalid type (list) for variable 'cbind(1 - y, y)'
> 
> Can anyone please guide me on how to run the Hosmer-Lemeshow test, and also on how to obtain the other usual logistic regression statistics (log-likelihood, AIC, pseudo R-squared, etc.)?
> 
> Thanking you all in advance
> 
> Saggak
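
On the error quoted above: one likely cause (an assumption, since the session itself is not shown) is that `disease` was a data frame rather than a 0/1 vector when passed as `y`. In that case `cbind(1 - y, y)` is a list, which `xtabs` rejects. The sketch below uses simulated data with hypothetical names (`d`, `x`) in place of the disease outbreak file. It reproduces the failure, runs the function on a numeric vector, and then pulls the log-likelihood, AIC, and a McFadden pseudo R-squared from a base-R `glm` fit.

```r
# Hosmer-Lemeshow function as quoted above (due to Peter D. M. Macdonald)
hosmerlem <- function(y, yhat, g = 10) {
    cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
                   include.lowest = TRUE)
    obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
    expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
    chisq <- sum((obs - expect)^2 / expect)
    P <- 1 - pchisq(chisq, g - 2)
    c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
}

# Simulated stand-in for the disease outbreak data (hypothetical names)
set.seed(42)
d <- data.frame(x = rnorm(200))
d$disease <- rbinom(200, 1, plogis(0.3 + 0.8 * d$x))
disease.logit <- glm(disease ~ x, family = binomial, data = d)

# Passing a one-column data frame as y fails, because a data frame is a list
err <- tryCatch(hosmerlem(d["disease"], fitted(disease.logit)),
                error = function(e) conditionMessage(e))
print(err)

# Passing the 0/1 response as a numeric vector works
hl <- hosmerlem(d$disease, fitted(disease.logit))
print(hl)

# Other usual fit statistics from base R
ll <- logLik(disease.logit)
aic <- AIC(disease.logit)
null.fit <- update(disease.logit, . ~ 1)   # intercept-only model
mcfadden <- 1 - as.numeric(ll) / as.numeric(logLik(null.fit))
print(c(logLik = as.numeric(ll), AIC = aic, McFaddenR2 = mcfadden))
```

If the response lives in a data frame, extracting the column with `$` or `[[ ]]` (not `[ ]`) gives the vector the function expects.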

That test is too dependent on cutpoints and does not have adequate 
power.  I recommend replacing it with the test from

@ARTICLE{hos97com,
   author = {Hosmer, D. W. and Hosmer, T. and {le Cessie}, S. and Lemeshow, S.},
   year = 1997,
   title = {A comparison of goodness-of-fit tests for the logistic regression
           model},
   journal = {Statistics in Medicine},
   volume = 16,
   pages = {965-980},
   annote = {goodness-of-fit for binary logistic model;difficulty with
            Hosmer-Lemeshow statistic being dependent on how groups are
            defined;sum of squares test;cumulative sum test;invalidity of naive
            test based on deviance;goodness-of-link function;simulation setup}
}

which is implemented in the residuals.lrm function in the Design package.
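
A minimal sketch of that call follows. It assumes the rms package, which later superseded Design (with Design itself, `library(Design)` would take the place of `library(rms)`), and it uses simulated data with hypothetical variable names. The fit must be made with `x = TRUE, y = TRUE` so that `residuals()` has the design matrix and response available.

```r
library(rms)  # assumption: rms, the successor to the Design package

# Simulated binary-outcome data (hypothetical names)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.0 * x1 - 0.7 * x2))
d  <- data.frame(y, x1, x2)

# x = TRUE, y = TRUE store the design matrix and response in the fit object
fit <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)

# Global goodness-of-fit test (unweighted sum of squares), no grouping needed
gof <- residuals(fit, type = "gof")
print(gof)

# Likelihood ratio chi-square, R^2 and C index stored with the fit
print(fit$stats[c("Model L.R.", "R2", "C")])
```

The returned vector includes a Z statistic and its P-value; a small P suggests lack of fit. Printing `fit` itself also shows the model likelihood ratio test and discrimination indexes.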


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


