[R] Goodness of fit of binary logistic model

Paul Smith phhs80 at gmail.com
Sat Aug 6 16:53:49 CEST 2011


Thanks, Frank. As a rule, I provide an example in my first post.
However, in this case the data are confidential, and I was not
allowed to share them. Moreover, I did not think I would be able
to generate data exhibiting the reported problem.

Paul


On Sat, Aug 6, 2011 at 3:27 PM, Frank Harrell <f.harrell at vanderbilt.edu> wrote:
> Exactly right Peter.  Thanks.
>
> There should be some way for me to detect such situations so that they do not
> produce a misleadingly impressive P-value.  Ideas welcomed!
>
> This is a great example of why users should post a toy example in their first
> posting: we can immediately see that this model MUST fit the data, so
> any evidence of lack of fit has to be misleading.
>
> Frank
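One possible way to detect the situation Frank describes, offered as a hypothetical heuristic and not something rms is known to implement: if the design matrix (with intercept) has no more distinct rows than its rank, the model is saturated over covariate patterns. Its fitted probabilities then reproduce the observed proportions (assuming the maximum likelihood estimates exist), and the sum-of-squares statistic degenerates to 0/0. The helper name `gof_is_degenerate` is made up for illustration.

```r
# Hypothetical helper (not part of rms): TRUE when the design is saturated
# over its covariate patterns, so the sum-of-squares GOF test is 0/0.
# X: numeric design matrix WITHOUT the intercept column.
gof_is_degenerate <- function(X) {
  X <- cbind(1, as.matrix(X))        # add the intercept column
  n_patterns <- nrow(unique(X))      # distinct covariate patterns
  n_params   <- qr(X)$rank           # number of estimable coefficients
  n_patterns == n_params             # the rank can never exceed n_patterns
}

# Usage: a single binary covariate (the 2x2 case) vs. a continuous one
x_bin  <- rep(0:1, 50)
x_cont <- seq(0, 1, length.out = 100)
gof_is_degenerate(x_bin)   # TRUE: 2 patterns, 2 parameters
gof_is_degenerate(x_cont)  # FALSE: 100 patterns, 2 parameters
```

For a factor covariate, pass the dummy columns from `model.matrix()` without the intercept; a k-level factor alone gives k patterns and rank k, so it is flagged as well.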
>
> Peter Dalgaard-2 wrote:
>>
>> On Aug 5, 2011, at 23:16 , Paul Smith wrote:
>>
>>> Thanks, Frank. The following code generates data that
>>> exhibit the problem I reported:
>>>
>>> -----------------------------------------
>>> set.seed(123)
>>> intercept <- -1.32
>>> beta <- 1.36
>>> xtest <- rbinom(1000, 1, 0.5)                 # binary covariate
>>> linpred <- intercept + xtest * beta
>>> prob <- exp(linpred) / (1 + exp(linpred))     # inverse logit
>>> runis <- runif(1000, 0, 1)
>>> ytest <- ifelse(runis < prob, 1, 0)           # Bernoulli outcome
>>> xtest <- as.factor(xtest)
>>> ytest <- as.factor(ytest)
>>> library(rms)
>>> model <- lrm(ytest ~ xtest, x = TRUE, y = TRUE)
>>> model
>>> residuals(model, type = "gof")                # global goodness-of-fit test
>>> -----------------------------------------
>>
>> Basically, what you have is zero divided by zero, except that floating
>> point inaccuracy turns it into the ratio of two small numbers. So the Z
>> statistic is effectively rubbish.
>> This comes about because the SSE minus its expectation has effectively
>> zero variance, which makes it rather useless for testing whether the model
>> fits.
>>
>> Since the model is basically a full model for a 2x2 table, it is not
>> surprising to me that "goodness of fit" tests behave poorly. In fact, I
>> would conjecture that no sensible g.o.f. test exists for that case.
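Peter's 0/0 argument can be checked numerically. This is a minimal base-R sketch, assuming data simulated along the lines of Paul's example; it uses `glm()` rather than `rms::lrm()` and computes the unweighted sum-of-squares statistic by hand. The denominator is assumed to be the square root of the weighted residual sum of squares from regressing 1 - 2p on the design matrix with weights p(1 - p); that is a reading of the Hosmer/le Cessie construction, not necessarily the exact code in rms.

```r
# Simulate a single binary covariate, as in Paul's example (base R only)
set.seed(123)
x <- rbinom(1000, 1, 0.5)
y <- rbinom(1000, 1, plogis(-1.32 + 1.36 * x))

# Tight convergence so fitted values match the exact MLE closely
fit <- glm(y ~ x, family = binomial,
           control = glm.control(epsilon = 1e-12, maxit = 50))
p <- fitted(fit)                     # fitted probabilities

# Saturated 2x2 model: fitted probabilities equal observed group proportions
all.equal(unname(p), ave(y, x))

# Numerator: sum of squared residuals minus its expectation under the model
sse <- sum((y - p)^2)
ev  <- sum(p * (1 - p))
numerator <- sse - ev

# Denominator (assumed construction): weighted least-squares regression of
# 1 - 2p on the design matrix with weights p(1 - p); its residual sum of
# squares is taken as the variance estimate for the numerator
X <- model.matrix(fit)
w <- p * (1 - p)
r <- lm.wfit(X, 1 - 2 * p, w)$residuals
denominator <- sqrt(sum(w * r^2))

# Both are zero up to rounding, so their ratio (the Z statistic) is noise
c(numerator = numerator, denominator = denominator)
```

The algebra behind this: in a covariate group of size n with observed proportion p̂, the sum of (y - p̂)^2 equals n p̂(1 - p̂), which is exactly the expectation term; and 1 - 2p̂ is constant within groups, so it lies in the column space of the design matrix and the weighted residuals vanish.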
>>
>>>
>>> Paul
>>>
>>>
>>> On Fri, Aug 5, 2011 at 7:58 PM, Frank Harrell
>>> <f.harrell at vanderbilt.edu> wrote:
>>>> Please provide the data or, better, R code that simulates data
>>>> showing the problem.  Then we can look further into this.
>>>> Frank
>>>>
>>>> -----
>>>> Frank Harrell
>>>> Department of Biostatistics, Vanderbilt University
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3721997.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>> "Døden skal tape!" --- Nordahl Grieg
>>
>
>
