[R] overdispersion and quasibinomial model

Wed Nov 25 16:54:57 CET 2009

djpren wrote:
> Thanks for the reply. Naturally I already searched the site and help for the
> answers to these questions. I think I've figured out how to run a
> quasi-binomial model, but I cannot figure out how to test for
> over-dispersion or how to apply a shapiro-wilk test.
> 
> This is not homework, neither do I have an instructor who is proficient in
> using R. This program was suggested to me by another researcher after he
> witnessed my frustration with the inflexibility of SPSS and other such
> programs. I am on a very tight schedule and I don't have time to become a
> statistician and computer scientist, which is why I wrote 3 very quick
> questions asking for commands that i had already tried to find myself.
> 
> Testing for over-dispersion is probably something I can eventually get to
> grips with, since I just have to get the variance for the real and modelled data.
> However, I cannot find a command to do shapiro-wilks on the site or on these
> forums. Also, why do you say that most people here wouldn't recommend this
> procedure?
> 
The customary way (well, at least for me) to check for overdispersion
is to look at the ratio of the sum of squared Pearson residuals
to the residual degrees of freedom. This is discussed well in
MASS (the book).

Example:

library(MASS)
fm1 <- glm(low ~ age + race, family = binomial, data = birthwt)
phi <- sum(resid(fm1, type = "pearson")^2) / df.residual(fm1)
phi
#[1] 1.011612

For a binomial glm, this value is expected to be near 1.0
as it is here. So there is no indication of overdispersion
in this example.

I don't know of a specific test for overdispersion. Personally,
I start to worry about the adequacy of the model if the data
set is large and phi is greater than about 1.2. For small data
sets I wouldn't be too concerned if phi is less than 1.5.
But this all depends crucially on what you want to do with
your model results. Allowing phi to be greater than 1.0 gives
more conservative (larger) standard errors for the parameter
estimates. Note that using family = quasibinomial won't change
the parameter estimates themselves, just their SEs.

fm2 <- glm(low ~ age + race, family = quasibinomial, data = birthwt)

Now you can compare summary(fm1) with summary(fm2).
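
For a more direct comparison than reading the two summaries side by
side, here is a minimal sketch. It refits both models so that it runs
on its own; the names se1 and se2 are introduced here just for
illustration. It relies on summary() of the quasibinomial fit
reporting the Pearson-based dispersion estimate and scaling the
binomial standard errors by its square root.

```r
library(MASS)  # for the birthwt data

fm1 <- glm(low ~ age + race, family = binomial, data = birthwt)
fm2 <- glm(low ~ age + race, family = quasibinomial, data = birthwt)

# Pearson-based dispersion estimate, computed from the binomial fit
phi <- sum(resid(fm1, type = "pearson")^2) / df.residual(fm1)

# The quasibinomial summary reports its own dispersion estimate;
# it should agree with phi
summary(fm2)$dispersion

# Coefficients are the same in both fits; only the SEs differ
se1 <- coef(summary(fm1))[, "Std. Error"]  # binomial, dispersion fixed at 1
se2 <- coef(summary(fm2))[, "Std. Error"]  # quasibinomial, dispersion estimated
cbind(estimate = coef(fm1), se_binomial = se1, se_quasi = se2,
      ratio = se2 / se1)

sqrt(phi)  # the ratio column should match this value
```

With phi this close to 1, the two sets of standard errors will be
nearly identical, which is another way of seeing that the
quasibinomial fit buys you little here.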

What Shapiro-Wilk has to do with this is: Nothing!

  -Peter Ehlers

> 
> David Winsemius wrote:
>>
>> On Nov 24, 2009, at 3:41 PM, djpren wrote:
>>
>>> I am looking for the correct commands to do the following things:
>>>
>>> 1. I have a binomial logistic regression model and i want to test for
>>> overdispersion.
>> Under the teach a man to fish precept,   ... try:
>>
>> RSiteSearch("test over dispersion binomial models")
>>
>>> 2. If I do indeed have overdispersion i need to then run a quasi-binomial
>>> model, but I'm not sure of the command.
>> ?glm
>> # and follow the appropriate links
>>
>>> 3. I can get the residuals of the model, but i need to then apply a
>>> shapiro wilk test to test them. Does anyone know the command for this?
>>
>> RSiteSearch("shapiro-wilks")   # not that people here recommend this procedure
>>
>> The overall flavor of these questions is "homework", so I'm  
>> speculating that you may want to consult your instructors.
>>
>> -- 
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>