[R] lmer and a response that is a proportion

Sun Dec 3 23:47:43 CET 2006

On Sun, 3 Dec 2006, John Fox wrote:

> Dear Cameron,
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron Gillies
>> Sent: Sunday, December 03, 2006 1:58 PM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] lmer and a response that is a proportion
>>
>> Greetings all,
>>
>> I am using lmer (lme4 package) to analyze data where the
>> response is a proportion (0 to 1).  It appears to work, but I
>> am wondering if the analysis is treating the response
>> appropriately - i.e. can lmer do this?
>>
>
> As far as I know, you can specify the response as a proportion, in which
> case the binomial counts would be given via the weights argument -- at least
> that's how it's done in glm(). An alternative that should be equivalent is
> to specify a two-column matrix with counts of "successes" and "failures" as
> the response. Simply giving the proportion of successes without the counts
> wouldn't be appropriate.
>
>> I have used both family=binomial and quasibinomial - is one
>> more appropriate when the response is a proportion?  The
>> coefficient estimates are identical, but the standard errors
>> are larger with family=binomial.
>>
>
> The difference is that in the binomial family the dispersion is fixed to 1,
> while in the quasibinomial family it is estimated as a free parameter. If
> the standard errors are larger with family=binomial, then that suggests that
> the data are underdispersed (relative to the binomial); if the difference is
> substantial -- the factor is just the square root of the estimated
> dispersion -- then the binomial model is probably not appropriate for the
> data.
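
For the plain GLM case John describes, here is a minimal sketch using glm() rather than lmer. The data frame d and its columns x, n and succ are invented purely for illustration. It fits the two response specifications he mentions, then checks that the quasibinomial standard errors are the binomial ones scaled by the square root of the estimated dispersion.

```r
# Hypothetical grouped data: 'succ' successes out of 'n' trials at each 'x'.
d <- data.frame(
  x    = c(1, 2, 3, 4, 5, 6, 7, 8),
  n    = rep(20, 8),
  succ = c(2, 7, 4, 12, 8, 16, 13, 19)
)
d$p <- d$succ / d$n

# (a) proportion as response, binomial totals supplied via 'weights'
fit_w <- glm(p ~ x, family = binomial, weights = n, data = d)

# (b) two-column matrix of successes and failures as response
fit_m <- glm(cbind(succ, n - succ) ~ x, family = binomial, data = d)

# (c) same mean model, but dispersion estimated rather than fixed at 1
fit_q <- glm(p ~ x, family = quasibinomial, weights = n, data = d)

# Helper (illustrative): extract the standard errors of the coefficients
se  <- function(fit) coef(summary(fit))[, "Std. Error"]
phi <- summary(fit_q)$dispersion   # Pearson chi-square / residual df

all.equal(coef(fit_w), coef(fit_m))            # (a) and (b) agree
all.equal(coef(fit_w), coef(fit_q))            # estimates unchanged by family
all.equal(se(fit_q), se(fit_w) * sqrt(phi))    # SEs differ by sqrt(phi)
```

An estimated phi below 1 gives smaller quasibinomial standard errors, which is the underdispersed situation John refers to. The sketch covers only the GLM; it says nothing about how lmer handles the dispersion.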

John's last deduction is appropriate to a GLM, but not necessarily to a 
GLMM. I don't have detailed experience with lmer for binomial, but I do 
for various other fitting routines for GLMM.  Remember there are at least 
two sources of randomness in a GLMM, and let us keep it simple and have 
just a subject effect and a measurement error.  Then if over-dispersion is 
happening within subjects, forcing the binomial dispersion (at the 
measurement level) to 1 tends to increase the estimate of the 
subject-level variance component to compensate, and in turn increase some
of the standard errors.

(Please note the 'tends' in that para, as the details of the design do 
matter.  For cognoscenti, think about plot and sub-plot treatments in a 
split-plot design.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595