[R] lmer and a response that is a proportion

Mon Dec 4 03:24:19 CET 2006

Dear Cameron,

Given your description, I thought that this might be the case. 

I'd first examine the distribution of the response variable to see what it
looks like. If the values don't push the boundaries of 0 and 1, and their
distribution is unimodal and reasonably symmetric, I'd consider analyzing
them directly using normally distributed errors. If the values do stack up
near 0, 1, or both, I'd consider a transformation, or perhaps a different
family (depending on the pattern); in particular, if they stack up near both
0 and 1, a logit or similar transformation could help. Finally, if you have
many values of 0, 1, or both, then a transformation isn't promising (and,
indeed, the logit wouldn't be defined for these values). In any event, I'd
check diagnostics after a preliminary fit.
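
A minimal sketch of these checks in base R, under stated assumptions: the
vector p is simulated and only stands in for the observed proportions, and
the names p, n_boundary, and logit_p are hypothetical, not from the data
described in this thread.

```r
# Simulated proportions stand in for the real response (an assumption).
set.seed(1)
p <- rbeta(200, 2, 5)

# Look at the distribution first: summary statistics and a coarse
# text histogram (hist(p) gives the graphical version).
summary(p)
table(cut(p, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE))

# Count exact boundary values; the logit is undefined (infinite) there.
n_boundary <- sum(p == 0 | p == 1)
qlogis(c(0, 1))            # -Inf and Inf

# Transform only when no exact 0s or 1s are present.
if (n_boundary == 0) {
  logit_p <- qlogis(p)     # same as log(p / (1 - p))
}

# The transformed response could then go into a linear mixed model, e.g.
#   lme4::lmer(logit_p ~ x + (1 | bird), data = d)
# where x, bird, and d are hypothetical names.
```

If n_boundary is large, the transformation is not an option and a
different model family would have to be considered instead.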

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: Cameron Gillies [mailto:cgillies at ualberta.ca] 
> Sent: Sunday, December 03, 2006 6:31 PM
> To: Prof Brian Ripley; John Fox
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] lmer and a response that is a proportion
> 
> Dear Brian and John,
> 
> Thanks for your insight.  I'll clarify a couple of things 
> in case it changes your advice.
> 
> My response is a ratio of two measures taken during a bird's 
> path, which varies from 0 to 1, so I cannot convert it to 
> columns of the number of successes.  It has to be reported as 
> the proportion.  I could logit transform it to make it 
> normal, but I am trying to avoid that so I can analyze it directly.
> 
> The subjects are individual birds and I have a range of 
> sample sizes from each bird (from 8 to >200, average of about 
> 75 measurements/bird).
> 
> Thanks!
> Cam
> 
> 
> On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:
> 
> > On Sun, 3 Dec 2006, John Fox wrote:
> > 
> >> Dear Cameron,
> >> 
> >>> -----Original Message-----
> >>> From: r-help-bounces at stat.math.ethz.ch 
> >>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron 
> >>> Gillies
> >>> Sent: Sunday, December 03, 2006 1:58 PM
> >>> To: r-help at stat.math.ethz.ch
> >>> Subject: [R] lmer and a response that is a proportion
> >>> 
> >>> Greetings all,
> >>> 
> >>> I am using lmer (lme4 package) to analyze data where the response is
> >>> a proportion (0 to 1).  It appears to work, but I am wondering if
> >>> the analysis is treating the response appropriately - i.e. can lmer
> >>> do this?
> >>> 
> >> 
> >> As far as I know, you can specify the response as a proportion, in
> >> which case the binomial counts would be given via the weights
> >> argument -- at least that's how it's done in glm(). An alternative
> >> that should be equivalent is to specify a two-column matrix with
> >> counts of "successes" and "failures" as the response. Simply giving
> >> the proportion of successes without the counts wouldn't be appropriate.
> >> 
> >>> I have used both family=binomial and quasibinomial - is one more 
> >>> appropriate when the response is a proportion?  The coefficient 
> >>> estimates are identical, but the standard errors are larger with 
> >>> family=binomial.
> >>> 
> >> 
> >> The difference is that in the binomial family the dispersion is fixed
> >> to 1, while in the quasibinomial family it is estimated as a free
> >> parameter. If the standard errors are larger with family=binomial,
> >> then that suggests that the data are underdispersed (relative to the
> >> binomial); if the difference is substantial -- the factor is just the
> >> square root of the estimated dispersion -- then the binomial model is
> >> probably not appropriate for the data.
> > 
> > John's last deduction is appropriate to a GLM, but not necessarily to
> > a GLMM. I don't have detailed experience with lmer for binomial, but I
> > do for various other fitting routines for GLMM.  Remember there are at
> > least two sources of randomness in a GLMM, and let us keep it simple
> > and have just a subject effect and a measurement error.  Then if
> > over-dispersion is happening within subjects, forcing the binomial
> > dispersion (at the measurement level) to 1 tends to increase the
> > estimate of the subject-level variance component to compensate, and in
> > turn increase some of the standard errors.
> > 
> > (Please note the 'tends' in that para, as the details of the design do
> > matter.  For cognoscenti, think about plot and sub-plot treatments in
> > a split-plot design.)
>
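
The glm() points quoted above can be checked with a short base R sketch:
the two ways of specifying a binomial response give the same fit, and the
quasibinomial standard errors differ from the binomial ones by the square
root of the estimated dispersion. The data are simulated and every variable
name is hypothetical. This covers the plain GLM case only; as noted in the
quoted reply, the same deduction need not carry over to a GLMM.

```r
# Simulated binomial counts with extra variation in the success
# probability (an assumption made purely for illustration).
set.seed(42)
n    <- 100
size <- rep(20, n)                       # trials per observation
x    <- rnorm(n)
p_i  <- plogis(-0.5 + 0.8 * x + rnorm(n, sd = 0.7))
y    <- rbinom(n, size, p_i)             # number of "successes"

# Two-column (successes, failures) response ...
fit_b <- glm(cbind(y, size - y) ~ x, family = binomial)
# ... and the equivalent proportion response with counts as weights.
fit_w <- glm(y / size ~ x, family = binomial, weights = size)
all.equal(coef(fit_b), coef(fit_w))

# Same model with the dispersion estimated rather than fixed at 1.
fit_qb <- glm(cbind(y, size - y) ~ x, family = quasibinomial)

se_b  <- coef(summary(fit_b))[, "Std. Error"]
se_qb <- coef(summary(fit_qb))[, "Std. Error"]
phi   <- summary(fit_qb)$dispersion      # estimated dispersion

# The coefficients agree; the standard errors differ by sqrt(phi).
# A ratio above 1 points to overdispersion, below 1 to underdispersion.
se_qb / se_b
sqrt(phi)
```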