[R] lmer and a response that is a proportion

Mon Dec 4 07:46:08 CET 2006

Hi Cam,

I like John's suggestion too.  The only thing that I would add to it is
that you might find it worthwhile to use lme() (package nlme) instead of
lmer().  The former permits flexible modeling of the variance, through
variance functions such as varPower() supplied via its weights argument,
whereas to my knowledge the latter doesn't, yet.  You might find that with
judicious modeling of the variance, the model assumptions could reasonably
be met.

Good luck,

Andrew

On Mon, December 4, 2006 3:38 pm, Cameron Gillies wrote:
> Hello Simon and John,
>
> I'm afraid I need to include random effects, both a random intercept and
> possibly random coefficients, and it doesn't look like betareg can do that.
>
> John, the data is spread along the range of 0 to 1 with most values closer
> to 1, so it does transform well using the logit transformation.  I was
> trying to avoid that, though, because I was not sure what impact the
> transformation would have on the random effects or on the interpretation
> of the coefficients.
>
> Thanks again!
> Cam
>
> On 12/3/06 7:46 PM, "Simon Blomberg" <blomsp at ozemail.com.au> wrote:
>
>> Would beta regression solve your problem? (package betareg)
>>
>> Simon.
>>
>> John Fox wrote:
>>> Dear Cameron,
>>>
>>> Given your description, I thought that this might be the case.
>>>
>>> I'd first examine the distribution of the response variable to see what
>>> it looks like. If the values don't push the boundaries of 0 and 1, and
>>> their distribution is unimodal and reasonably symmetric, I'd consider
>>> analyzing them directly using normally distributed errors. If the values
>>> do stack up near 0, 1, or both, I'd consider a transformation, or perhaps
>>> a different family (depending on the pattern); in particular, if they
>>> stack up near both 0 and 1, a logit or similar transformation could help.
>>> Finally, if you have many values of 0, 1, or both, then a transformation
>>> isn't promising (and, indeed, the logit wouldn't be defined for these
>>> values). In any event, I'd check diagnostics after a preliminary fit.
>>>
>>> I hope this helps,
>>>  John
>>>
>>> --------------------------------
>>> John Fox
>>> Department of Sociology
>>> McMaster University
>>> Hamilton, Ontario
>>> Canada L8S 4M4
>>> 905-525-9140x23604
>>> http://socserv.mcmaster.ca/jfox
>>> --------------------------------
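A minimal base-R sketch of the first checks John describes, using made-up proportions (the vector y below is hypothetical, not Cam's data). It counts values sitting exactly on 0 or 1 and shows that qlogis(), R's logit, is infinite there. As one possible workaround that was not proposed in this thread, it applies the Smithson and Verkuilen (2006) squeeze before taking the logit. Because Cam asked about interpretation: estimates on the logit scale can be mapped back to proportions with plogis().

```r
# Hypothetical proportions (not Cam's data): mostly near 1, two exactly 1.
y <- c(0.62, 0.80, 0.91, 0.97, 1.00, 0.88, 0.75, 1.00)

# How many values sit exactly on the boundaries 0 or 1?
n_boundary <- sum(y == 0 | y == 1)
print(n_boundary)              # 2

# qlogis() is the logit, log(p / (1 - p)); it is infinite at 0 and 1.
print(qlogis(c(0, 0.5, 1)))    # -Inf 0 Inf

# One possible workaround (Smithson and Verkuilen, 2006; not from this
# thread): squeeze y into the open interval (0, 1) before the logit.
N <- length(y)
y_squeezed <- (y * (N - 1) + 0.5) / N
z <- qlogis(y_squeezed)        # finite, usable as a Gaussian-model response

# plogis() is the inverse logit, mapping logit-scale values back.
print(all.equal(plogis(z), y_squeezed))   # TRUE
```

The squeeze shrinks every value slightly toward 0.5, so with many exact 0s or 1s it changes the data noticeably. That is consistent with John's point that a transformation isn't promising in that case.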
>>>
>>>
>>>> -----Original Message-----
>>>> From: Cameron Gillies [mailto:cgillies at ualberta.ca]
>>>> Sent: Sunday, December 03, 2006 6:31 PM
>>>> To: Prof Brian Ripley; John Fox
>>>> Cc: r-help at stat.math.ethz.ch
>>>> Subject: Re: [R] lmer and a response that is a proportion
>>>>
>>>> Dear Brian and John,
>>>>
>>>> Thanks for your insight.  I'll clarify a couple of things
>>>> in case it changes your advice.
>>>>
>>>> My response is a ratio of two measures taken during a bird's
>>>> path, which varies from 0 to 1, so I cannot convert it to
>>>> columns of the number of successes.  It has to be reported as
>>>> the proportion.  I could logit transform it to make it
>>>> normal, but I am trying to avoid that so I can analyze it directly.
>>>>
>>>> The subjects are individual birds and I have a range of
>>>> sample sizes from each bird (from 8 to >200, average of about
>>>> 75 measurements/bird).
>>>>
>>>> Thanks!
>>>> Cam
>>>>
>>>>
>>>> On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:
>>>>
>>>>
>>>>> On Sun, 3 Dec 2006, John Fox wrote:
>>>>>
>>>>>
>>>>>> Dear Cameron,
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: r-help-bounces at stat.math.ethz.ch
>>>>>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron
>>>>>>> Gillies
>>>>>>> Sent: Sunday, December 03, 2006 1:58 PM
>>>>>>> To: r-help at stat.math.ethz.ch
>>>>>>> Subject: [R] lmer and a response that is a proportion
>>>>>>>
>>>>>>> Greetings all,
>>>>>>>
>>>>>>> I am using lmer (lme4 package) to analyze data where the response is
>>>>>>> a proportion (0 to 1).  It appears to work, but I am wondering if
>>>>>>> the analysis is treating the response appropriately - i.e. can lmer
>>>>>>> do this?
>>>>>>>
>>>>>>>
>>>>>> As far as I know, you can specify the response as a proportion, in
>>>>>> which case the binomial counts would be given via the weights
>>>>>> argument -- at least that's how it's done in glm(). An alternative
>>>>>> that should be equivalent is to specify a two-column matrix with
>>>>>> counts of "successes" and "failures" as the response. Simply giving
>>>>>> the proportion of successes without the counts wouldn't be
>>>>>> appropriate.
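For an ordinary GLM, the equivalence John mentions can be checked directly in base R. This is a minimal sketch on simulated data, and the trial counts n, the predictor x and the coefficients are all hypothetical. One fit uses a two-column cbind(successes, failures) response; the other uses the proportion as the response and passes the counts through weights. Both should give the same estimates and standard errors. The approach only applies when the underlying counts exist, which, as Cam explains in his follow-up, is not the case for his data. Current versions of lme4 fit the corresponding mixed models with glmer(), which accepts the same response forms.

```r
set.seed(1)

# Hypothetical data: 'n' binomial trials per observation and one predictor x.
m <- 40
n <- sample(10:100, m, replace = TRUE)
x <- rnorm(m)
successes <- rbinom(m, size = n, prob = plogis(0.5 + 0.8 * x))
failures <- n - successes
p <- successes / n

# Form 1: two-column response of successes and failures.
fit_counts <- glm(cbind(successes, failures) ~ x, family = binomial)

# Form 2: proportion as the response, binomial counts via 'weights'.
fit_props <- glm(p ~ x, family = binomial, weights = n)

# Same estimates and standard errors from both specifications.
print(cbind(counts = coef(fit_counts), props = coef(fit_props)))
print(cbind(counts = coef(summary(fit_counts))[, "Std. Error"],
            props  = coef(summary(fit_props))[, "Std. Error"]))
```

Fitting p ~ x with family = binomial but without weights would treat every proportion as a single Bernoulli trial (with a warning about non-integer successes), which is the inappropriate case John warns against.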
>>>>
>>>>>>> I have used both family=binomial and quasibinomial - is one more
>>>>>>> appropriate when the response is a proportion?  The coefficient
>>>>>>> estimates are identical, but the standard errors are larger with
>>>>>>> family=binomial.
>>>>>>>
>>>>>>>
>>>>>> The difference is that in the binomial family the dispersion is fixed
>>>>>> to 1, while in the quasibinomial family it is estimated as a free
>>>>>> parameter. If the standard errors are larger with family=binomial,
>>>>>> then that suggests that the data are underdispersed (relative to the
>>>>>> binomial); if the difference is substantial -- the factor is just the
>>>>>> square root of the estimated dispersion -- then the binomial model is
>>>>>> probably not appropriate for the data.
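This is a minimal base-R sketch of the GLM relationship John describes, on simulated data; all names and values are hypothetical. The binomial and quasibinomial fits share the same point estimates. The quasibinomial standard errors equal the binomial ones multiplied by the square root of the estimated dispersion phi, which summary.glm() computes as the Pearson chi-square statistic divided by the residual degrees of freedom. The data are simulated with extra observation-level variation, so phi will typically exceed 1 and the quasibinomial standard errors come out larger. With underdispersed data the reverse holds, as in Cam's case. As Brian Ripley points out in his reply, this reasoning is specific to GLMs and need not carry over to GLMMs.

```r
set.seed(2)

# Hypothetical overdispersed binomial data: the logit of each observation's
# success probability gets extra observation-level noise (sd = 0.6).
m <- 60
n <- sample(20:100, m, replace = TRUE)
x <- rnorm(m)
eta <- -0.3 + 0.7 * x + rnorm(m, sd = 0.6)
successes <- rbinom(m, size = n, prob = plogis(eta))
failures <- n - successes

fit_b <- glm(cbind(successes, failures) ~ x, family = binomial)      # dispersion fixed at 1
fit_q <- glm(cbind(successes, failures) ~ x, family = quasibinomial) # dispersion estimated

# Estimated dispersion: Pearson chi-square / residual degrees of freedom.
phi <- summary(fit_q)$dispersion
phi_by_hand <- sum(residuals(fit_q, type = "pearson")^2) / df.residual(fit_q)

se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]

print(rbind(binomial = coef(fit_b), quasibinomial = coef(fit_q)))  # identical rows
print(se_q / se_b)   # every ratio equals sqrt(phi)
print(sqrt(phi))
```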
>>>>>>
>>>>> John's last deduction is appropriate to a GLM, but not necessarily to
>>>>> a GLMM. I don't have detailed experience with lmer for binomial, but I
>>>>> do for various other fitting routines for GLMM.  Remember there are at
>>>>> least two sources of randomness in a GLMM, and let us keep it simple
>>>>> and have just a subject effect and a measurement error.  Then if
>>>>> over-dispersion is happening within subjects, forcing the binomial
>>>>> dispersion (at the measurement level) to 1 tends to increase the
>>>>> estimate of the subject-level variance component to compensate, and in
>>>>> turn increase some of the standard errors.
>>>>>
>>>>> (Please note the 'tends' in that para, as the details of the design do
>>>>> matter.  For cognoscenti, think about plot and sub-plot treatments in
>>>>> a split-plot design.)
>>>>>
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>

Andrew Robinson
Senior Lecturer in Statistics                       Tel: +61-3-8344-9763
Department of Mathematics and Statistics            Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robinson at ms.unimelb.edu.au    Website: http://www.ms.unimelb.edu.au