[R] Fwd: Re: Poisson GLM using non-integer response/predictors?

Matthias Gondan matthias-gondan at gmx.de
Fri Dec 30 21:01:02 CET 2011

Hi Ben,

Thanks for clarifying this. I used a misleading phrase: "model the
observation time" sounds as if observation time were the dependent
variable, which of course it is not. In the scenario described, it is
the parrot counts that are modeled.
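
To make the role of the offset concrete, here is a minimal sketch on
simulated data. The variable names (noise, obs_time, count) and all
numbers are assumptions made up for illustration, not values from the
parrot study. The counts are the response, and log(observation time)
enters as a term whose coefficient is fixed at 1, so the remaining
coefficients describe the event rate per unit of time.

```r
# Simulated data; every name and value below is an illustrative assumption.
set.seed(1)
n        <- 200
noise    <- runif(n, 0, 4)            # hypothetical predictor, e.g. tourist noise
obs_time <- runif(n, 0.5, 8)          # hours of observation on each day
rate     <- exp(1 - 0.3 * noise)      # true number of events per hour
count    <- rpois(n, rate * obs_time) # expected count grows with observation time
d <- data.frame(count, noise, obs_time)

# The counts are the dependent variable; observation time is only an offset.
fit <- glm(count ~ noise + offset(log(obs_time)), family = poisson, data = d)
coef(fit)  # estimates of the log rate per hour and the effect of noise on it
```

With equal observation time on every day the offset is a constant and
is simply absorbed into the intercept.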

Best wishes,


Am 30.12.2011 20:50, schrieb Ben Bolker:
> Matthias Gondan <matthias-gondan at gmx.de> writes:
>> Hi,
>> Use offset variables if you count occurrences of an event and want to
>> model the observation time.
>> glm(count ~ predictors + offset(log(observation_time)), family=poisson)
>> If you want to compare durations, look at library(survival), ?coxph
>> If tnoise_sqrt is the square root of tourist noise, your example seems
>> incorrect, because it is a predictor, not the dependent variable
>> tnoise_sqrt ~ lengthfeeding_log
>> Best wishes,
>> Matthias
>> Am 30.12.2011 16:29, schrieb Lucy Dablin:
>>> Great lists, I always find them useful, thank you to
>>> everyone who contributes to them.
>>> My question is regarding non-integer values from some data I
>>> collected on parrots when using the poisson GLM. I observed the
>>> parrots on a daily basis to see if they were affected by tourist
>>> presence. My key predictors are tourist noise (averaged over a day
>>> period so decimal value, square root to adjust for skew), tourist
>>> number (the number of tourists at a site, square root), and the
>>> number of boats passing the site in a day (log). These are
>>> compared with predictors: total number of birds (count data,
>>> square root), average time devoted to foraging at site (log),
>>> species richness (sqrt), and the number of flushes per day. Apart
>>> from the last one they are all non-integer values. When I run a
>>> glm for example:
>   Your description sounds like you might already have transformed
> your predictors: generally speaking, you don't want to do that
> before running a GLM (the variance function incorporated in the
> GLM takes care of heteroscedasticity, and the link function takes
> care of nonlinearity in the response).
>    I suspect you want total number of birds, number of flushes per day,
> and species richness to be modeled as Poisson (or negative binomial --
> see ?glm.nb in the MASS package).  Species richness *might* be
> binomial, or more complicated, if you are drawing from a limited
> species pool (e.g. if there are only 5 possible species and you
> sometimes see 4 or 5 of them in a day).  Is the total number
> of birds really non-integer *before* you square-root transform it?
> Time devoted to foraging at the site is most easily
> modeled as log-normal (i.e., log-transform as you have already
> done and use lm, unless the response includes zeros),
> or possibly as Gamma-distributed (although you may want to
> use a log link instead of the default inverse link).
>   As Matthias said, offsets are used for the specific case of
> non-uniform sampling effort (e.g. if you sampled different areas,
> or for different lengths of time, every day).
>    You may be interested in r-sig-ecology at r-project.org , which
> is an R mailing list specifically devoted to ecological questions.
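
The two options Ben mentions for foraging time can be sketched side by
side. This is only an illustration on simulated data: the names
(forage, noise) and all numbers are assumptions, and the response is
generated without zeros so that both models apply.

```r
# Simulated data; every name and value below is an illustrative assumption.
set.seed(2)
n      <- 200
noise  <- runif(n, 0, 4)                        # hypothetical predictor
mu     <- exp(3 - 0.2 * noise)                  # true mean foraging time
forage <- rgamma(n, shape = 5, rate = 5 / mu)   # positive, right-skewed, no zeros
d <- data.frame(forage, noise)

# Option 1: log-normal model, i.e. log-transform the response and use lm
fit_lnorm <- lm(log(forage) ~ noise, data = d)

# Option 2: Gamma GLM with a log link instead of the default inverse link
fit_gamma <- glm(forage ~ noise, family = Gamma(link = "log"), data = d)

rbind(lognormal = coef(fit_lnorm), gamma = coef(fit_gamma))
```

The slopes are on the same (log) scale in both fits. The intercepts
need not agree, because lm describes the mean of log(forage) whereas
the Gamma GLM describes the log of the mean.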
