[R] Question on zero-inflated Poisson count data with repeated measures design - glmm.ADMB

Tue Apr 14 14:12:31 CEST 2009

Strubbe Diederik <diederik.strubbe <at> ua.ac.be> writes:

> 
> Dear R community,
> 
> I have some questions regarding the analysis of a zero-inflated count
> dataset with a repeated measures design.
> 
> The dataset is arranged as follows :
> Unit of analysis: point. These are points where birds were counted during a
> certain amount of time. In total we have about 175 points. Each point is
> located within a habitat fragment (here: "site" = A-B-C-D-..., in reality
> we have 25 sites, i.e. forest fragments). All points were counted five times
> in each of three years (thus in total, each point was counted 15 times). We
> want to relate bird abundance to a number of habitat variables (here:
> X1-X2-X3) collected at the site level.
> Abundance: this is the number of birds counted at a point. In most cases
> (more than 90%), no birds were detected, and the abundance dataset is thus
> zero-inflated.
> 
> I have been looking for code to analyze this zero-inflated Poisson
> distributed dataset with a repeated measures design, and I have arrived at
> the glmmADMB package.
> 
> library(glmmADMB)
> data <- read.table("D:/Boris/Borisdataset.csv", sep = ",", header = TRUE)
> # X1, X2 and year are columns of 'data', so they can be named directly
> test <- glmm.admb(abundance ~ X1 + X2 + year, random = ~count,
>                   group = "site", data = data, family = "poisson",
>                   zeroInflation = TRUE)
> 
> [For clarity: in the above syntax, count ranges from 1 to 5 as each site has
> been counted 5 times in a year, site refers to one of the 25 forest
> fragments in which the point counts were conducted, and Xi are the habitat
> variables.]
> 
> My questions are:
> - Does it make sense to analyze these data at the point level, given that
> all habitat variables are collected at the site level, meaning that all
> points belonging to a given forest fragment share the same habitat values?
> If it does make sense, is the proposed syntax OK? Is there any option to
> include year as a random effect, as I am not especially interested in
> differences between years?

> - It looks appealing to average the point count values for each forest
> fragment and to analyze the data with "forest fragment" as the unit of
> analysis. However, even after averaging within fragments, the dataset is
> still zero-inflated. Yet a zero-inflated Poisson distribution cannot be used
> for this analysis, as the averaged forest fragment values are not always
> integers.
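
The long-format layout described in the question can be sketched with simulated data. Everything here is hypothetical: the column names (point, site, year, count, abundance, X1-X3) follow the question, but the assumption of exactly 7 points per site (175 / 25), the covariate values and the zero-inflated abundances are invented for illustration. The sketch shows that the site-level habitat variables are constant within each fragment, and that per-site means of the counts are generally not integers, which is why a (zero-inflated) Poisson model cannot be fitted to them directly.

```r
set.seed(1)

# Hypothetical design: 25 sites x 7 points = 175 points, each point
# counted 5 times in each of 3 years (15 visits per point).
n_sites <- 25
pts_per_site <- 7
sites <- sprintf("S%02d", seq_len(n_sites))
design <- expand.grid(count = 1:5, year = 1:3,
                      point_in_site = seq_len(pts_per_site), site = sites,
                      stringsAsFactors = FALSE)
design$point <- paste(design$site, design$point_in_site, sep = "_")

# Habitat variables are measured once per site (forest fragment)
habitat <- data.frame(site = sites, X1 = rnorm(n_sites),
                      X2 = rnorm(n_sites), X3 = rnorm(n_sites),
                      stringsAsFactors = FALSE)
data <- merge(design, habitat, by = "site")

# Invented zero-inflated counts: about 85% of visits are structural zeros,
# the remaining visits are Poisson with a mean that depends on X1
data$abundance <- rbinom(nrow(data), 1, 0.15) *
  rpois(nrow(data), exp(-1 + 0.5 * data$X1))
mean(data$abundance == 0)   # share of zero counts

# Every point within a site carries identical habitat values ...
n_distinct_X1 <- tapply(data$X1, data$site, function(x) length(unique(x)))
all(n_distinct_X1 == 1)

# ... and per-site averages are no longer whole numbers
site_means <- aggregate(abundance ~ site, data = data, FUN = mean)
head(site_means)
```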

Rather than averaging, one can side-step that problem by summing the counts
over points within each site and using a log(time) offset to account for any
fixed differences in total observation time across sites (for example, sites
containing different numbers of points).
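
A minimal sketch of that suggestion in base R, using simulated data since the original dataset is not available. Assumptions (all hypothetical): every point-visit lasts the same number of minutes (visit_minutes), sites contain different numbers of points so their total observation time differs, and a plain Poisson GLM from stats::glm() stands in for the final model. Summing keeps the response a non-negative integer count, and offset(log(minutes)) makes the coefficients describe log(birds per minute of observation). A zero-inflated model that accepts an offset (for example zeroinfl() in the pscl package) could be substituted for glm().

```r
set.seed(2)

# Hypothetical data: 25 sites with unequal numbers of points (5 to 9),
# every point counted 15 times, each visit lasting `visit_minutes`.
n_sites <- 25
visit_minutes <- 10
habitat <- data.frame(site = sprintf("S%02d", seq_len(n_sites)),
                      X1 = rnorm(n_sites), X2 = rnorm(n_sites),
                      stringsAsFactors = FALSE)
pts <- sample(5:9, n_sites, replace = TRUE)
obs <- data.frame(site = rep(habitat$site, times = pts * 15),
                  stringsAsFactors = FALSE)
obs <- merge(obs, habitat, by = "site")

# Invented zero-inflated counts per point-visit
obs$abundance <- rbinom(nrow(obs), 1, 0.15) *
  rpois(nrow(obs), exp(-0.5 + 0.4 * obs$X1))
obs$minutes <- visit_minutes

# Sum counts and observation time over all point-visits within each site;
# X1 and X2 are constant within a site, so they survive the grouping
site_tot <- aggregate(cbind(abundance, minutes) ~ site + X1 + X2,
                      data = obs, FUN = sum)

# Poisson GLM on the summed counts with log(total minutes) as an offset
fit <- glm(abundance ~ X1 + X2 + offset(log(minutes)),
           family = poisson, data = site_tot)
summary(fit)
```

Because the model includes an intercept and uses the canonical log link, the fitted site totals add up to the observed grand total, which is a quick sanity check on the aggregation.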

A site-level analysis sounds sensible to me, but my credentials are not in
statistics, so a more authoritative answer may well be offered.

-- 
David Winsemius