[R] normalizing a negative binomial distribution and/or incorporating variance structures in a GAMM

Tue Sep 27 20:41:57 CEST 2011

Meredith Jantzen <mjantzen <at> uwo.ca> writes:

>   Hello everyone, Apologies in advance, as this is partially a stats
> question and partially an R question.  I have been using a GAM to
> model the activity level of bats going into and coming out from a
> forested edge.  I had eight microphones set up in a line transect at
> each of eight sites, and I am hoping to construct a model for each
> of 7 species. 

>  My count data has a reverse J-shaped skew and is overdispersed with
> a fair amount of zeros, and I haven't found any transformations that
> will completely normalize it (I've tried square roots and logs). 
> Meanwhile, the variance in call numbers  varies between sites and
> between microphones.  I wanted to use a GAMM to incorporate varComb
> and varIdent, but these can only be applied on data with a gaussian
> distribution. 

> Are there any packages I should be looking into that I don't know
> about that will apply a variance structure on a negative binomial
> distribution?  Or is there some transformation that I should be
> using that will solve my normality issues?  I've been searching the
> R-help boards, everything in Zuur and Woods, but I haven't found an
> answer yet. 

  I'm not entirely clear about this, but this question and the
previous question that Simon Wood answered (about neg binom and
GAMM) suggest to me that you might be going in slightly the
wrong direction.

  If your data are non-normally distributed, your choices are
typically (1) pick an alternative family of distributions to
characterize the variation (e.g. neg binomial or ZINB), (2) use
some form of robust estimation (e.g. rlm in the MASS package),
or (3) try to find a transformation of the data that makes
the data normal (and/or homoscedastic, and/or linear with respect
to the predictor variables). Among ecologists #3 is the classical
approach and #1 is the most common modern approach. Combining
#1 and #3 doesn't make that much sense to me.
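
  As a rough illustration of the three routes above, here is a minimal
sketch on simulated counts. The data frame `d` and the variables `x` and
`calls` are made up for the example and are not the poster's data; note
that the three fits estimate different things, so their coefficients are
not directly comparable.

```r
library(MASS)  # glm.nb() and rlm(); MASS ships with R

set.seed(42)
# Hypothetical overdispersed counts driven by one predictor
d <- data.frame(x = runif(200, 0, 3))
d$calls <- rnbinom(200, mu = exp(0.5 + 0.8 * d$x), size = 1)

# (1) alternative family: negative binomial GLM (log link)
m_nb <- glm.nb(calls ~ x, data = d)

# (2) robust estimation: M-estimation of a linear model
m_rob <- rlm(calls ~ x, data = d)

# (3) transformation: ordinary linear model on log(1 + count)
m_log <- lm(log1p(calls) ~ x, data = d)

print(coef(m_nb))
print(coef(m_rob))
print(coef(m_log))
```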

  One doesn't necessarily expect the variance to be constant
in a negative binomial model (the variance increases with the
mean); are the *standardized* residuals heteroscedastic?
(i.e. does the boxplot of residuals(m,type="pearson")
vs site, microphone, or site*microphone combination look
funky?)
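
  A minimal sketch of that residual check, assuming a recent mgcv that
provides the nb() family. The data are simulated stand-ins; `site`,
`mic`, `night`, `dist` and `calls` are hypothetical names, not the
poster's variables.

```r
library(mgcv)  # ships with R as a recommended package

set.seed(1)
# Simulated stand-in: 8 sites x 8 microphones x 5 nights
d <- expand.grid(site = factor(1:8), mic = factor(1:8), night = 1:5)
d$dist <- as.integer(d$mic) * 10           # made-up distance along transect
site_eff <- rnorm(8, sd = 0.3)             # made-up site-level shifts
mu <- exp(1 + sin(d$dist / 15) + site_eff[as.integer(d$site)])
d$calls <- rnbinom(nrow(d), mu = mu, size = 1.5)

# Negative binomial GAM; theta is estimated along with the smooth
m <- gam(calls ~ s(dist, k = 5) + site, family = nb(),
         data = d, method = "REML")

# Pearson residuals are scaled by the fitted NB standard deviation,
# so their spread should look roughly similar across groups
r <- residuals(m, type = "pearson")
boxplot(r ~ d$site, xlab = "site", ylab = "Pearson residual")
boxplot(r ~ d$mic, xlab = "microphone", ylab = "Pearson residual")
```

If the boxes have clearly different spreads from site to site or
microphone to microphone, that would point to heterogeneity the negative
binomial mean-variance relationship is not absorbing.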

  It's not absolutely clear whether you need zero-inflation
explicitly or not. There are tests for zero-inflation and
overdispersion (see ref below), but I don't know of any
that are implemented in R. Your choices seem to be:

* negative binomial in mgcv::gam, without zero-inflation;
* ZINB in pscl, without the sophisticated GAM machinery
  of mgcv (but you can use spline terms via splines::ns(v,n),
  where v is the predictor variable and n is the degrees of
  freedom, which determines the number of knots; it just won't
  do all the slick automatic complexity selection that mgcv does);
* it looks like the COZIGAM package will do zero-inflated
  GAMs, but it doesn't do negative binomials ...
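
  The first two options might look roughly like this. It is a sketch
only, on simulated zero-inflated counts; `v` and `calls` are
hypothetical names, df = 4 is an arbitrary choice, and the pscl fit is
skipped if that package is not installed. It assumes a recent mgcv with
the nb() family.

```r
library(mgcv)     # gam() with nb(); ships with R
library(splines)  # ns(); ships with R

set.seed(2)
n <- 400
d <- data.frame(v = runif(n, 0, 10))
mu <- exp(1 + sin(d$v / 2))
# About 30% structural zeros on top of negative binomial counts
structural_zero <- rbinom(n, 1, 0.3)
d$calls <- ifelse(structural_zero == 1, 0L, rnbinom(n, mu = mu, size = 2))

# Option 1: negative binomial GAM, smoothness chosen automatically
m_gam <- gam(calls ~ s(v), family = nb(), data = d, method = "REML")
print(summary(m_gam))

# Option 2: ZINB with a fixed-complexity natural spline
# (count model | zero-inflation model; here an intercept-only zero part)
if (requireNamespace("pscl", quietly = TRUE)) {
  m_zinb <- pscl::zeroinfl(calls ~ ns(v, df = 4) | 1,
                           data = d, dist = "negbin")
  print(summary(m_zinb))
}
```

With ns() the flexibility of the curve is fixed by df, so one would have
to compare a few values of df by hand (e.g. via AIC) rather than relying
on mgcv's automatic smoothness selection.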

@article{deng_score_2005,
	title = {Score tests for zero-inflation and over-dispersion in generalized linear models},
	author = {Deng, D. and Paul, {S.R.}},
	journal = {Statistica Sinica},
	volume = {15},
	year = {2005},
	pages = {257--276},
	url = {http://www3.stat.sinica.edu.tw/statistica/j15n1/j15n115/j15n115.html}
}