[R] Optimization problem

Ben Bolker bolker at zoo.ufl.edu
Wed Aug 22 03:59:26 CEST 2007


  (Hope this gets threaded properly.  Sorry if it doesn't.)

   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
produce NAs (but would produce something like a singular Hessian
and maybe other problems) -- but they're not even specified in this
model.

  The bottom line is that you have a location with a single
observation, so the GLM that zicounts runs to get the initial
parameter values has an unestimable location:mass interaction
for one location, so it gives an NA, so optim complains.

  In gruesome detail:

## set up  data
scardat = read.table("scars.dat",header=TRUE)
library(zicounts)
## try to run model
zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scardat)
## tried to debug this by dumping zicounts.R to a file, modifying
## it to put a "trace" argument in that would print out the parameters
## and log-likelihood for every call to the log-likelihood function.
dump("zicounts",file="zicounts.R")
source("zicounts.R")
zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scardat,trace=TRUE)
## this actually didn't do any good because the negative log-likelihood
## function never gets called -- as it turns out optim() barfs when it
## gets its initial values, before it ever gets to evaluating the 
log-likelihood

## check the glm -- this is the equivalent of what zicounts does to
## get the initial values of the x parameters
p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
          data=scardat,family="poisson")
which(is.na(coef(p1)))

## find out what the deal is
table(scardat$Location)

scar2 = subset(scardat,Location!="Randalstown")
## first step to removing the bad point from the data set -- but ...
table(scar2$Location)
## it leaves the Location factor with the same levels, so
##  now we have ZERO counts for one location:
## redefine the factor to drop unused levels
scar2$Location <- factor(scar2$Location)
## OK, looks fine now
table(scar2$Location)

zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scar2)
## now we get another error ("system is computationally singular" when
## trying to compute Hessian -- overparameterized?)   Not in any
## trivial way that I can see.  It would be nice to get into the guts
## of zicounts and stop it from trying to invert the Hessian, which is
## I think where this happens.

  In the meanwhile, I have some other  ideas about this analysis (sorry,
but you started it ...)

  Looking at the data in a few different ways:

library(lattice)
xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
       auto.key=list(columns=3))
xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)

xyplot(Scars~Lar,groups=Location,data=scar2,
       auto.key=list(columns=3))
xyplot(Scars~Mass|Lar,data=scar2)
xyplot(Scars~Lar|Location,data=scar2)

   Some thoughts: (1) I'm not at all sure that
zero-inflation is necessary (see Warton 2005, Environmentrics).
This is a fairly small, noisy data set without huge numbers
of zeros -- a plain old negative binomial might be fine.
 
   I don't actually see a lot of signal here, period (although there may 
be some) ...
there's not a huge range in Lar (whatever it is -- the rest of the 
covariates I
think I can interpret).  It would be tempting to try to fit location as 
a random
effect, because fitting all those extra degrees of freedom is going to 
kill you.
On the other hand, GLMMs are a bit hairy.

   cheers
       Ben



More information about the R-help mailing list