[R] R Error/Warning Messages with library(MASS) using glm.

Ben Bolker bbolker at gmail.com
Sat Apr 14 17:34:22 CEST 2012


LNadler <lauren.e.nadler <at> gmail.com> writes:

> 
> Hi there,
> 
> I have been having trouble running negative binomial regression (glm.nb)
> using library MASS in R v2.15.0 on Mac OSX.  
> 
> I am running multiple models on the variables influencing the group size of
> damselfish in coral reefs (count data).  For total group size and two of my
> species, glm.nb is working great to deal with overdispersion in my count
> data.  For two of my species, I am getting a mixture of warning and error
> messages.  These species are different from the others in that they have
> slightly smaller group sample size (so there are more 0s) and the group size
> varies more widely (from 1 to 45 fish versus 1 to 11 fish for the other
> species for which the model is working).  Here is a sample of my output and
> the error messages:

> >
> model1<-glm.nb(X...of.C..viridis~PC1+Average.Branch.Diameter+
>  Average.Branch.Spacing+PC1*Average.Branch.Diameter+
> Average.Branch.Diameter*Average.Branch.Spacing+PC1*Average.Branch.Spacing)

 By the way, this model specification is equivalent to
 
X...of.C..viridis~(PC1+Average.Branch.Diameter+Average.Branch.Spacing)^2

 (i.e. main effects plus all pairwise interactions); also,
A*B stands for interactions plus all main effects (A + B+ A:B)
so your formula is slightly (harmlessly) redundant


> There were 50 or more warnings (use warnings() to see the first 50)
> > warnings()
> Warning messages:
> 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
> control$trace >  ... :
>   iteration limit reached
> 2: In sqrt(1/i) : NaNs produced
> 3: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
> control$trace >  ... :
>   iteration limit reached (ETC...CONTINUES ON THE SAME FOR 50 WARNINGS)

> 
> (Dispersion parameter for Negative Binomial(61302.24) family taken to be 1)

  This estimated dispersion parameter is ridiculously large ... either your
data are effectively Poisson, or the the theta estimate is questionable.

> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L,  : 
>   invalid 'nsmall' argument

  This error actually derives not from the model fitting itself, but
from R's attempt to print the summary -- the print.summary method is
trying to use the standard error of theta to determine the format,
but that's NA (or NaN, I forget) in this case.

Here's a reproducible example I came up with several years ago that shows
a similar pathology; I was able to get around it using
bbmle::mle2 ...

#############################################
library(MASS)

tadT <- c(0,0,0,450,0,0,0,315,233,0,200,300)
Distance <- rep(c(10,100,500),rep(4,3))

d <- data.frame(tadT,Distance)
plot(tadT)
plot(Distance,jitter(tadT))
library(ggplot2)
ggplot(d,aes(Distance,tadT))+stat_sum(aes(size=factor(..n..)))+
    theme_bw()+geom_smooth(method="glm",family="quasipoisson")

## constant model
tad.con <- glm.nb(tadT ~ 1, control=glm.control(trace=10))
log(mean(tadT))  ## OK, it got the mean right at least ...
tad.con$theta

ss <- summary(tad.con)

print(ss)
## error
## Error in prettNum(...) invalid 'nsmall' argument
## debug(MASS:::print.summary.negbin)

## x$SE.theta is NaN so dp is NaN so format(...,nsmall=dp)
##  gives an error

## given more iterations to get where it's going, it goes
## out of control (and crashes)
glm.nb(tadT ~ 1, control=glm.control(trace=10,maxit=100))

## doesn't help to make an identity link
glm.nb(tadT ~ 1,link=identity, control=glm.control(trace=10))
glm.nb(tadT ~ 1,link=identity, control=glm.control(trace=10,maxit=100))


library(bbmle)
## can also use starting values of 0,0, but get warnings
m1 <- mle2(tadT~dnbinom(mu=exp(logmu),size=exp(logk)),
     data=d,
     start=list(logmu=4,logk=-1))
confint(m1)  ## works OK

p1 <- profile(m1)
plot(p1)
## also try as a function of distance
tad.dist <- glm.nb(tadT ~ Distance)
## with trace/maxit set
glm.nb(tadT ~ Distance,control=glm.control(trace=10,maxit=100))
## does setting init.theta <<1 help?
glm.nb(tadT ~ Distance,init.theta=1e-6,control=glm.control(trace=10,maxit=100))
glm.nb(tadT ~ Distance,init.theta=1e-1,control=glm.control(trace=10,maxit=100))

m2 <- mle2(tadT~dnbinom(mu=exp(logmu),size=exp(logk)),
           data=d,
           parameters=list(logmu~Distance),
           start=list(logmu=4,logk=-1))
summary(m2)
p2 <- profile(m2)
plot(p2)
confint(p2)

anova(m1,m2)

## setting all zeros to 1 gives reasonable answers but???
tadT1 <- tadT
tadT1[tadT1==0] <- 1
glm.nb(tadT1 ~ Distance,control=glm.control(trace=10,maxit=100))

## quasipoisson does work ... but no likelihood/AIC available
glm.q1 <- glm(tadT ~ 1,family=quasipoisson())
glm.qd <- glm(tadT ~ Distance,family=quasipoisson())



More information about the R-help mailing list