Possible bug with glm.nb and starting values (PR#1695)

ripley@stats.ox.ac.uk ripley@stats.ox.ac.uk
Thu, 20 Jun 2002 18:33:42 +0100 (BST)


It's an R/S incompatibility issue. `start' should be interpreted in S's
sense, which is R's etastart.

Why R chose to be gratuitously incompatible has always puzzled me.

Remember MASS was written for S (actually before R), and not every
statement has been checked for R/S discrepancies.

On Thu, 20 Jun 2002 bcooper@hsph.harvard.edu wrote:

> Full_Name: Ben Cooper
> Version: 1.5.0
> OS: linux
> Submission from: (NULL) (134.174.187.90)
>
>
> The help page for glm.nb (in MASS package) says that it takes "Any other
> arguments for the glm() function except family"
>
> One such argument is   start   "starting values for the parameters in the linear
> predictor."
>
> However, when called with starting values glm.nb returns:
>
>    Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames,  :
> 	  variable lengths differ
>
> So it looks like this is either a bug, the documentation is inaccurate, or I'm
> missing something (or some combination of the above).
>
>
> An example:
> ________________________________________________________________
>
>
> library(MASS)
>
> y<-c(7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
> 12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5,3,3,4)
>
> lag1<-c(0,7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
> 12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5,3,3)
>
> lag2<-c(0,0,7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
> 12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5,3)
>
> lag3<-c(0,0,0,7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
> 12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5)
>
>
> ># first a poisson model which is OK
> >glm(y~lag1+lag2+lag3,family=poisson(link=identity))
> Error: no valid set of coefficients has been found:please supply starting
> values
> In addition: Warning message:
> NaNs produced in: log(x)
>
> > #therefore try:
> > glm(y~lag1+lag2+lag3,family=poisson(link=identity),start=c(2,0.1,0.1,0.1))
>
> # and this works. However, negative binomial model is not OK:
>
> > glm.nb(y~lag1+lag2+lag3,link=identity)
> Error: no valid set of coefficients has been found:please supply starting
> values
> In addition: Warning message:
> NaNs produced in: log(x)
>
> > #so try
> > glm.nb(y~lag1+lag2+lag3,link=identity,start=c(2,0.1,0.1,0.1))
> Error in model.frame(formula, rownames, variables, varnames, extras, extranames,
>  :
> 	variable lengths differ
>
> ________________________________________________________________
>
>
>
> I can get glm.nb to work with starting values with the following hack to the
> glm.nb code:
>
> i)
>
> change the line
>  m$method <- m$model <- m$x <- m$y <- m$control <- m$contrasts <- m$init.theta
> <- m$link <- m$... <- NULL
>
>  to
>
>  m$method <- m$model <- m$x <- m$y <- m$control <- m$contrasts <- m$init.theta
> <- m$link <-m$start <- m$... <- NULL
>
>
> [i.e. insert m$start]
>
> ii)
>
> remove the line:
>
> start <- model.extract(m, start)
>
>
> iii)
>
> change the line
>
> fit <- glm.fitter(x = X, y = Y, w = w, etastart = start,
>         offset = offset, family = fam0, control = list(maxit = control$maxit,
>             epsilon = control$epsilon, trace = control$trace >
>                 1))
>
> to
>    fit <- glm.fitter(x = X, y = Y, w = w, start=start,
>         offset = offset, family = fam0, control = list(maxit = control$maxit,
>             epsilon = control$epsilon, trace = control$trace >
>                 1))
>
> iv) change default value of start to NULL (as in glm) rather than eta when
> glm.nb is called.
>
> newglm.nb with these changes then works.
>
> But no doubt there  are reasons for things being the way they are and these
> changes screw other things up.
>
> $platform
> [1] "i686-pc-linux-gnu"
> $arch
> [1] "i686"
> $os
> [1] "linux-gnu"
> $system
> [1] "i686, linux-gnu"
> $status
> [1] ""
> $major
> [1] "1"
> $minor
> [1] "5.0"
> $year
> [1] "2002"
> $month
> [1] "04"
> $day
> [1] "29"
> $language
> [1] "R"
>
> I also tried this with version 1.4.1 under Windows and had the same probelm.
>
> cheers
>
> Ben
>
>
>
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._