[Rd] glm start/offset bugs (PR#1422)

Prof Brian D Ripley ripley@stats.ox.ac.uk
Fri, 29 Mar 2002 06:25:11 +0000 (GMT)


On Fri, 29 Mar 2002 presnell@stat.ufl.edu wrote:

>
> --fupGvOGOQM
> Content-Type: text/plain; charset=us-ascii
> Content-Description: message body and .signature
> Content-Transfer-Encoding: 7bit
>
>
> There's a simple bug in the handling of the start and offset arguments
> in glm and glm.fit.  The bug exists in the latest development version
> of R (version information below), but it appears that glm.R has not
> been touched much lately, so the bug affects at least the most recent
> stable release of R.
>
> Here is a simple example:
>
> > data(ships, package=MASS)
> > ships.glm <- glm(incidents ~ type + year + period + offset(log(service)),
> + 		  family=poisson, data=ships, subset=service != 0)
> > glm(incidents ~ type + year + period + offset(log(service)),
> +     family=poisson, data=ships, subset=service != 0, start=coef(ships.glm))
>
> or more simply
>
> > update(ships.glm, start=coef(ships.glm))
>
> The problem is caused by a bad initialization of etastart in glm.fit()
> when an offset is present.  Fixing this and retrying the example above
> then reveals another bug, this time in glm(), that is tickled only
> when non-NULL values are given to both the offset and start arguments.
>
> I've attached to the bottom of this message a simple patch to
> src/library/base/R/glm.R that I believe correctly repairs these bugs.
>
> Essentially the same change is needed in Berwin Turlach's Wed, 27 Feb
> 2002 version of glm.fit() reported on the R bug tracking site as
> "Models/1331".  (The change to glm() is still required as well of
> course.)  As an aside, I'm wondering if there has been any thought
> given to adopting Berwin's patches.  Though I must admit that I
> haven't taken the time to check carefully through his changes, I know
> Berwin to be a very careful and skillful programmer, and I suspect
> that he is the only person to have looked very closely at those
> particular details of glm.fit in recent months (years?).  Also, some
> of the simpler changes that he made seem to be needed just to cleanup
> the code.  Would it be useful if a number of us began using his
> version as an informal test?  After discovering this bug earlier
> today, I have already begun to do so and so far have not encountered
> any problems.

Yes, thought has been given, and the code has been looked at. (People
do look at R code you know, even if they don't jump in to change it
after a few hour's use.)

Given the time pressures, this is not going to get looked at for 1.5.0. I
have long thought that given the history of problems maintaining glm()
that it might be easier to rewrite it for scratch.  Having .null versions
serves no useful purpose, for example.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._