[Rd] Tolerances in glm.control

Fri May 9 16:56:15 MEST 2003

On Fri, 9 May 2003, Martin Maechler wrote:

> >>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
> >>>>>     on Fri, 9 May 2003 12:08:10 +0100 (BST) writes:
> 
>     BDR> I have tightened the tolerances in glm.control in
>     BDR> R-devel (aka 1.8.0 Under Development) from epsilon =
>     BDR> 1e-4 to 1e-8, and increases maxit from 10 to 25.
> 
>     BDR> Normally the effect is to do one more iteration and get
>     BDR> more accurate results.  However, in cases of partial
>     BDR> separation several more iterations will be done and it
>     BDR> will be clearer from the results which parameters are
>     BDR> converging and which are diverging (to +/-Inf).
> 
>     BDR> I have been meaning to do this for some time (the
>     BDR> defaults seemed appropriate to computer speeds at the
>     BDR> 1991 origin of glm in S3), but have only this time
>     BDR> remembered at the beginning of a release cycle.
> 
> Very good!
> 
> Being on this topic:  What about 
> enhancing  summary.glm() and changing print.summary.glm() quite
> a bit such that
> 
> o   summary.glm() for back compatibility still computes the Wald tests
> 		  but also does all the "drop1" tests for each coefficient
> and
> 
> o  print.summary.glm() would print these tests instead of the 
> 		       (too often misleading) Wald ones.

I think it is too slow for routine work: people fit glm's with lots
of parameters, often 100s if they are surrogate multinomial models.

> Related: How hard would it be to compute a ``Hauck-Donner diagnostic''
> 	 warning?
> 	 (If it's an open research problem then it's "too hard"
> 	  for R 1.8 :-)

I think there are two issues here

1) People fit partially separable models, for which the MLEs have some
infinite coefficients, but don't get near convergence.

2) There are models with very large effects where some betas are large but 
finite and Wald tests are unreliable (and yes, Thomas, as happened in the 
last 24 hours people do test for the parameter being zero via the t-test: 
which is why I think S-PLUS got it right in not giving p-values).

It has been known since the 1970s (at least) how to detect 1 via linear 
program, and there are interative methods from the 1950/60s (`the 
perceptron').  It ought to be possible to detect by a better convergnce 
diagnostic for glm() too.   I do not know how to detect 2.

Brian

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595