[R] binomial glm warnings revisited

Wed Oct 8 21:40:12 CEST 2003

Spencer Graves <spencer.graves at pdf.com> writes:

>       This seems to me to be a special case of the general problem of
> a parameter on a boundary.  

Umm, no... 

> >I have this problem with my data. In a GLM, I have 269 zeroes and
> >only 1 one:

I don't think that necessarily gets you a parameter estimate on the
boundary. That should happen only if the predictor value for the single
"1" is smaller or larger than all the others (complete separation).
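To illustrate the point about the boundary, here is a minimal R sketch on simulated data. The variables x, yA, yB, fitA and fitB are hypothetical and have nothing to do with the poster's dbh data. When the single 1 sits at the largest predictor value, the data are completely separated and glm() drives the slope off towards infinity, typically with a "fitted probabilities numerically 0 or 1 occurred" warning. When it sits in the middle of the range, the slope estimate is finite.

```r
# Hypothetical simulated predictor (not the poster's dbh values).
set.seed(1)
x <- rnorm(270)

# Case A: the single 1 sits at the largest x -> complete separation.
yA <- as.integer(x == max(x))
fitA <- suppressWarnings(glm(yA ~ x, family = binomial))

# Case B: the single 1 sits near the middle of the x range -> no separation.
yB <- integer(270)
yB[order(x)[135]] <- 1L
fitB <- glm(yB ~ x, family = binomial)

coef(fitA)  # huge slope: glm stops somewhere on the way to infinity
coef(fitB)  # finite, modest slope
```

Dropping the suppressWarnings() call shows the warnings glm.fit emits in the separated case.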

> >summary(dbh)
> >Coefficients:
> >            Estimate Std. Error z value Pr(>|z|)
> >(Intercept)   0.1659     3.8781   0.043    0.966
> >dbh          -0.5872     0.5320  -1.104    0.270
> >
> >
> >>drop1(dbh, test = "Chisq")
> >>
> >Single term deletions
> >Model:
> >MPext ~ dbh
> >       Df Deviance     AIC    LRT Pr(Chi)
> ><none>      9.9168 13.9168
> >dbh     1  13.1931 15.1931 3.2763 0.07029 .
> >
> >I now wonder, is the drop1() function output 'reliable'?
> >
> >If so, are the estimates from MASS confint() then also 'reliable'? It gives
> >the same warning.

> >(Intercept) -6.503472 -0.77470556
> >abund       -1.962549 -0.07496205
> >There were 20 warnings (use warnings() to see them)

During profiling, you may be pushing one of the parameters towards
extreme values and end up with a model whose fitted p's are very close
to 0 or 1.
That's not necessarily a sign of unreliability -- the procedure is to
set one parameter to a sequence of fixed values and optimize over the
other, and it might just be the case that the optimizations have been
wandering a bit far from the optimum. (I'd actually be more suspicious
about the fact that the name of the predictor suddenly changed....)
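The profiling procedure described above can be sketched by hand: hold the slope fixed through an offset, refit the intercept alone, and record how much the deviance rises. This is only an illustration on simulated data (x, y, grid and prof_dev are hypothetical names). As far as I know, MASS's profile method for glm objects works along similar lines, but it reports the signed square root of the deviance increase and picks its own grid. Widening the grid pushes the constrained fits towards fitted probabilities near 0 or 1, which is where warnings like the ones quoted above can come from.

```r
# Hypothetical simulated data: 270 observations, one y == 1 placed near x = 1,
# so there is no separation and the slope MLE is finite.
set.seed(2)
x <- rnorm(270)
y <- integer(270)
y[which.min(abs(x - 1))] <- 1L

fit <- glm(y ~ x, family = binomial)
b_hat <- coef(fit)[["x"]]

# Grid of fixed slope values around the MLE; the centre point is b_hat itself.
grid <- b_hat + seq(-4, 4, by = 1)

# For each fixed slope b, keep b * x fixed as an offset and re-optimise only
# the intercept; the deviance of that constrained fit is the profile deviance.
prof_dev <- sapply(grid, function(b) {
  deviance(glm(y ~ 1, offset = b * x, family = binomial))
})

# Increase over the unconstrained fit: essentially zero at b_hat, positive elsewhere.
cbind(slope = grid, dev_increase = prof_dev - deviance(fit))
```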

However, if you have only one "1" you are effectively asking whether
one observation has a different mean than the other 269, and you have
to consider the sensitivity to the distribution of the predictor. As
far as I can see, you end up with the test of the null hypothesis
beta==0 being essentially equivalent to a two-sample t test comparing
the mean of the predictor in the "0" group with its value for the
single "1". So with only one observation in one of the groups, the
normal approximation of the test hinges quite strongly on the
predictor itself being normally distributed.
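A minimal R sketch of that equivalence, on simulated data (all names are hypothetical). It uses the Rao score test rather than the likelihood-ratio test that drop1() reports, because the score statistic has a closed form that makes the link explicit. The two are asymptotically equivalent under the null but can differ in small samples like this one. Evaluated at the intercept-only fit, the score statistic for beta == 0 works out to n times the squared correlation r between x and y. The pooled two-sample t statistic for x between the groups is a monotone function of the same r, t^2 = (n - 2) r^2 / (1 - r^2). The second part computes the exact null rejection rate of the nominal 5% score test with a single "1". It comes out close to 5% for normal scores but zero for evenly spaced (uniform-like) predictor values.

```r
# Hypothetical setting: n observations with exactly one y == 1 (not the poster's data).
n <- 270
set.seed(3)
x <- rnorm(n)
y <- integer(n)
y[sample(n, 1)] <- 1L

# Rao score statistic for H0: beta == 0 in logit(P(y = 1)) = alpha + beta * x,
# evaluated at the null (intercept-only) fit, where every fitted p equals mean(y).
score_stat <- function(x, y) {
  p <- mean(y)
  sum((x - mean(x)) * y)^2 / (p * (1 - p) * sum((x - mean(x))^2))
}

r <- cor(x, y)
# Pooled-variance t test; var.equal = TRUE allows a group of size one.
tt <- unname(t.test(x[y == 1], x[y == 0], var.equal = TRUE)$statistic)
c(score = score_stat(x, y), n_r2 = n * r^2,
  t_sq = tt^2, from_r = (n - 2) * r^2 / (1 - r^2))

# Under H0 the single 1 is equally likely to sit at any of the n positions, so the
# exact null rejection rate of the chi-square(1) score test is the fraction of
# positions whose score statistic exceeds the critical value.
null_level <- function(x, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df = 1)
  mean(sapply(seq_along(x), function(i) {
    yi <- integer(length(x))
    yi[i] <- 1L
    score_stat(x, yi) > crit
  }))
}

null_level(qnorm(ppoints(n)))        # normal scores: close to 0.05
null_level(seq(0, 1, length.out = n)) # evenly spaced values: 0
```

With evenly spaced values no point lies far enough from the mean to reach the critical value, so the test can never reject, whatever the data say.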

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907