[R] warning in binomial analysis

John Maindonald john.maindonald at anu.edu.au
Mon Sep 23 02:19:46 CEST 2002


Surely any problem is with the fitted probabilities, not
with the coefficients.  The likely "problem" is that,
for one or more values eta(i) of the linear predictor,
pi(i) = exp(eta(i)/(1+exp(eta(i))) is calculated as 1.0,
with 1 - pi(i) the calculated as zero.  The problem can
be very substantially delayed by calculating

1 - pi(i) = 1/(1 + exp(eta(i)))

However, this will usually make no difference to the
answer that is obtained; the present "fix" has the
correct limiting behaviour that avoids the 0/0
calculation that would otherwise ensue. I suspect,
though that a discrete change in the deviance when
the present form of "fix" kicks in does sometimes
foul up the convergence test.  I'd be interested
to see an example of this, if someone thinks they
have one.  I suspect that it may have happened to
me in the past, but at the time I did not twig.

My guess is that the deviances and coefficients
are entirely ok.  I'd expect that problems in the
general area that Thomas mentions to reveal
themselves as a failure to converge.

The standard errors may well not be ok, but it is not
necessary to have fitted values that are (to within
machine precision) 0 or 1 to get meaningless SEs!
The asymptotic theory can start to creak well before
that point, and that is where the warnings are needed.

There are some further details in my article on
"Floating Point Arithmetic" in the Wiley Encyclopedia
of Biostatistics (2000).

Such occurrences are easy to induce when the cloglog
link is used.  Try:

p <- c(.45,.837,.977,.998,1,1)
x <- c(0,2,3,4,6,12)
n <- rep(1000,6)
glm(p~x, family=binomial(link=cloglog), weights=n)

Incidentally, I am happy to work on fixing this, but
I believe thewre is an issue about a major rewrite
of the glm engine - a discussion which is clearly
for R-devel.

John Maindonald

>On Fri, 20 Sep 2002, Ronaldo Reis Jr. wrote:
>
>>  Hi,
>>  I have make an analise with presence and absence, y=(1 e 0).
>>  I have a area continuous data and a sp data with 25 levels. I have 
>>300 points.
>>  When I make
>>
>>  glm((presenca/peso)~area,weights=peso,family=binomial,maxit=1000)
>>
>>  where
>>
>>  presenca is 0 or 1.
>>
>>  peso is the unit = 1.
>>
>>  area is the continuous data.
>>
>>  The analysis is OK.
>>
>>  When I put the sp and interactions in analysis this warning appear.
>>
>>  glm((presenca/peso)~area*sp,weights=peso,family=binomial,maxit=1000)
>>
>>  Warning message:
>>
>>  fitted probabilities numerically 0 or 1 occurred
>>
>>  in: (if (is.empty.model(mt)) glm.fit.null else
>>
>>  glm.fit)(x = X, y = Y,
>>
>>  Some ideas???
>
>When you fit logistic regression models to fairly sparse data you can
>often have a situation where for some combination of variables the
>response variable is either all 0 or all 1.  In that case the maximum
>likelihood estimates for at least some of the coefficients will be
>infinite.  That's what R is telling you.
>
>You should be able to tell which coefficients are infinite -- the
>coefficients and their standard errors will be large.
>
>When this happens the standard errors and the p-values reported by
>summary.glm() for those variables are useless.
>
>	-thomas
>
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-- 
John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list