[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
Patrick Breheny
patrick.breheny at uky.edu
Fri Dec 2 15:08:18 CET 2011
On 12/01/2011 08:00 PM, Ben quant wrote:
> The data I am using is the last file called l_yx.RData at this link (the
> second file contains the plots from earlier):
> http://scientia.crescat.net/static/ben/
The logistic regression model you are fitting assumes a linear
relationship between x and the log odds of y; that does not seem to be
the case for your data. To illustrate:
x <- l_yx[,"x"]
y <- l_yx[,"y"]
ind1 <- x <= .002
ind2 <- (x > .002 & x <= .0065)
ind3 <- (x > .0065 & x <= .13)
ind4 <- x > .13
> summary(glm(y[ind1]~x[ind1],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.79174    0.02633 -106.03   <2e-16 ***
x[ind1]     354.98852   22.78190   15.58   <2e-16 ***
> summary(glm(y[ind2]~x[ind2],family=binomial))
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.15805    0.02966 -72.766   <2e-16 ***
x[ind2]     -59.92934    6.51650  -9.197   <2e-16 ***
> summary(glm(y[ind3]~x[ind3],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.367206   0.007781 -304.22   <2e-16 ***
x[ind3]     18.104314   0.346562   52.24   <2e-16 ***
> summary(glm(y[ind4]~x[ind4],family=binomial))
...
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31511    0.08549 -15.383   <2e-16 ***
x[ind4]      0.06261    0.08784   0.713    0.476
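The same interval-by-interval comparison can be run in one pass with cut() and split(). Since l_yx is not reproduced here, the following is only a sketch on simulated data: the sample sizes, the upper limit of 14 for x, and the intercepts and slopes (loosely rounded from the estimates above) are all assumptions made for illustration, not properties of the original data.

```r
# Simulated stand-in for l_yx; every number below is a hypothetical choice.
set.seed(1)
sizes <- c(80000, 60000, 59800, 200)       # observations per region
lower <- c(0,     0.002,  0.0065, 0.13)
upper <- c(0.002, 0.0065, 0.13,   14)      # 14 is an arbitrary upper end
x <- unlist(Map(function(n, lo, hi) runif(n, lo, hi), sizes, lower, upper))

# Same cut points as ind1..ind4; cut() uses intervals of the form (a, b]
region <- cut(x, breaks = c(-Inf, 0.002, 0.0065, 0.13, Inf),
              labels = paste0("region", 1:4))

# Log odds that are linear within a region but change slope between regions
a <- c(-2.8, -2.2, -2.4, -1.3)
b <- c(350, -60, 18, 0)
r <- as.integer(region)
y <- rbinom(length(x), 1, plogis(a[r] + b[r] * x))

# Fit a separate logistic regression within each region, keep the slope
fits <- lapply(split(data.frame(x, y), region),
               function(d) glm(y ~ x, family = binomial, data = d))
slopes <- sapply(fits, function(f) coef(f)[["x"]])
print(round(slopes, 2))
```

With slopes that change sign like this, no single coefficient for x can describe all four regions at once.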
To summarize, the relationship between x and the log odds of y appears
to vary dramatically in both magnitude and direction depending on which
interval of x's range we're looking at. Trying to summarize this
complicated pattern with a single line is leading to the fitted
probabilities near 0 and 1 you are observing (note that only 0.1% of the
data is in region 4 above, although region 4 accounts for 99.1% of the
range of x).
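A comparison like the one in parentheses (share of the data versus share of the range of x) can be computed along the following lines. This is a rough sketch on a simulated, heavily right-skewed x; the sample sizes and the upper limit of 14 are assumptions, so the percentages it prints will not match those of l_yx.

```r
# Hypothetical skewed predictor standing in for l_yx[,"x"]
set.seed(2)
x <- c(runif(9990, 0, 0.13), runif(10, 0.13, 14))

cuts <- c(0.002, 0.0065, 0.13)
region <- cut(x, breaks = c(-Inf, cuts, Inf), labels = paste0("region", 1:4))

# Share of observations falling in each region
data_share <- as.vector(table(region)) / length(x)

# Share of the observed range of x covered by each region
edges <- c(min(x), cuts, max(x))
range_share <- diff(edges) / (max(x) - min(x))

print(data.frame(region    = levels(region),
                 data_pct  = round(100 * data_share, 1),
                 range_pct = round(100 * range_share, 1)))
```

When a handful of extreme x values span nearly the whole range, they have very high leverage on a single fitted line, which is what pushes the fitted probabilities toward 0 or 1.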
--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky