[R] Pointwise Confidence Bounds on Logistic Regression

Fri Jun 20 08:52:17 CEST 2008

On Thu, 19 Jun 2008, Bryan Hanson wrote:

> Thanks so much to all who offered assistance.  I have to say it would have
> taken me a long time to figure this out, so I am most grateful.  Plus,
> studying your examples greatly improves my understanding.
>
> As a follow up, the fit process gives the following error:
>
> Warning messages:
> 1: In eval(expr, envir, enclos) :
>  non-integer #successes in a binomial glm!
>
> Is this something I should worry about?  It doesn't arise from glm or
> glm.fit

It does arise from glm(), which runs the initialization code for the 
family.  Specifically

> binomial()$initialize
expression({
     if (NCOL(y) == 1) {
         if (is.factor(y))
             y <- y != levels(y)[1]
         n <- rep.int(1, nobs)
         if (any(y < 0 | y > 1))
             stop("y values must be 0 <= y <= 1")
         mustart <- (weights * y + 0.5)/(weights + 1)
         m <- weights * y
         if (any(abs(m - round(m)) > 0.001))
             warning("non-integer #successes in a binomial glm!")
     }
     else if (NCOL(y) == 2) {
         if (any(abs(y - round(y)) > 0.001))
             warning("non-integer counts in a binomial glm!")
         n <- y[, 1] + y[, 2]
         y <- ifelse(n == 0, 0, y[, 1]/n)
         weights <- weights * n
         mustart <- (n * y + 0.5)/(n + 1)
     }
     else stop("for the binomial family, y must be a vector of 0 and 1's\n",
         "or a 2 column matrix where col 1 is no. successes and col 2 is 
no. failures")
})

You specified a binomial family with responses 0.1 and 0.9.  That makes no 
sense -- the binomial distribution is for integers.  Perhaps you mean 1/10 
and 9/10, in which case you needed a weight of 10, but I'm forced to 
guess.

>
> Thanks, Bryan
>
> For the record, a final complete working code is appended below.
>
> x <- c(1:40)
> y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
> y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
> data <- as.data.frame(cbind(x,y1,y2))
> plot(x, y1, pch = 1, ylim = c(0,1), col = "red", main = "Logistic Regression
> w/Confidence Bounds", ylab = "y values",  xlab = "x values")
> points(x, y2, pch = 3, col = "blue")
> abline(h = 0.5, col = "gray")
> fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action =
> na.omit)
> fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action =
> na.omit)
> lines(fit1$model$x, fit1$fitted.values, col = "red")
> lines(fit2$model$x, fit2$fitted.values, col = "blue")
>
> ## predictions on scale of link function
> pred1 <- predict(fit1, se.fit = TRUE)
> pred2 <- predict(fit2, se.fit = TRUE)
>
> ## constant for 95% confidence bands
> ## getting two t values is redundant here as fit1 and fit2
> ## have same residual df, but the real world may be different
> res.df <- c(fit1$df.residual, fit2$df.residual)
> ## 0.975 because we want 2.5% in upper and lower tail
> const <- qt(0.975, df = res.df)
>
> ## confidence bands on scale of link function
> upper1 <- pred1$fit + (const[1] * pred1$se.fit)
> lower1 <- pred1$fit - (const[1] * pred1$se.fit)
> upper2 <- pred2$fit + (const[2] * pred2$se.fit)
> lower2 <- pred2$fit - (const[2] * pred2$se.fit)
>
> ## bind together into a data frame
> bands <- data.frame(upper1, lower1, upper2, lower2)
>
> ## transform on to scale of response
> bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))
>
> ## plot confidence bands
> lines(fit1$model$x, bands$upper1, col = "pink")
> lines(fit1$model$x, bands$lower1, col = "pink")
> lines(fit2$model$x, bands$upper2, col = "lightblue")
> lines(fit2$model$x, bands$lower2, col = "lightblue")
>
>
>
> On 6/19/08 12:28 PM, "Gavin Simpson" <gavin.simpson at ucl.ac.uk> wrote:
>
>> On Thu, 2008-06-19 at 10:42 -0400, Bryan Hanson wrote:
>>> [I've ommitted some of the conversation so far...]
>>>
>>>> E.g. in a logistic model, with (say) eta = beta_0 + beta_1*x one may
>>>> find, on the
>>>> linear predictor scale, A and B (say) such that P(A <= eta <= B) = 0.95.
>>>>
>>>> Then P(expit(A) <= expit(eta) <= expit(B)) = 0.95, which is exactly
>>>> what is wanted.
>>>
>>> I think I follow the above conceptually, but I don't know how to implement
>>> it, though I fooled around (unsuccessfully) with some of the variations on
>>> predict().
>>>
>>> I'm trying to learn this in response to a biology colleague who did
>>> something similar in SigmaPlot. I can already tell that SigmaPlot did a lot
>>> of stuff for him in the background.  The responses are 0/1 of a particular
>>> observation by date.  The following code simulates what's going on (note
>>> that I didn't use 0/1 since this leads to a recognized condition/warning of
>>> fitting 1's and 0's. I've requested Brian's Pattern Recognition book so I
>>> know what the problem is and how to solve it).  My colleague is looking at
>>> two populations in which the "LD50" would differ.  I'd like to be able to
>>> put the "pointwise confidence bounds" on each curve so he can tell if the
>>> populations are really different.
>>>
>>> By the way, this code does give a (minor?) error from glm (which you will
>>> see).
>>>
>>> Can you make a suggestion about how to get those "confidence bounds" on
>>> there?
>>>
>>> Also, is a probit link more appropriate here?
>>>
>>> Thanks,  Bryan
>>>
>>> x <- c(1:40)
>>> y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
>>> y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
>>> data <- as.data.frame(cbind(x,y1,y2))
>>> plot(x, y1, pch = 1, ylim = c(0,1), col = "red")
>>> points(x, y2, pch = 3, col = "blue")
>>> abline(h = 0.5, col = "gray")
>>> fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action =
>>> na.omit)
>>> fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action =
>>> na.omit)
>>> lines(fit1$model$x, fit1$fitted.values, col = "red")
>>> lines(fit2$model$x, fit2$fitted.values, col = "blue")
>>
>> The point is to get predictions on the scale of the link function,
>> generate 95% confidence bands in the normal way and then transform the
>> confidence bands onto the scale of the response using the inverse of the
>> link function used to fit the model.
>>
>> [note, am doing this from memory, so best to check this is right -- I'm
>> sure someone will tell me very quickly if I have gone wrong anywhere!]
>>
>> ## predictions on scale of link function
>> pred1 <- predict(fit1, se.fit = TRUE)
>> pred2 <- predict(fit2, se.fit = TRUE)
>>
>> ## constant for 95% confidence bands
>> ## getting two t values is redundant here as fit1 and fit2
>> ## have same residual df, but the real world may be different
>> res.df <- c(fit1$df.residual, fit2$df.residual)
>> ## 0.975 because we want 2.5% in upper and lower tail
>> const <- qt(0.975, df = res.df)
>>
>> ## confidence bands on scale of link function
>> upper1 <- pred1$fit + (const[1] * pred1$se.fit)
>> lower1 <- pred1$fit - (const[1] * pred1$se.fit)
>> upper2 <- pred2$fit + (const[2] * pred2$se.fit)
>> lower2 <- pred2$fit - (const[2] * pred2$se.fit)
>>
>> ## bind together into a data frame
>> bands <- data.frame(upper1, lower1, upper2, lower2)
>>
>> ## transform on to scale of response
>> bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))
>>
>> ## plot confidence bands
>> lines(fit1$model$x, bands$upper1, col = "red", lty = "dotted")
>> lines(fit1$model$x, bands$lower1, col = "red", lty = "dotted")
>> lines(fit2$model$x, bands$upper2, col = "blue", lty = "dotted")
>> lines(fit2$model$x, bands$lower2, col = "blue", lty = "dotted")
>>
>> HTH
>>
>> G
>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595