[R] Prediction plot for logistic regression output

Thu Sep 23 18:14:14 CEST 2010

On 23-Sep-10 15:08:06, Jay Vaughan wrote:
> How do I construct a figure showing predicted value plots for the
> dependent variable as a function of each explanatory variable
> (separately) using the results of a logistic regression? It would also
> be helpful to know how to show uncertainty in the prediction (95% CI or
> SE).
> 
> Thanks-

The basic tool is the predict() function -- see '?predict.glm'
for details of its implementation for logistic regression etc.

Your desire to "construct a figure showing predicted value plots
for the dependent variable as a function of each explanatory
variable (separately)" will somehow have to cope with the fact
that the predicted value will be a function of *all* of the
independent variables, not just the one you are using for the
plot. How you decide to cope with this will depend on the purpose
for which you want the plot.

One approach to this could be to hold all the other independent
variables constant (say at their mean values), while including
a sequance of values for the variable if interest, in the 'newdata'
which you submit to the predict() function.

Another approach, which has a number of problematic aspects,
is, for each value of the independent variable in the plot,
to use the data to estimate a suitable "representative value"
of each other independent variable, and build your 'newdata'
accordingly. In particular, this introduces the complication
that there will be uncertainty in such "representative values"
which will have to be incorporated into the uncertainty of
prediction of the dependent variable.

Whichever approach you adopt, it is important with logistic
regression to note that, while you may predict the response
(i.e. plot the probability of Y=1), the use of the SE returned
by predict() when 'type="response"' will in general give
nonsensical results (probabilities < 0 or > 1). The good
method is to use 'type="link"' and transform using the
logistic response function exp(x)/(1 + exp(x)), which will
always give non-nonsensical results. The following example
illustrates this.

First, a logistic regression is fitted. Then Prob(Y=1) is
obtained from the '$fit' component of predict(), using
'type="response", and plotted. So far so good. Then 95%
"confidence bounds" around this are obtained as +/- 1.96*SE,
where SE is the $se component of predict(). It can be seen
that these give bad results near the ends of the range (red curves).

Then an SE is obtained jusing predict() with 'type="link",
and the +/-95% limits on the link fuinction are obtained.
These are then transformed using the logistic response function,
and plotted. These can be seen to be sensible (green curves).

  lg <- function(x){ exp(x)/(1 + exp(x)) }
  set.seed(54321)
  X <- 0.3*(-15:15)
  P <- lg(X)
  Y <- 1*(runif(31) <= P)
  GLM <- glm(Y ~ X, family=binomial)
  prGLM <- predict(GLM,type="response",se=TRUE)
  plot(X,prGLM$fit,type="l",ylim=c(0,1),col="blue")
  lines(X,prGLM$fit+1.96*prGLM$se,col="red")
  lines(X,prGLM$fit-1.96*prGLM$se,col="red")
  prGLM <- predict(GLM,type="link",se=TRUE)
  lines(X,lg(prGLM$fit+1.96*prGLM$se),col="green")
  lines(X,lg(prGLM$fit-1.96*prGLM$se),col="green")

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Sep-10                                       Time: 17:14:11
------------------------------ XFMail ------------------------------