[R] "prediction intervals for glm"

Thu May 1 14:29:32 CEST 2003

On Thu, 1 May 2003, Fredrik Lundgren wrote:

> I wouldn't know anything about the theoretical problems with glm and a
> binary outcome but there is a "prediction interval" in predict.glm of
> S-Plus (6.02 version something). I have failed to source it to R (and I do
> have difficulties with the higher forms of matrix manipulations). In the
> medical field where I'm active I think it has a high value to generate
> "prediction intervals" for risk and benefit calculations for individual
> patients. If it's theoretically fishy or unsound with a prediction
> interval maybe some bootstrap approach could do the trick?

It's more than fishy ... it uses the normal approximation on the link scale (as
I recall) which is very unlikely to be valid except for the gaussian
family.  Indeed for 0/1 data the interval will have coverage 0, exactly.

I don't see how a bootstrap would help either: the issue is to combine the
(reasonably well-known) uncertainty in the prediction of the mean with the
variability in the observation.  That would be easy to do by simulation,
but not by re-sampling.  (Or did you think all simulation-based inference
was `some bootstrap approach'?)  However, you are not going to be able to
summarize that predictive distribution as an *interval* for 0/1 data.
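
A rough sketch of the simulation described above, in R. The two steps are
(1) draw the mean from its approximate sampling distribution and (2) draw a
new observation given that mean. Treating the linear predictor as normal with
the standard error reported by predict() is an assumption of this sketch, not
the only way to do it, and the data, the new covariate value and all object
names are hypothetical.

```r
# Simulated 0/1 data; the model and all names are illustrative assumptions.
set.seed(2)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

# Hypothetical new case to predict for.
new <- data.frame(x = 0.3)
pr <- predict(fit, newdata = new, type = "link", se.fit = TRUE)

nsim <- 10000
# Step 1: uncertainty in the mean, approximated as normal on the link scale.
eta <- rnorm(nsim, mean = pr$fit, sd = pr$se.fit)
p <- plogis(eta)
# Step 2: variability of a new 0/1 observation given that mean.
ynew <- rbinom(nsim, size = 1, prob = p)

# The predictive distribution has only two support points, so it is
# reported as two probabilities instead of an interval.
pred <- table(factor(ynew, levels = 0:1)) / nsim
pred
```

For a binary response the result is just an estimate of P(y = 1), close to
the fitted probability, which is why no useful interval comes out of it.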

>
> Sincerely Fredrik Lundgren
> ----- Original Message -----
> From: "Peter Dalgaard BSA" <p.dalgaard at biostat.ku.dk>
> To: "Spencer Graves" <spencer.graves at pdf.com>
> Cc: "Fredrik Lundgren" <fredrik.lundgren at norrkoping.mail.telia.com>; <R-help at stat.math.ethz.ch>
> Sent: Tuesday, April 29, 2003 4:48 PM
> Subject: Re: [R] "prediction intervals for glm"
>
>
> > Spencer Graves <spencer.graves at pdf.com> writes:
> >
> > > "?predict.glm" produced something in my copy of R 1.6.2 under Windows
> > > 2000.
> >
> > .. but probably not what Fredrik wanted. Prediction intervals (i.e.
> > intervals with 95% probability of catching a new observation) are
> > somewhat tricky even to define for glms. For Normal responses you have
> > the formula yhat +- qt(.975,df)* sqrt(s^2+se(yhat)^2), for other
> > continuous responses that would become (approximately!) the error
> > distribution convolved with a Gaussian density, for discrete responses
> > - say 0/1 - I wouldn't know what to do.
> >
> > >
> > > Fredrik Lundgren wrote:
> >
> > > > Where can I find prediction intervals for glm in R?
> >
> >
> > --
> >    O__  ---- Peter Dalgaard             Blegdamsvej 3
> >   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595