[R] se.fit in predict.glm

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Wed Apr 28 22:32:11 CEST 2004

On 27-Apr-04 Peter Dalgaard wrote:
> (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
>> The documentation does not say definitely what p$se.fit is,
>> only calling it "Estimated standard errors". I *believe*
>> this means, at each value of X, the SE in the estimation
>> of P[y=1] taking account of the joint uncertainty in the
>> estimation of 'a' and 'b' in the relation
>>   probit(P) = a + b*X
>> Can someone confirm that this really is so?
> Pretty accurate, I'd say. 
> Basically, the fitted value is a function of the estimated parameters.
> Asymptotically, the latter are approximately normally distributed with
> a small dispersion so that the function is effectively linear and you
> can approximate the distribution of the fitted value with a normal
> distribution.

This is a bit of a minefield! I've been fitting a binomial regression
  g <- glm(y~x, family=binomial(link=probit))
followed by a predicition
  p <- predict.glm(g, newx, type="response", se.fit=TRUE)

Plotting the fit p$fit and the inplied "confidence bands"
  p$fit +- 2*p$se.fit
gave rather narrow (I thought) bands for the prediction, so I did
a simulation of 20 mvrnorm draws from the joint distribution of
a and b (using their SDs and correlation from summary.glm).

I then plotted the corresponding curves pnorm(a + b*x) and got
a set of 20 curves about half of which lay well outside the
above "condfidence band", some quite a long way off. (This, I must
say, is what I had intuitively been expecting to find in the first

I then turned to the V&R book, and happened on the section "Problems
with binomial GLMs" in 7.2; and discovered "profile".

So this led me to "confint" in MASS via ?profile and ?profile.glm.

The results of
gave me a and b for the lower (2.5%) and upper (97.5%) curve.
When plotted, these curves lay well outside the "confidence
bands" obtained from "predict.glm" and were much more realistically
related to my simulations (19 of my simulated curves nicely packed
the space between tthe two "confint" curves, and one lay just
outside -- couldn't have hoped for a result more close to expectation!).

Nevertheless, I don't think my data were all that few or nasty:

23 x-values, roughly equally spaced, with about 12 0/1 results
at each, and numbers of responses going up as
  0 0 0 0 0 0 0 0 0 0 2 0 0 1 0 2 4 5 8 8 9 4 10

So I tend to conclude that the "predict.glm(...,se.fit=TRUE)"
method should perhaps be avoided in favour of using "confint",
though I see no indication that "confint" respects the covariance
of the parameter estimates (intercept and slope) whereas the
"predict" method in theory does.

Maybe I'll have another go, after centering the x-values at their
mean ...

Anyway, comments would be appreciated!

Best wishe to all,

E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 28-Apr-04                                       Time: 21:32:11
------------------------------ XFMail ------------------------------

More information about the R-help mailing list