[R] Confidence Intervals for logistic regression

(Ted Harding) Ted.Harding at manchester.ac.uk
Sat Aug 7 12:42:11 CEST 2010


On 07-Aug-10 09:29:41, Michael Bedward wrote:
> Thanks for that clarification Peter - much appreciated.
> 
> Is there an R function that you'd recommend for calculating
> more valid CIs ?
> Michael

It depends on what you want to mean by "more valid"! If you have
a 95% CI for the linear predictor (say L(x) at X=x), then the
probability that the CI will include the true value of L(x)
is 95% (more or less accurately, depending on what approximation,
if any, was used to obtain the CI). Thus, if A(Y) and B(Y) are the
lower and upper limits of a 95% CI for L(x) as functions of the data Y,

  P(A(Y) < L(x) < B(Y)) = 0.95 (to within approximation)

and this may be asymmetrical in that we may have
  P(A(Y) > L(x)) = 0.05 - P(B(Y) < L(x)) != 0.025
(e.g. it may come out as a 1%:4% split of the 5%).

The response probability P(Y=1 | X=x) will be a monotonic function
F(L(x)) of L(x) -- e.g. for the logistic, exp(L)/(1+exp(L)), increasing
from 0 to 1. Then {F(A(Y)), F(B(Y))} is a 95% CI for P(x) = F(L(x)),
since P[A(Y) < L(x) < B(Y)] = P[F(A(Y)) < F(L(x)) < F(B(Y))]. Also,
  P[A(Y) > L(x)] = P[F(A(Y)) > F(L(x))] and
  P[B(Y) < L(x)] = P[F(B(Y)) < F(L(x))]
for exactly the same reason (monotonicity of F). Hence the split
of the 5% between left tail and right tail on the response scale
P(x) = F(L(x)) is exactly the same as the split on the linear
predictor scale L(x).

Therefore, on that front (comparison of probabilities of coverage),
the CI transformed to the response scale {F(A(Y)), F(B(Y))} is exactly
as valid as the CI {A(Y),B(Y)} on the original linear predictor scale.
In particular, if the latter is "equi-tailed" (2.5% on either side)
then the former will be too. If that is what you mean by "valid",
then you're finished.
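
To make the tail-split argument concrete, here is a small simulation
sketch (Python, standard library only). Everything in it is an
assumption chosen for illustration: a single proportion with n = 50
and true P = 0.3, a Wald interval Est +/- 1.96*SE on the logit scale,
and 0.5 added to each count so the logit stays finite. It counts how
often the interval misses on each side, once on the linear predictor
scale and once after transforming with F.

```python
import math
import random

def expit(l):
    """Inverse logit F(L) = exp(L) / (1 + exp(L)); strictly increasing in L."""
    return 1.0 / (1.0 + math.exp(-l))

# Hypothetical setup: n Bernoulli trials with true probability p_true.
n, p_true, n_sim = 50, 0.3, 20000
l_true = math.log(p_true / (1.0 - p_true))
rng = random.Random(1)

miss_link = [0, 0]   # counts of [A(Y) > L, B(Y) < L] on the linear predictor scale
miss_resp = [0, 0]   # counts of [F(A(Y)) > P, F(B(Y)) < P] on the response scale

for _ in range(n_sim):
    y = sum(rng.random() < p_true for _ in range(n))
    # 0.5 is added to both counts so the logit stays finite when y is 0 or n.
    est = math.log((y + 0.5) / (n - y + 0.5))
    se = math.sqrt(1.0 / (y + 0.5) + 1.0 / (n - y + 0.5))
    a, b = est - 1.96 * se, est + 1.96 * se
    miss_link[0] += a > l_true
    miss_link[1] += b < l_true
    miss_resp[0] += expit(a) > p_true
    miss_resp[1] += expit(b) < p_true

print("linear predictor scale:", [m / n_sim for m in miss_link])
print("response scale:        ", [m / n_sim for m in miss_resp])
```

The two rows of counts agree, whatever the split happens to be,
because each miss on one scale is the same event as the miss on the
other scale.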

However, possibly you may want "valid" to mean "extending to equal
distances on either side of the point estimate" -- e.g. as you
do with Estimate +/- 1.96*SE. It may be that, on the linear predictor
scale, you achieve this and also equi-tailed (2.5% either way).

But then, when you transform to the response scale, you will lose
that symmetry: F(Est - 1.96*SE) and F(Est + 1.96*SE) will not
be equidistant from F(Est) (though the equi-tailed 2.5%:2.5%
of the tail probability will be preserved).
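
A quick numerical sketch of that loss of symmetry (Python, standard
library only; Est = 1.0 and SE = 0.5 are made-up values, not taken
from any fitted model):

```python
import math

def expit(l):
    """Inverse logit F(L) = exp(L) / (1 + exp(L))."""
    return 1.0 / (1.0 + math.exp(-l))

est, se = 1.0, 0.5   # hypothetical estimate and standard error on the logit scale

lower = expit(est - 1.96 * se)
mid = expit(est)
upper = expit(est + 1.96 * se)

below = mid - lower   # distance from F(Est) down to the lower limit
above = upper - mid   # distance from F(Est) up to the upper limit

print("F(Est - 1.96*SE), F(Est), F(Est + 1.96*SE):", lower, mid, upper)
print("distance below, distance above:", below, above)
```

With F(Est) above 0.5, as here, the lower limit sits further from
F(Est) than the upper limit does, because F flattens as it
approaches 1.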

If you have a reason for wanting to, you can start with a 95% CI
for L(x) which is not equi-tailed, but does have the property of
symmetry on the response scale: F(A(x)) and F(B(x))
will be equidistant from F(Est). So you could set up the CI for
L(x) as {A(x) = Est - c0(x)*SE, B(x) = Est + c1(x)*SE} where c0(x)
and c1(x) (which in general depend on x) are chosen so that
you get symmetry on the response scale. But then you will lose
the equi-tailed property on the linear predictor scale, hence also
on the response scale.
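
One way such c0 and c1 might be found is sketched below (Python,
standard library only). This is only an illustration under stated
assumptions: (Est - L)/SE is treated as standard normal, SE as known,
and c0, c1 as fixed constants when evaluating coverage. The function
name symmetric_response_ci and the values Est = 1.0, SE = 0.5 are
hypothetical. It bisects on the half-width d so that F(Est) +/- d,
mapped back to the linear predictor scale, has total coverage 0.95.

```python
import math

def expit(l):
    return 1.0 / (1.0 + math.exp(-l))

def logit(p):
    return math.log(p / (1.0 - p))

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def symmetric_response_ci(est, se, level=0.95):
    """Find d, c0, c1 with F(Est - c0*SE) = F(Est) - d and F(Est + c1*SE) = F(Est) + d."""
    p_hat = expit(est)

    def multipliers(d):
        c0 = (est - logit(p_hat - d)) / se
        c1 = (logit(p_hat + d) - est) / se
        return c0, c1

    def coverage(d):
        c0, c1 = multipliers(d)
        # P(A < L < B) = Phi(c0) + Phi(c1) - 1 under the normal approximation.
        return norm_cdf(c0) + norm_cdf(c1) - 1.0

    lo, hi = 0.0, min(p_hat, 1.0 - p_hat) * (1.0 - 1e-9)
    if coverage(hi) < level:
        # A symmetric interval would have to leave (0, 1) to reach this level.
        raise ValueError("no symmetric interval inside (0, 1) reaches this level")
    for _ in range(200):          # bisection: coverage(d) increases with d
        mid = 0.5 * (lo + hi)
        if coverage(mid) < level:
            lo = mid
        else:
            hi = mid
    d = 0.5 * (lo + hi)
    c0, c1 = multipliers(d)
    return {
        "p_hat": p_hat, "d": d, "c0": c0, "c1": c1,
        "tail_A_above": 1.0 - norm_cdf(c0),   # approx. P(A(Y) > L(x))
        "tail_B_below": 1.0 - norm_cdf(c1),   # approx. P(B(Y) < L(x))
    }

result = symmetric_response_ci(1.0, 0.5)   # hypothetical Est and SE
print(result)
```

The two multipliers come out unequal, and so do the two tail
probabilities, although they still sum to 0.05. When F(Est) is close
to 0 or 1 the search can fail outright, which the sketch reports as
an error.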

So you can't have everything at once, and it depends on what you
want to mean by "valid"!

However, in the case of response being the probability of Y=1,
you might want to be careful about symmetry on the response
scale, since that could result in a CI which goes above 1 or
below 0, which would not be "valid" ...
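
For instance, with a made-up sample of 1 success in 10 trials, a
symmetric Estimate +/- 1.96*SE interval computed directly on the
response scale dips below 0, while the interval computed on the logit
scale and transformed back stays inside (0, 1). A sketch in Python
(standard library only), using the usual large-sample standard errors
for a single proportion and its logit:

```python
import math

def expit(l):
    return 1.0 / (1.0 + math.exp(-l))

n, y = 10, 1                      # hypothetical data: 1 success in 10 trials
p_hat = y / n

# Symmetric Wald interval on the response (probability) scale.
se_resp = math.sqrt(p_hat * (1.0 - p_hat) / n)
resp_ci = (p_hat - 1.96 * se_resp, p_hat + 1.96 * se_resp)

# Wald interval on the logit scale, then transformed back with F.
est = math.log(y / (n - y))
se_link = math.sqrt(1.0 / y + 1.0 / (n - y))
link_ci = (expit(est - 1.96 * se_link), expit(est + 1.96 * se_link))

print("symmetric on response scale:  ", resp_ci)
print("transformed from logit scale: ", link_ci)
```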

For large samples, asymptotically all these issues tend to dwindle
into near-irrelevance, since locally the response is close to linear
and whatever you achieve on one scale will be (close to) achieved
on the other scale.
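
A rough illustration of that (Python, standard library only; the
observed proportion 0.2 and the sample sizes are made up): take the
Wald interval on the logit scale, transform it back, and look at the
ratio of the distance above F(Est) to the distance below it as n
grows.

```python
import math

def expit(l):
    return 1.0 / (1.0 + math.exp(-l))

p_hat = 0.2                                # hypothetical observed proportion
est = math.log(p_hat / (1.0 - p_hat))
sample_sizes = [20, 200, 2000, 20000]      # hypothetical sample sizes

ratios = []
for n in sample_sizes:
    # Large-sample SE of the logit of a proportion based on n trials.
    se = math.sqrt(1.0 / (n * p_hat) + 1.0 / (n * (1.0 - p_hat)))
    lower = expit(est - 1.96 * se)
    upper = expit(est + 1.96 * se)
    ratios.append((upper - p_hat) / (p_hat - lower))

for n, r in zip(sample_sizes, ratios):
    print(n, r)
```

The ratio starts well above 1 for the smallest n and moves steadily
towards 1, i.e. the transformed interval becomes nearly symmetric
about F(Est).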

Hoping this helps,
Ted.

> On 7 August 2010 18:37, Peter Dalgaard <pdalgd at gmail.com> wrote:
>>
>> Probably, neither is optimal, although any transformed scale is
>> asymptotically equivalent. E.g., neither the probability scale
>> nor the logit scale stabilizes the variance of a simple proportion
>> (the arcsine transform does), so test-based CIs should really be
>> asymmetric in both cases rather than just +/- 1.96se.
>>
>> However, working on the linear predictor scale has the advantage
>> that CIs by definition will not cross the boundaries of the
>> parameter space. (For the "usual" link functions: logit, probit,
>> cloglog, that is; it's not true for the identity link, obviously.)
>> --
>> Peter Dalgaard
>> Center for Statistics, Copenhagen Business School
>> Phone: (+45)38153501
>> Email: pd.mes at cbs.dk _Priv: PDalgd at gmail.com

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-10                                       Time: 11:42:09
------------------------------ XFMail ------------------------------


