[R] confidence interval or error of x intercept of a linear

(Ted Harding) Ted.Harding at manchester.ac.uk
Tue Mar 24 11:04:34 CET 2009


On 24-Mar-09 03:31:32, Kevin J Emerson wrote:
> Hello all,
> 
> This is something that I am sure has a really suave solution in R,
> but I can't quite figure out the best (or even a basic) way to do it.
> 
> I have a simple linear regression that is fit with lm for which I
> would like to estimate the x intercept with some measure of error
> around it (confidence interval). In biology, there is the concept
> of a developmental zero - a temperature under which development
> will not happen. This is often estimated by extrapolation of a
> curve of developmental rate as a function of temperature. This
> developmental zero is generally reported without error. I intend
> to change this! 
> There has to be some way to assign error to this term, I just have
> yet to figure it out.
> 
> Now, it is simple enough to calculate the x-intercept itself ( -
> intercept / slope ), but it is a whole separate process to generate
> the confidence interval of it.  I can't figure out how to propagate
> the error of the slope and intercept into the ratio of the two.
> The other option I have tried to figure out is to use the predict
> function to look for where the confidence intervals cross the axis,
> but this hasn't been too fruitful either.
> 
> I would greatly appreciate any insight you may be able to share.
> 
> Cheers,
> Kevin
> 
> Here is a small representative sample of some of my data where
> Dev.Rate ~ Temperature.

The difficulty with the situation you describe is that you are up
against what is known as the "Calibration Problem" in Statistics.
Essentially, the difficulty is that in the theory of the linear
regression there is sufficient probability for the slope to be
very close to zero -- and hence for the "X-intercept" to be
very far out to the left or the right -- that the whole concept
of standard error ceases to exist. Even the expected position of
the intercept does not exist. And this is true regardless of the
data (the one exception would be where it is absolutely certain
that the data will lie exactly on a straight line).

This has been addressed a number of times, and controversially,
in the statistical literature. One approach which I like (having
thought of it myself :)) is the use of "likelihood-based confidence
intervals".

Given a linear regression

  Y = a + b*X + E

(where the error E has N(0, sig^2) distribution), suppose you
want to estimate the value X0 of X for which Y = Y0, a given value
(in your case Y0 = 0). For simplicity, measure X and Y from their
means sum(X)/N and sum(Y)/N.
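In R that setup looks like the following (a sketch: the variable
names Dev.Rate and Temperature follow the original question, but the
numbers are invented for illustration):

```r
# Invented illustrative data: development rate vs temperature
Temperature <- c(10, 12, 14, 16, 18, 20)
Dev.Rate    <- c(0.02, 0.05, 0.07, 0.10, 0.12, 0.15)

# The naive point estimate of the X-intercept: -intercept/slope
fit   <- lm(Dev.Rate ~ Temperature)
x.int <- -coef(fit)[1] / coef(fit)[2]

# Centre X and Y at their means, as above; Y0 is the target
# value Y = 0 expressed in the centred coordinates
X  <- Temperature - mean(Temperature)
Y  <- Dev.Rate    - mean(Dev.Rate)
Y0 <- 0 - mean(Dev.Rate)
```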

The approach is based on comparing the likelihoods

[A] for unrestricted ML estimation of a, b and sig
    --> a.hat, b.hat, sig.hat
    (where a.hat = 0 because of the origins of X and Y)
[B] for ML estimation assuming a particular value X1 for X0
    --> a.tilde, b.tilde, sig.tilde

where
[A] a.hat     = 0 (as above),
    b.hat     = (sum(X*Y))/(sum(X^2))
    X0.hat    = Y0/b.hat [ = -mean(Y)/b.hat in your case ]
    sig.hat^2 = (sum((Y - b.hat*X)^2))/(N+1)

[B] a.tilde     = (Y0*sum(X^2) - X1*sum(X*Y))/D
    b.tilde     = ((N+1)*sum(X*Y) + N*X1*Y0)/D
    sig.tilde^2
    = (sum((Y - a.tilde - b.tilde*X)^2)
        + (Y0 - a.tilde - b.tilde*X1)^2)/(N+1)

where D = (N+1)*sum(X^2) + N*X1^2
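Written out in R, the two fits and the ratio of their two variance
estimates look like this (a sketch with my own naming; X and Y are
assumed already centred at their means):

```r
# [A] unrestricted ML fit (X, Y centred, so a.hat = 0)
sig2.hat <- function(X, Y) {
  N     <- length(X)
  b.hat <- sum(X * Y) / sum(X^2)
  sum((Y - b.hat * X)^2) / (N + 1)
}

# [B] ML fit assuming the value of X at which Y = Y0 is X1
sig2.tilde <- function(X, Y, X1, Y0) {
  N       <- length(X)
  D       <- (N + 1) * sum(X^2) + N * X1^2
  a.tilde <- (Y0 * sum(X^2) - X1 * sum(X * Y)) / D
  b.tilde <- ((N + 1) * sum(X * Y) + N * X1 * Y0) / D
  (sum((Y - a.tilde - b.tilde * X)^2) +
     (Y0 - a.tilde - b.tilde * X1)^2) / (N + 1)
}

# Ratio of the two variance estimates, R(X1) = sig.hat^2/sig.tilde^2
R.X1 <- function(X, Y, X1, Y0) sig2.hat(X, Y) / sig2.tilde(X, Y, X1, Y0)
```

At X1 = Y0/b.hat the fit in [B] coincides with the unrestricted fit
in [A], so R(X1) attains its maximum value 1 there.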

Then let R(X1) = (sig.hat^2)/(sig.tilde^2)

A confidence interval for X0 is the set of values of X1 such
that R(X1) > R0 say, where Prob(R(X0)>R0) = P, the desired
confidence level, when X0 is the true value.

It can be established that

  (N-2)*(1 - R(X0))/R(X0)

has the F distribution with 1 and (N-1) degrees of freedom.
Since this is independent of X0, the resulting set of values
of X1 constitute a confidence interval.
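Inverting that relation gives the cut-off R0 for a chosen confidence
level directly; in R (N and P here are illustrative values):

```r
# Solve (N-2)*(1 - R0)/R0 = qf(P, 1, N-1) for the cut-off R0
N  <- 6       # illustrative sample size
P  <- 0.95    # desired confidence level
F0 <- qf(P, 1, N - 1)
R0 <- (N - 2) / (N - 2 + F0)
```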

This was described in

  Harding, E.F. (1986). Modelling: the classical approach.
  The Statistician vol. 35, pp. 115-134
  (see pages 123-126)

and has since been referred to by P.J. Brown (though I don't
have the references to hand just now).

When I have time for it (not today) I'll see if I can implement
this neatly in R. It's basically a question of solving

  (N-2)*(1 - R(X1))/R(X1) = qf(P,1,(N-1))

for X1 (two solutions, maybe one, if any exist). However, it is
quite possible that no solution exists (depending on P), so that
the confidence interval would then be (-inf,+inf); or, when only
one exists, it will be either (-inf,X1) or (X1,+inf).
These possibilities of infinite intervals are directly related
to the possibility that, at significance level alpha = (1-P),
the estimated slope may not differ significantly from 0.
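In the meantime, here is a rough, self-contained sketch of that
computation (my code, not the promised neat implementation: the data
are invented, and the two endpoints are found numerically with
uniroot rather than algebraically):

```r
# Invented illustrative data
Temperature <- c(10, 12, 14, 16, 18, 20)
Dev.Rate    <- c(0.02, 0.05, 0.07, 0.10, 0.12, 0.15)

# Centre the variables; Y0 is raw Y = 0 in centred coordinates
X  <- Temperature - mean(Temperature)
Y  <- Dev.Rate    - mean(Dev.Rate)
Y0 <- -mean(Dev.Rate)
N  <- length(X)

# [A] unrestricted fit
b.hat    <- sum(X * Y) / sum(X^2)
sig2.hat <- sum((Y - b.hat * X)^2) / (N + 1)

# Likelihood ratio R(X1) = sig.hat^2 / sig.tilde^2
R.fun <- function(X1) {
  D       <- (N + 1) * sum(X^2) + N * X1^2
  a.tilde <- (Y0 * sum(X^2) - X1 * sum(X * Y)) / D
  b.tilde <- ((N + 1) * sum(X * Y) + N * X1 * Y0) / D
  sig2.tilde <- (sum((Y - a.tilde - b.tilde * X)^2) +
                   (Y0 - a.tilde - b.tilde * X1)^2) / (N + 1)
  sig2.hat / sig2.tilde
}

# Cut-off R0 at confidence level P, from (N-2)*(1-R0)/R0 = qf(P,1,N-1)
P  <- 0.95
R0 <- (N - 2) / (N - 2 + qf(P, 1, N - 1))

# Solve R(X1) = R0 on either side of the point estimate X0.hat,
# bracketing far enough out that R(X1) has fallen below R0
g      <- function(X1) R.fun(X1) - R0
X0.hat <- Y0 / b.hat
lo <- uniroot(g, lower = X0.hat - 100 * diff(range(X)),
              upper = X0.hat, tol = 1e-9)$root
hi <- uniroot(g, lower = X0.hat,
              upper = X0.hat + 100 * diff(range(X)), tol = 1e-9)$root

# Back to the original Temperature scale
ci <- c(lo, hi) + mean(Temperature)
```

Note that with data this clearly non-horizontal both roots exist, so
the interval is finite; with a slope not clearly different from 0 one
or both uniroot brackets would fail to contain a sign change, which is
exactly the infinite-interval situation described above.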

Maybe more later!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Mar-09                                       Time: 10:04:30
------------------------------ XFMail ------------------------------



