[R] Testing for normality of residuals in a regression model

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Oct 15 20:08:05 CEST 2004


On Fri, 15 Oct 2004, Liaw, Andy wrote:

> Let's see if I can get my stat 101 straight:
> 
> We learned that linear regression has a set of assumptions:
> 
> 1. Linearity of the relationship between X and y.
> 2. Independence of errors.
> 3. Homoscedasticity (equal error variance).
> 4. Normality of errors.
> 
> Now, we should ask:  Why are they needed?  Can we get away with less?  What
> if some of them are not met?
> 
> It should be clear why we need #1.
> 
> Without #2, I believe the least squares estimator is still unbiased, but
> the usual estimates of the SEs of the coefficients are wrong, so the
> t-tests are wrong.
> 
> Without #3, the coefficient estimates are, again, still unbiased, but not
> as efficient as they could be.  Interval estimates for predictions will
> surely be wrong.

The loss of efficiency is often quite small.
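
For instance, a minimal simulation sketch (the mild heteroscedasticity,
with the error SD growing linearly in x, is an assumed example): the OLS
slope is typically only modestly more variable than the weighted fit that
is told the true variances.

set.seed(1)
n <- 100; x <- runif(n)
sd_true <- 1 + x          # assumed form: error SD grows mildly with x
B <- 2000
slopes <- replicate(B, {
    y <- 1 + 2*x + rnorm(n, sd = sd_true)
    c(ols = coef(lm(y ~ x))[2],
      wls = coef(lm(y ~ x, weights = 1/sd_true^2))[2])
})
apply(slopes, 1, var)     # OLS variance only modestly above the WLS one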

> Without #4, well, it depends.  If the residual DF is sufficiently large,
> the t-tests are still approximately valid because of the CLT.  You do
> need normality if the residual DF is small.

However, stats 901 or some such tells you that if the error distribution
has even slightly longer tails than the normal you can get much better
estimates than OLS, and this happens even before a test of normality
rejects on a sample size of thousands.
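
A quick sketch of that point (the t errors on 5 df and the sample sizes
are assumed purely for illustration; rlm() from package MASS fits a Huber
M-estimator):

library(MASS)             # for rlm()
set.seed(1)
n <- 1000; x <- runif(n); B <- 500
est <- replicate(B, {
    y <- 1 + 2*x + rt(n, df = 5)   # only moderately long-tailed errors
    c(ols = coef(lm(y ~ x))[2],
      m   = coef(rlm(y ~ x))[2])
})
apply(est, 1, var)        # the M-estimate should be clearly less variable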

Robustness of efficiency is much more important than robustness of 
distribution, and I believe robustness concepts should be in stats 101.
(I was teaching them yesterday in the third lecture of a basic course, 
albeit a graduate course.)
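
A toy location version of the same idea (the 5% contamination model is an
assumed example): the median is robust in distribution but pays about a
third of its efficiency at the normal, whereas a Huber estimate keeps most
of the efficiency in both settings.

library(MASS)             # for huber()
set.seed(1)
B <- 2000; n <- 100
sim <- function(rgen) replicate(B, {
    z <- rgen(n)
    c(mean = mean(z), median = median(z), huber = huber(z)$mu)
})
contam <- function(n)     # assumed: 5% of points from N(0, 3^2)
    ifelse(runif(n) < 0.05, rnorm(n, sd = 3), rnorm(n))
apply(sim(rnorm), 1, var)   # median markedly more variable at the normal
apply(sim(contam), 1, var)  # Huber close to the best in both settings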

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



