[R] Testing for normality of residuals in a regression model

Spencer Graves spencer.graves at pdf.com
Fri Oct 15 20:18:26 CEST 2004


      OK, I'll expose myself: 

      I tend to do normal probability plots of residuals (usually deletion 
/ studentized residuals as described by Venables and Ripley in Modern 
Applied Statistics with S, 4th ed., MASS4).  If the plots look strange, I 
do something.  I'll check apparent outliers for coding and data entry 
errors, and I often delete those points from the analysis even if I 
can't find a reason why.  Robust regression will usually handle this 
type of problem, and I am gradually migrating to increasing use of 
robust regression, especially the procedures recommended by MASS4.
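
      A minimal sketch of the deletion (externally studentized) residuals
and the points of a normal probability plot, written in Python with only
the standard library rather than R.  The function names and the ten data
points are made up for illustration, and the model is assumed to be a
simple straight-line fit; this is not the MASS4 code.

```python
import math
from statistics import NormalDist


def deletion_residuals(x, y):
    """Externally studentized (deletion) residuals for the OLS fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this point
        # Residual variance estimated with this point left out of the fit.
        s2_deleted = (rss - e * e / (1.0 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_deleted * (1.0 - h)))
    return out


def normal_plot_points(values):
    """(theoretical normal quantile, ordered value) pairs for a probability plot."""
    n = len(values)
    nd = NormalDist()
    return [(nd.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(sorted(values), start=1)]


# Hypothetical data: roughly y = 2x, with the 8th response entered too high.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 20.0, 18.1, 19.9]

t_resid = deletion_residuals(x, y)
points = normal_plot_points(t_resid)
for q, t in points:
    print(f"{q:7.3f} {t:8.3f}")
```

      Plotting the second column against the first gives the normal
probability plot; the suspicious point shows up as one value far off the
line through the rest.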

      However, I recently encountered a situation that would be masked 
by standard use of robust regression without examining residual plots:  
A normal probability plot looked like three parallel straight lines with 
gaps, suggesting a mixture of 3 normal distributions with different 
means and a common standard deviation.  Further investigation revealed 
that an important 3-level explanatory variable had been miscoded.  
When this was corrected, that variable entered the model and the gaps in 
the normal plot disappeared. 
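
      The effect can be reproduced with simulated data.  The sketch below
(Python, standard library only) is not the data set described above: the
coefficients, the noise level, the seed and the function names are all
assumptions.  It fits a straight line with and without a 3-level factor
and compares the largest gap between adjacent ordered residuals, which is
what shows up as a break in the normal plot.

```python
import random


def ols_residuals(x, y):
    """Residuals from the least-squares straight-line fit of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return [(yi - ybar) - b * (xi - xbar) for xi, yi in zip(x, y)]


def center_within(values, groups):
    """Subtract each group's mean from the values in that group."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def largest_gap(values):
    """Largest difference between adjacent ordered values."""
    s = sorted(values)
    return max(hi - lo for lo, hi in zip(s, s[1:]))


random.seed(1)
x, level, y = [], [], []
for lv in (0, 1, 2):          # three levels of the (assumed) factor
    for k in range(30):       # same x grid within every level
        xi = k / 3.0
        x.append(xi)
        level.append(lv)
        y.append(1.0 + 0.5 * xi + 5.0 * lv + random.gauss(0.0, 0.3))

# Model without the factor: residuals fall into three separated bands.
gap_without = largest_gap(ols_residuals(x, y))

# Model with the factor (separate intercepts, common slope), fitted by
# centering x and y within each level before the straight-line fit.
gap_with = largest_gap(ols_residuals(center_within(x, level),
                                     center_within(y, level)))

print("largest gap, factor omitted: ", round(gap_without, 2))
print("largest gap, factor included:", round(gap_with, 2))
```

      Robust regression applied to the first model would downweight or
tolerate the two outer bands; only a look at the residuals shows that
they are a structure in the data and not a handful of outliers.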

      I tend NOT to use tests of normality for the reasons Andy 
mentioned.  Instead, I do various kinds of diagnostic plots and modify 
my model or investigate the data in response to what I see. 
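
      The power argument quoted below can be checked by simulation.  This
is only a sketch under stated assumptions (Python, standard library
only): the Jarque-Bera statistic is my choice of test and is not one
named in this thread, the alternative is a Student t with 5 degrees of
freedom, and the chi-square critical value is asymptotic, so it is only
approximate at n = 20.

```python
import math
import random


def jarque_bera(sample):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess_kurtosis^2 / 4)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)


def student_t(df, rng):
    """One draw from Student's t: standard normal over sqrt(chi-square / df)."""
    chi_square = rng.gammavariate(df / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square / df)


def rejection_rate(n, reps, rng, df=5):
    """Share of simulated t samples of size n rejected at the nominal 5% level."""
    critical = -2.0 * math.log(0.05)  # 95% point of chi-square with 2 df
    hits = 0
    for _ in range(reps):
        sample = [student_t(df, rng) for _ in range(n)]
        if jarque_bera(sample) > critical:
            hits += 1
    return hits / reps


rng = random.Random(2004)
rate_small = rejection_rate(20, 200, rng)
rate_large = rejection_rate(2000, 200, rng)
print("rejection rate, n = 20:  ", rate_small)
print("rejection rate, n = 2000:", rate_large)
```

      The same moderately heavy-tailed errors are missed most of the time
in the small samples, where normality matters for the t-tests, and are
flagged almost every time in the large samples, where it matters least.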

      Comments?
      hope this helps.  spencer graves

Liaw, Andy wrote:

>Let's see if I can get my stat 101 straight:
>
>We learned that linear regression has a set of assumptions:
>
>1. Linearity of the relationship between X and y.
>2. Independence of errors.
>3. Homoscedasticity (equal error variance).
>4. Normality of errors.
>
>Now, we should ask:  Why are they needed?  Can we get away with less?  What
>if some of them are not met?
>
>It should be clear why we need #1.
>
>Without #2, I believe the least squares estimator is still unbiased, but the
>usual estimates of the SEs for the coefficients are wrong, so the t-tests are
>wrong.
>
>Without #3, the coefficients are, again, still unbiased, but not as
>efficient as can be.  Interval estimates for the prediction will surely be
>wrong.
>
>Without #4, well, it depends.  If the residual DF is sufficiently large, the
>t-tests are still approximately valid because of the CLT.  You do need
>normality if you have small residual DF.
>
>The problem with normality tests, I believe, is that they usually have
>fairly low power at small sample sizes, so that doesn't quite help.  There's
>no free lunch:  A normality test with good power will usually have good
>power against a fairly narrow class of alternatives, and almost no power
>against others (directional test).  How do you decide what to use?
>
>Has anyone seen a data set where the normality test on the residuals is
>crucial in coming up with an appropriate analysis?
>
>Cheers,
>Andy
>
>  
>
>>From: Federico Gherardini
>>
>>Berton Gunter wrote:
>>
>>>>>Exactly! My point is that normality tests are useless for this
>>>>>purpose for reasons that are beyond what I can take up here.
>>
>>Thanks for your suggestions, I understand that! Could you possibly give
>>me some (not too complicated!) links so that I can investigate this
>>matter further?
>>
>>Cheers,
>>
>>Federico
>>
>>>>>Hints: Balanced designs are robust to non-normality; independence
>>>>>(especially "clustering" of subjects due to systematic effects), not
>>>>>normality, is usually the biggest real statistical problem; hypothesis
>>>>>tests will always reject when samples are large -- so what!; "trust"
>>>>>refers to prediction validity, which has to do with study design and
>>>>>the validity/representativeness of the current data to the future.
>>>>>
>>>>>I know that all the stats 101 texts say to test for normality, but
>>>>>they're full of baloney!
>>>>>
>>>>>Of course, this is "free" advice -- so caveat emptor!
>>>>>
>>>>>Cheers,
>>>>>Bert
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! 
>>http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567



