[R] pros and cons of "robust regression"? (i.e. rlm vs lm)

Liaw, Andy andy_liaw at merck.com
Thu Apr 6 18:56:28 CEST 2006


To add to Bert's comments:

-  "Normalizing" data (e.g., subtracting mean and dividing by SD) can help
numerical stability of the computation, but that's mostly unnecessary with
modern hardware.  As Bert said, that has nothing to do with robustness.

-  Instead of _replacing_ lm() with rlm() or another robust procedure, I'd
run both.  Some scientists regard it as bad science to let a robust
procedure omit data points automatically (e.g., by assigning them
essentially zero weight) and to simply trust the result, and I think they
have a point.  Using a robust procedure does not free one from examining
the data carefully and looking at diagnostics.  Careful treatment of
outliers is especially important, I think, for data coming from a
confirmatory experiment.  If the conclusion you draw depends on
downweighting or omitting certain data points, you ought to have a very
good reason for doing so.  It cannot be over-emphasized how important it
is not to take outlier deletion lightly.  I've seen many cases where what
originally seemed like an outlier turned out to be legitimate data, and
omitting it just led to an overly optimistic assessment of variability.
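
A minimal sketch of what I mean by running both, using the stackloss
data shipped with R purely for illustration (rlm() lives in package
MASS):

  library(MASS)

  fit.ls  <- lm(stack.loss ~ ., data = stackloss)   # ordinary least squares
  fit.rob <- rlm(stack.loss ~ ., data = stackloss)  # M-estimation (Huber psi)

  ## compare the coefficient estimates side by side
  cbind(OLS = coef(fit.ls), robust = coef(fit.rob))

  ## inspect the final IWLS weights: small values flag points to
  ## examine by hand, not to delete automatically
  round(fit.rob$w, 2)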

Andy

From: Berton Gunter
> 
> There is a **Huge** literature on robust regression, 
> including many books that you can search for on, e.g., 
> Amazon. I think it is fair to say that we have known since 
> at least the 1970s that practically any robust downweighting 
> procedure (see, e.g., "M-estimation") is preferable (more 
> efficient, better continuity properties, better estimates) 
> to trimming "outliers" defined by arbitrary thresholds. An 
> excellent but now probably dated introductory discussion can 
> be found in "Understanding Robust and Exploratory Data 
> Analysis", edited by Hoaglin, Mosteller, and Tukey.
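> 
> To make the contrast concrete, here is a hedged sketch on made-up 
> data (one planted gross outlier; an arbitrary 2-SD trimming rule 
> vs. Huber M-estimation via rlm() from package MASS):
> 
>   library(MASS)
>   set.seed(1)
>   x <- 1:30
>   y <- 2 * x + rnorm(30)
>   y[28] <- y[28] + 40                # plant one gross outlier
> 
>   ## "manual" approach: drop points past an arbitrary residual cutoff
>   r <- resid(lm(y ~ x))
>   fit.trim <- lm(y ~ x, subset = abs(r) < 2 * sd(r))
> 
>   ## M-estimation downweights smoothly, not an all-or-nothing cut
>   fit.m <- rlm(y ~ x)
>   cbind(trimmed = coef(fit.trim), huber = coef(fit.m))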
> 
> The rub in all this is that nice small-sample inference 
> results go out the window, though bootstrapping can help 
> with this. Nevertheless, for a variety of reasons, my 
> recommendation is simply to **never** use lm and **always** 
> use rlm (with maybe a few minor caveats). Many would 
> disagree with this, however.
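> 
> For example, one can bootstrap the robust fit to get standard 
> errors (a sketch with case resampling via the boot package; the 
> stackloss data and R = 999 are just illustrative):
> 
>   library(MASS)
>   library(boot)
> 
>   ## statistic: refit rlm() on each resample of the cases
>   rlm.coef <- function(d, i)
>       coef(rlm(stack.loss ~ ., data = d[i, ], maxit = 50))
>   b <- boot(stackloss, rlm.coef, R = 999)
>   b    # bootstrap standard errors for the robust coefficients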
> 
> I don't think "normalizing" data as it's conventionally used 
> has anything to do with robust regression, btw.
> 
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>  
> "The business of the statistician is to catalyze the 
> scientific learning process."  - George E. P. Box
>  
>  
> 
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of r user
> > Sent: Thursday, April 06, 2006 8:51 AM
> > To: rhelp
> > Subject: [R] pros and cons of "robust regression"? (i.e. rlm vs lm)
> > 
> > Can anyone comment or point me to a discussion of the
> > pros and cons of robust regression, vs. a more
> > "manual" approach to trimming outliers and/or
> > "normalizing" the data used in regression analysis?
> > 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>
