[R] Outlier removal techniques

S Ellison S.Ellison at LGCGroup.com
Fri Feb 10 18:15:14 CET 2012


 

> -----Original Message-----
> I wonder why it is still standard practice in some circles to 
> search for "outliers" as opposed to using robust/resistant methods.

At the risk of extending an old debate and driving us off list topic, here are four possible reasons:
i) Identifying outliers is important when you want to find possible mistakes in measurement or data entry - so irrespective of whether you use robust methods, you probably want to ask questions like 'why has that result been entered as almost exactly 1000 times the value I expected?' (typically a unit error, btw). And although graphical outlier checking is the obvious way to do that, eyeballs see oddity in chance; an outlier test can help you distinguish oddity from chance and save some (arguably) unnecessary follow-up.

ii) Because supervised outlier rejection at around the 99% level performs - for simple problems - about as well as a Huber estimate with c set to 1.5, and is a lot easier to explain to, er, people who don't understand iterative numerical methods.

iii) Because it's written into some international Standards for statistical processing of data (i.e., it's standard practice because it's Standard practice).

iv) Because you can't do robust analysis in Excel*

Not that all these are necessarily _good_ reasons ... ;-)

However, I do NOT understand why schools in the UK teach physics students that outliers should automatically and always be thrown out; that's a much larger leap.
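
For what it's worth, reason ii) can be tried out on a toy example. What follows is only a minimal sketch in Python (standard library only), not a procedure taken from any Standard. The data are made up to mimic a unit error, and the function names are mine. The outlier test is a Grubbs-type statistic whose 99% critical value is simulated rather than looked up in tables. The Huber estimate keeps the scale fixed at the normalised MAD, which is one common variant and an assumption here.

```python
import math
import random
import statistics


def grubbs_statistic(x):
    """Return (G, index) for the point furthest from the mean, in SD units."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    idx = max(range(n), key=lambda i: abs(x[i] - m))
    return abs(x[idx] - m) / s, idx


def simulated_critical_value(n, level=0.99, n_sim=20000, seed=1):
    """Approximate the `level` quantile of G for normal samples of size n.

    Simulation is used here only to avoid hard-coding tabulated values.
    """
    rng = random.Random(seed)
    sims = sorted(
        grubbs_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])[0]
        for _ in range(n_sim)
    )
    return sims[int(level * n_sim)]


def huber_location(x, c=1.5, tol=1e-6, max_iter=100):
    """Huber M-estimate of location with scale fixed at the normalised MAD.

    Each pass pulls values beyond mu +/- c*s in to that limit (winsorising)
    and recomputes the mean, until the estimate stops moving.
    """
    mu = statistics.median(x)
    s = 1.4826 * statistics.median([abs(v - mu) for v in x])
    if s == 0:
        return mu
    for _ in range(max_iter):
        lo, hi = mu - c * s, mu + c * s
        new_mu = sum(min(max(v, lo), hi) for v in x) / len(x)
        if abs(new_mu - mu) < tol * s:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical data: nine plausible results plus one entered ~1000 times too large.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 10000.0]

g, idx = grubbs_statistic(data)
crit = simulated_critical_value(len(data))
flagged = g > crit  # candidate for follow-up, not automatic deletion

# For comparison only: the mean if the reviewer decides to reject the point.
kept = [v for i, v in enumerate(data) if not (flagged and i == idx)]
mean_after_rejection = sum(kept) / len(kept)
huber_estimate = huber_location(data)

print("raw mean:", sum(data) / len(data))
print("flagged:", flagged, "value:", data[idx])
print("mean after rejection:", mean_after_rejection)
print("Huber (c = 1.5):", huber_estimate)
```

On these made-up data the two location estimates should agree closely, while the raw mean is dragged far away by the single bad value. The flagged point is only reported; whether to reject it stays a decision for whoever reviews the data.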

*You can, actually, with R or several add-ins. But that is off topic.