[R] R code for to check outliers
Martin Maechler
maechler at stat.math.ethz.ch
Wed Jul 18 18:51:53 CEST 2012
>>>>> Bert Gunter <gunter.berton at gene.com>
>>>>> on Wed, 18 Jul 2012 07:14:31 -0700 writes:
> checkforoutliers <- function(series) NULL
> Cheers, Bert
> *Explanation: There is no such thing as a statistical
> outlier -- or, rather, "outlier" is a fraudulent
> statistical concept, defined arbitrarily and without
> scientific legitimacy. The typical unstated purpose of
> such identification is to remove contaminating or
> irrelevant data, but such a judgment can only be made by a
> subject matter expert with knowledge of the context and,
> usually, the specific cause for the unusual data. Do not
> be misled by the large body of statistical literature on
> this topic into believing that statistical analysis alone
> can provide objective criteria to do this. That is a path
> to scientific purgatory.
> For the record: 1. I am a statistician
> 2. Lots of highly knowledgeable, smart statisticians will condemn what I
> have just said as stupid ranting.
I entirely agree with you that outlier-removing
procedures are mostly misused, and dangerous because of that
misuse {and hence should typically NOT be taught, or not the way
I have seen them taught (on occasions, not here at ETH!)...}
and I even more fervently agree with Michael Weylandt's
recommendation to use robust statistics rather than outlier
detection --- at least in those cases where "robust statistics"
is *not* ill-re-defined as {outlier detection}+{classical stats}.
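
A minimal sketch of why robust estimators are recommended here, assuming a made-up sample with one gross error (the data and object names are illustrative only and not from this thread). It compares the classical mean/sd with the robust median/MAD from base R:

```r
# Hypothetical data: nine well-behaved values plus one gross error.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 100)

# Classical estimates are pulled strongly towards the single gross error.
classical <- c(location = mean(x), scale = sd(x))

# Robust estimates (median and MAD) are barely affected by it.
robust <- c(location = median(x), scale = mad(x))

print(rbind(classical, robust))
```

Nothing is removed from the data; the robust estimators simply give the aberrant value bounded influence.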
However, I don't think 'outlier' to be a fraudulent concept.
Rather, I think outliers can be pretty well defined along the
lines of "outlier WITH RESPECT TO A MODEL"
(and 'model' means 'statistical model', i.e., with some
randomness built in) :
Outlier wrt model M :=
an observation which is highly
improbable to be observed under model M
(and "highly improbable" of course is somewhat vague, but that's
not a problem per se.)
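
One way to make this definition concrete, as a sketch only: take M to be "i.i.d. normal observations" (an assumption chosen purely for illustration), estimate its parameters robustly, and compute how improbable each observation is under M. The helper name `tail_prob_normal`, the data, and the cutoff are all hypothetical; the cutoff is exactly the "somewhat vague" part.

```r
# Hypothetical helper: two-sided tail probability of each observation
# under the assumed model M: x_i ~ N(mu, sigma^2), i.i.d.
# mu and sigma default to robust estimates (median and MAD).
tail_prob_normal <- function(x, mu = median(x), sigma = mad(x)) {
  z <- (x - mu) / sigma
  2 * pnorm(-abs(z))
}

# Hypothetical data: nine plausible values and one gross error.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 100)

p <- tail_prob_normal(x)

# "Highly improbable" needs a cutoff; 1e-4 is an arbitrary choice.
cutoff <- 1e-4
flagged <- which(p < cutoff)
print(flagged)
```

A different model M (skewed, heavy-tailed, a regression, a time series) would give different probabilities and possibly different flagged observations, which is the point of "with respect to a model".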
A version of the above is
Outlier := an observation that has unduly large influence on
the estimators/inference performed
where 'estimator / inference' implies a model, of course.
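
The influence-based version can be sketched with base R's regression diagnostics. Here the model is assumed to be a simple linear regression and influence is measured by Cook's distance; both are choices made for this illustration, not something prescribed in the thread, and the data are made up.

```r
# Hypothetical data: y is roughly 2 * x, except for the last point.
d <- data.frame(x = 1:10)
d$y <- 2 * d$x + c(0.1, -0.2, 0.0, 0.2, -0.1, 0.1, -0.1, 0.0, 0.2, -0.2)
d$y[10] <- 40  # one aberrant response at a high-leverage x

# Fit the assumed model with and without the suspicious observation.
fit_all  <- lm(y ~ x, data = d)
fit_drop <- lm(y ~ x, data = d[-10, ])

# Cook's distance: how much the fit changes when each point is left out.
d_cook <- cooks.distance(fit_all)
most_influential <- unname(which.max(d_cook))

print(round(d_cook, 3))
print(rbind(all = coef(fit_all), without_10 = coef(fit_drop)))
```

Comparing the two coefficient rows shows the influence directly: a single observation changes the estimated slope substantially. Whether that observation is wrong or the linear model is wrong remains a subject-matter question.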
So I think outlier is a useful concept for those who think about
*models* (rather than just data sets), and I agree that without
an implicit or explicit model, "outlier" is not well defined.
> The perils of a mailing list.
> -- Bert
:-)
Martin
> On Wed, Jul 18, 2012 at 6:27 AM, Sajeeka Nanayakkara .. wrote:
>>
>> What is the R code to check whether data series have
>> outliers or not?
>>
>> Thanks,
>>
>> Sajeeka Nanayakkara
> --
> Bert Gunter Genentech Nonclinical Biostatistics