[R] detection of outliers

Christian Hennig fm3a004 at math.uni-hamburg.de
Thu Sep 23 17:14:24 CEST 2004


On Thu, 23 Sep 2004 Phguardiol at aol.com wrote:

> Hi,
> this is both a statistical and an R question...
> What would be the best way or test to detect an outlier among a series of 10 to 30 values? For instance, given the dataset 10,11,12,15,20,22,25,30,500, I'd like a way to identify the last value as an outlier (in one direction only). One way would be to calculate abs(mean - median) and, if it is elevated (to what extent?), delete the extreme value and then repeat. But is it valid to do so with so few data? Is (trimmed mean - mean) more efficient? If so, what would be the maximal tolerable value to use as a threshold? (I guess it will be experiment dependent...) Tests for skewness will probably require a larger dataset?
> any suggestions are very welcome !
> thanks for your help
> Philippe Guardiola, MD

You may want to read 
Davies and Gather, The identification of multiple outliers, JASA 88 (1993),
782-801.

The simplest recommendation is to nominate all points with distance larger
than c*mad(data) from the median as outliers. Choices of c depending on n
are given in the above paper.
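
A minimal R sketch of this rule, applied to the data from the question. The
helper name flag_outliers_mad and the value c_const = 3 are assumptions made
for illustration only; the n-dependent choices of c should be taken from the
paper. Note that R's mad() multiplies the raw median absolute deviation by
1.4826 by default, so that it estimates the standard deviation under
normality.

```r
# Data from the original question
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

# Hypothetical helper: flag points whose distance from the median
# exceeds c_const * mad(x).
# c_const = 3 is an illustrative placeholder, NOT a value taken from
# Davies and Gather (1993).
flag_outliers_mad <- function(x, c_const = 3) {
  abs(x - median(x)) > c_const * mad(x)
}

is_out <- flag_outliers_mad(x)
x[is_out]   # 500
```

Here median(x) is 20 and mad(x) is 1.4826 * 8, so with the placeholder
c_const = 3 only the value 500 lies beyond the cutoff.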

This is somewhat better founded theoretically than the boxplot method
recommended by Gabor G., but it is based on the assumption that the
distribution of the non-outliers is close to normal and, in particular, not
strongly skewed (the boxplot method seems to be a bit more robust against
skewness).
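
For comparison, a short sketch of a boxplot-based check on the same data. It
assumes that "the boxplot method" means the usual whisker rule (points more
than 1.5 times the hinge spread beyond the hinges), as implemented by
boxplot.stats() in R; the message by Gabor G. is not quoted here, so this is
an assumption rather than a reproduction of his suggestion.

```r
# Data from the original question
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

# boxplot.stats() returns the points lying beyond the whiskers in $out.
# coef = 1.5 is the function's default, written out here for clarity.
bp <- boxplot.stats(x, coef = 1.5)

bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # 500
```

The hinges here are 12 and 25, so the upper fence is 25 + 1.5 * 13 = 44.5 and
only 500 falls outside it.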

Christian
 
***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de



