[R] outliers/interval data extraction

Christian Hennig hennig at stat.math.ethz.ch
Fri Feb 21 10:10:03 CET 2003


Hi,

Sorry, I was wrong, and that's true. With R's default (rescaled) mad, the
Hampel suggestion is
outliers <- (x<medx-3.5*madx) | (x>medx+3.5*madx)
or, keeping the multiplier 5.2, use the unscaled mad:
madx <- mad(x, constant=1).
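The corrected rule can be sketched as a small R function. This is a minimal sketch, assuming `x` is a numeric vector without missing values; the helper name `hampel_outliers` and the sample data are hypothetical, for illustration only. It also computes the 5.2 * unscaled-mad version so the two forms can be compared.

```r
# Hypothetical helper: flag values more than k scaled MADs from the median.
# mad() uses its default constant = 1.4826, so k = 3.5 corresponds to
# roughly 5.2 raw (unscaled) MADs, as in Hampel's suggestion.
# Assumes x is numeric with no missing values.
hampel_outliers <- function(x, k = 3.5) {
  medx <- median(x)
  madx <- mad(x)
  (x < medx - k * madx) | (x > medx + k * madx)
}

# Made-up example data with two obvious outliers (9.7 and -4.0)
x <- c(2.1, 2.4, 2.2, 2.5, 2.3, 2.0, 9.7, 2.6, -4.0)

flag_scaled <- hampel_outliers(x)           # 3.5 * mad(x)

# Same rule written with the unscaled mad and the multiplier 5.2
medx <- median(x)
madx_raw <- mad(x, constant = 1)
flag_raw <- (x < medx - 5.2 * madx_raw) | (x > medx + 5.2 * madx_raw)

selected <- x[!flag_scaled]
print(which(flag_scaled))   # 7 9
print(selected)
```

Both versions flag the same points here. Because 3.5 * 1.4826 is about 5.19 rather than exactly 5.2, they can only disagree for a point lying in that narrow band.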

Christian

On Fri, 21 Feb 2003, Jason Turner wrote:

> On Thu, Feb 20, 2003 at 06:54:21PM +0100, Christian Hennig wrote:
> ... 
> > However, a simple, straightforward method for outlier identification is
> > median +/- 5.2*mad as suggested by Hampel, Technometrics 27 (1985) 95-107.
> ...
> > # x: the numeric data vector
> > medx <- median(x)
> > madx <- mad(x)
> > outliers <- (x<medx-5.2*madx) | (x>medx+5.2*madx)
> > selected <- x[!outliers]
> 
> I haven't read the paper cited above, but I suspect the authors were
> talking about the true mad.  By default, R re-scales the mad to adjust
> for the normal case (i.e. multiplies it by about 1.48).  If that's correct
> (and I'm quite happy to be wrong), this changes 5.2 to 3.5 in the
> example above.
> 
> Cheers
> 
> Jason
> 
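Jason's point about the rescaling, and the resulting change from 5.2 to 3.5, can be checked directly in R. A minimal sketch on made-up data; it relies only on `mad()`'s documented `constant` argument (default 1.4826) and on `qnorm()`.

```r
# mad() returns constant * median(abs(x - center)), with constant = 1.4826
# by default, so that it estimates the standard deviation for normal data.
x <- c(1, 2, 3, 4, 100)       # made-up data; median 3, raw MAD 1

raw_mad    <- mad(x, constant = 1)   # 1
scaled_mad <- mad(x)                 # 1.4826
print(scaled_mad / raw_mad)

# The default constant is (approximately) 1 / qnorm(0.75)
print(1 / qnorm(0.75))               # about 1.482602

# Hampel's 5.2 raw-MAD multiplier expressed on the scaled-MAD scale
k_scaled <- 5.2 / 1.4826
print(round(k_scaled, 3))            # 3.507, rounded to 3.5 in the thread
```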

-- 
***********************************************************************
Christian Hennig
Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently)
and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at stat.math.ethz.ch, http://stat.ethz.ch/~hennig/
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag.de