[R] outlier detection methods in r?

Mon Apr 24 18:07:22 CEST 2000

>Subject: [R] outlier detection methods in r?
> hi -
>  if I sample from a normal distribution with something like
> n100<-rnorm(100,0,1)
> and add an outlier with
> n100[10]<-4
> then
> qqnorm(n100)
> visually shows the point 4 as an outlier
> and calculating the probability of a value of 4 or bigger in 100 samples
> of norm(0,1)
> gives
> > 1-exp(log(pnorm(4,0,1))*100)
> [1] 0.003162164
>
> If I have more than 1 sample above outlier threshold the math is a bit
> more complicated but doable.
> My questions are
> 1) are there better ways to assess probability of outliers (i.e. value(s)
> above threshold from a given distribution)?
> 2) are they implemented in r?

1)
The term "a given distribution" makes things a lot more difficult, as far as
outlier detection is concerned.
If we are talking about normal distributions, or multivariate normal
distributions, the method based on Mahalanobis distances is the one I
prefer.
If the sample comes from a normal distribution with p variables, the squared
Mahalanobis distance of each point follows a chi-square distribution with p
degrees of freedom (approximately so when the mean and covariance are
estimated from the data), so you can always assess whether a certain point is
above the threshold determined by your significance level.
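
As a rough illustration of the chi-square threshold for the univariate case
in the question, here is a minimal sketch. The seed, the per-point level
alpha = 0.001 and the variable names are my own assumptions, and the known
N(0, 1) parameters are used instead of sample estimates.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(100, 0, 1)
x[10] <- 4                       # planted outlier, as in the question

# Squared Mahalanobis distance of each point from the known N(0, 1).
# In one dimension this is simply ((x - mean) / sd)^2.
# x must be a one-column matrix so that each value is treated as one point.
d2 <- mahalanobis(matrix(x, ncol = 1), center = 0, cov = matrix(1))

alpha  <- 0.001                  # assumed per-point significance level
cutoff <- qchisq(1 - alpha, df = 1)

flagged <- which(d2 > cutoff)    # indices of points beyond the threshold
print(flagged)
```

Note that alpha here is a per-point level; with 100 points some correction
for multiple testing may be wanted, much like the 100-sample calculation in
the question.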

2)
You can find mahalanobis() in the base package. Note that it returns the
squared distance.
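
A possible way to use mahalanobis() in the multivariate case is sketched
below. The simulated data, the planted point and the 0.999 level are
illustrative assumptions, and the center and covariance are plain sample
estimates, which can themselves be distorted by outliers (robust estimates
could be substituted).

```r
set.seed(2)                          # arbitrary seed, only for reproducibility
X <- matrix(rnorm(200), ncol = 2)    # 100 points, 2 independent N(0, 1) variables
X[10, ] <- c(4, -4)                  # planted outlier (illustrative)

# Squared Mahalanobis distance of each row from the sample mean,
# using the sample covariance matrix.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Chi-square cutoff with df equal to the number of variables.
cutoff  <- qchisq(0.999, df = ncol(X))   # 0.999 is an assumed level
flagged <- which(d2 > cutoff)
print(flagged)
```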
