[R] dixon test

giov biowoman at libero.it
Thu Aug 14 13:59:29 CEST 2008


Steve,
thank you so much for your very useful help and comments!

For my research I used a surrogate approach (number of surrogates = 100) and
compared index values (very similar to a correlation index computed against a
second, known signal) from my original data and from the surrogate data,
respectively. The idea is to show that the index value from the original data
is very "different" (in a statistical sense) from those of the surrogates. To
evaluate this difference I thought the best thing to do was to check whether
the index from the original data can be considered an outlier in comparison
with the surrogate values. Is that a correct approach?

Thank you!
giov




S Ellison wrote:
> 
> giov,
> 
> It sounds like you have approximately symmetric distributions. If that
> is so, and particularly if the standard deviation is less than about 20%
> of the mean, I'll stick my neck out and say I would assume underlying
> normality for outlier testing purposes unless there's a reason to do
> otherwise (e.g. if you're testing variances, normality would _not_ be a
> good assumption!).
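Here is a rough R sketch of checking the two conditions mentioned above on a hypothetical sample `y`. The first check compares the coefficient of variation with the 20% rule of thumb. The second computes a moment-based skewness as a crude symmetry check. The skewness measure is an addition for illustration only. It has no fixed cutoff and says nothing about the shape of the tails.

```r
# Hypothetical sample standing in for the data being screened.
set.seed(2)
y <- rnorm(50, mean = 10, sd = 1)

# Coefficient of variation: the sd as a fraction of the mean.
cv <- sd(y) / mean(y)

# Moment-based sample skewness; values near 0 suggest rough symmetry.
skew <- mean((y - mean(y))^3) / mean((y - mean(y))^2)^1.5

cv < 0.2  # the "sd less than about 20% of the mean" rule of thumb
skew
```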
> 
> The reason I'd do that is that it should not make a big
> difference to the outcome with near-symmetric distributions. If it does,
> your 'outliers' are borderline anyway.
> Similarly, although folk can get quite exercised over which test to use
> and what significance level to choose, the test you use isn't very
> important either, as long as the intention is just to screen data to
> make sure the most influential/extreme points are not mistakes. 
> 
> Given that, you can use any of the tests in library(outliers). You can
> also use boxplot.stats, and look at the $out list, like
> 
> y<-c(rnorm(15,10), 25.1) #25.1 should be an outlier
> (bxs<-boxplot.stats(y))
> 
> #and locate the outliers in y:
> which(y %in% bxs$out)
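As a base-R illustration of one such test, here is a sketch of the two-sided Grubbs statistic, G = max|y_i - mean(y)| / sd(y). It is compared with the usual t-based critical value. The helper `grubbs_check` is hypothetical. library(outliers) provides its own grubbs.test(), and this sketch is not that implementation. Like the tests in that package, it assumes approximately normal data.

```r
# Hypothetical helper: two-sided Grubbs check for a single extreme value.
grubbs_check <- function(x, alpha = 0.05) {
  n <- length(x)
  dev <- abs(x - mean(x))
  G <- max(dev) / sd(x)
  # Critical value based on the t distribution with n - 2 df
  t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
  G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  list(G = G, G_crit = G_crit, suspect = which.max(dev), outlier = G > G_crit)
}

set.seed(3)
y <- c(rnorm(15, 10), 25.1)  # same construction as the boxplot example
res <- grubbs_check(y)
res$outlier
res$suspect  # position of the most extreme value in y
```

On data built this way, both the Grubbs check and boxplot.stats() flag the 25.1 value. This fits the point that, for a clear outlier, the choice of screening test rarely changes the outcome.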
> 
> Another useful approach is to use robust estimates of mean and
> dispersion, like hubers() in the MASS package, and then calculate simple
> scores, with a z-like cutoff to identify outliers:
> 
> require(MASS)
> hy<-hubers(y)
> hscore<-(y-hy$mu)/hy$s
> which(abs(hscore)>3)
> 
> Using the 'mad' or 'iqr' options in outliers::scores will give broadly
> similar results.
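Here is a base-R sketch of MAD- and IQR-based screening, written without the outliers package. It follows the same idea as those options of outliers::scores but is not a copy of that function. The cutoffs of 3 and 1.5 are conventional choices assumed here.

```r
set.seed(4)
y <- c(rnorm(15, 10), 25.1)

# Robust z-like score: distance from the median in units of mad()
# (mad() is scaled by default to estimate the sd for normal data).
mad_score <- (y - median(y)) / mad(y)
which(abs(mad_score) > 3)

# IQR rule: flag points more than 1.5 IQRs beyond the quartiles.
q <- quantile(y, c(0.25, 0.75))
iqr_flag <- y < q[1] - 1.5 * IQR(y) | y > q[2] + 1.5 * IQR(y)
which(iqr_flag)
```

The IQR rule here uses quantile() with R's default type. Its fences can therefore differ slightly from those of boxplot.stats(), which uses Tukey's hinges from fivenum().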
> 
> Most of the modelling tools in R also offer useful diagnostics for
> 'odd' points. I find examining the residuals from rlm in MASS
> particularly useful if you're seeking outliers in a regression context.
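Below is a minimal sketch of that diagnostic, using simulated regression data with one injected gross error. The data and the cutoff of 3 are assumptions for illustration. The residuals from rlm() are divided by the fit's robust scale estimate `s`. The final weights in `w` then show which points the fit down-weighted.

```r
library(MASS)  # rlm() is in the MASS package shipped with standard R installs

# Hypothetical regression data with one gross error injected at point 10.
set.seed(5)
x <- 1:20
yr <- 2 + 0.5 * x + rnorm(20, sd = 0.3)
yr[10] <- yr[10] + 5

fit <- rlm(yr ~ x)  # Huber M-estimation by default

# Scale residuals by the robust scale estimate and flag large ones.
std_res <- residuals(fit) / fit$s
which(abs(std_res) > 3)

# Points that the robust fit heavily down-weighted are also worth a look.
round(fit$w, 2)
```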
> 
> A more important question is what you will do if you find any outliers.
> Outliers are just unusual compared to some expectation, not
> automatically 'wrong'. Screening data for anomalies is good practice;
> checking them to make sure they aren't mistakes is to be encouraged;
> correcting mistakes if you find them is a no-brainer. But throwing
> outliers away is something to think about very carefully, and on a
> case-by-case basis. Sometimes, outliers are a genuine feature of the
> process under study, or even the 'interesting' parts of the data. It's
> generally unsafe to throw them out without good reason.
> 
> Steve E
> 
> 
> PS: Contrary to my earlier confident assertion of the non-existence of
> nonparametric outlier tests, Barnett and Lewis DOES include some general
> suggestions on 'nonparametric' outlier testing. But it also includes the
> note that this "... smacks of throwing out the bathwater before the baby
> has even been immersed". I guess they don't think much of the idea
> either.
> 
>>>> giov <biowoman at libero.it> 13/08/2008 15:21:25 >>>
> 
> Thank you so much. I don't have much experience with outliers =), and I
> thought that there were nonparametric, distribution-free outlier tests =(.
> What is the most general distribution I can use? I made histograms of my
> data set, and sometimes a normal distribution seems to occur, sometimes a
> uniform distribution. So I cannot work out which distribution I can use
> for my whole data set....
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: http://www.nabble.com/dixon-test-tp18940260p18980152.html
Sent from the R help mailing list archive at Nabble.com.


