[R] Outlier statistics question

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Tue Nov 30 22:05:10 CET 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jahan
> Sent: Tuesday, November 30, 2010 12:16 PM
> To: r-help at r-project.org
> Subject: [R] Outlier statistics question
> 
> I have a statistical question.
> The data sets I am working with are right-skewed, so I have been
> plotting the log transformations of my data.  I am using a Grubbs test
> to detect outliers in the data, but I get different outcomes depending
> on whether I run the test on the original data or the log(data).  Here
> is one of the problematic sets:
> 
> fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)
> stripchart(fgf2p50, vertical = TRUE)
> # The next step requires the 'outliers' package
> library(outliers)
> grubbs.test(fgf2p50)
> # the output says p < 0.05, so 5.047 is flagged as an outlier
> # Next, I run the test on the log10-transformed data
> logdata <- log10(fgf2p50)  # 0.194 0.335 0.403 0.436 0.388 0.703 (rounded)
> grubbs.test(logdata)
> # the output says p > 0.05, so we cannot reject the null hypothesis of no outlier
> 
> The question is, which outlier test do I accept?
> 
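Why the two runs disagree can be seen from the Grubbs statistic itself, G = max|x_i - mean(x)| / sd(x). It is computed on whatever scale the data are supplied in, and the log transform pulls in the long right tail, so 5.047 sits fewer standard deviations from the mean on the log scale than on the original scale. The base-R sketch below is illustrative only: `grubbs_G` and `grubbs_crit` are hypothetical helpers, and the critical value is the standard one-sided textbook formula at alpha = 0.05, not necessarily the exact computation used inside the outliers package.

```r
# Grubbs statistic: largest absolute deviation from the mean, in sample-SD units.
# grubbs_G is a hypothetical helper written for illustration.
grubbs_G <- function(x) max(abs(x - mean(x))) / sd(x)

# One-sided Grubbs critical value for sample size n (textbook formula).
# grubbs_crit is a hypothetical helper written for illustration.
grubbs_crit <- function(n, alpha = 0.05) {
  t <- qt(alpha / n, df = n - 2, lower.tail = FALSE)
  ((n - 1) / sqrt(n)) * sqrt(t^2 / (n - 2 + t^2))
}

fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)

G_raw <- grubbs_G(fgf2p50)         # about 1.92 on the original scale
G_log <- grubbs_G(log10(fgf2p50))  # about 1.76 on the log10 scale
G_crit <- grubbs_crit(length(fgf2p50))  # about 1.82 for n = 6

c(raw = G_raw, log10 = G_log, critical = G_crit)
```

The same point exceeds the critical value on one scale and not on the other, so the disagreement comes from the choice of scale rather than from the test misbehaving.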

You may not want to "accept" either test.  What do YOU mean by an outlier, and why is it important for you to detect "outliers" and handle them differently from the rest of the data?  Maybe you should model the data so that the model correctly predicts or explains the so-called outlier.  So, what are you trying to do?
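One way to act on the suggestion to model the data, assuming (this is not established in the thread) that the measurements are lognormal: fit a lognormal to the data and ask how often the largest of six independent draws would be at least as large as 5.047. A minimal base-R sketch:

```r
# Sketch only: assumes the data are lognormal, which the thread does not establish.
fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)
n <- length(fgf2p50)

# Estimate the lognormal parameters from the natural-log data.
meanlog <- mean(log(fgf2p50))
sdlog   <- sd(log(fgf2p50))

# Probability that the largest of n independent draws from the fitted
# lognormal is at least the observed maximum:
# P(max >= m) = 1 - F(m)^n, where F is the fitted lognormal CDF.
m <- max(fgf2p50)
p_max <- 1 - plnorm(m, meanlog = meanlog, sdlog = sdlog)^n
p_max  # roughly 0.21
```

Because the parameters are estimated from all six points, including 5.047 itself, this is a rough plausibility check rather than a calibrated test; a probability of this size would suggest the value is not remarkable under a right-skewed model.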

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204