[R] Remove data 3 standard deviations from the mean using R?

Berend Hasselman bhh at xs4all.nl
Tue Apr 9 15:25:33 CEST 2013


On 09-04-2013, at 13:12, Lorna <lornam at essex.ac.uk> wrote:

> Hi Everyone,
> 
> I have a very long list of data-points (+2300) and I know from my histogram
> that there are outliers which are affecting my mean.
> 
> I was wondering if anyone on here knows a way I can quickly get R to
> calculate and remove data which is 3 standard deviations from the mean? I am
> hoping this will tidy my data and give me a repeatable method of tidying for
> future data collection.
> 
> Please, if you do post code, make it as user friendly as possible! I am not a
> very good programmer; I can load my data into R and do basic stats on it,
> however I haven't tried much else....


# Some test data, plus its mean and standard deviation
testdata <- rnorm(100, 0, 5)
mean.td <- mean(testdata)
sd.td <- sd(testdata)

# Threshold in standard deviations
# (1.5 here for the test data; set to 3.0 for your specific situation)
alpha <- 1.5

# Logical index: TRUE for items strictly within mean +/- alpha*sd
pidx <- (testdata < mean.td + alpha*sd.td) & (testdata > mean.td - alpha*sd.td)

# Keep only those items
testdata[pidx]
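
Since you asked for a repeatable method, the same selection can be wrapped in a small function. This is only a sketch: remove_outliers and mydata are made-up names (not part of base R), the default of 3 standard deviations comes from your question, and the handling of NA values is an assumption about what you would want.

```r
# Hypothetical helper (not a base R function): keep only the values that lie
# strictly within k standard deviations of the mean. NAs are dropped (assumption).
remove_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  keep <- !is.na(x) & abs(x - m) < k * s
  x[keep]
}

# Made-up example data: 2300 ordinary values plus two planted outliers
set.seed(42)
mydata <- c(rnorm(2300, mean = 10, sd = 2), 250, -180)

cleaned <- remove_outliers(mydata, k = 3)

length(mydata) - length(cleaned)  # how many points were dropped
mean(mydata)                      # mean before
mean(cleaned)                     # mean after
```

To use it on your own data, replace mydata with your own vector (for example a column of your data frame) and keep k = 3.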

Berend


