[R] eliminating outliers

arun smartpink111 at yahoo.com
Mon Aug 5 18:07:13 CEST 2013



Hi,
Please use dput() (see ?dput) to post a reproducible example.
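As a rough sketch of what that could look like: the data frame below is only a stand-in built from the two rows shown in the question, and the object name gdpdata is taken from the original post.

```r
# Stand-in for the poster's data; only these two rows appear in the post.
gdpdata <- data.frame(date  = c("01/01/1947", "04/01/1947"),
                      value = c(1932.6, 1930.4),
                      stringsAsFactors = FALSE)

# dput() prints R code that recreates the object exactly,
# so others can paste it straight into their session.
dput(head(gdpdata, 20))
```

Pasting the printed structure(...) text into a message lets readers rebuild the same object without needing the original file.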
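# Example data: 32 consecutive daily dates starting 01/01/1947, with random values
set.seed(45)
dat1 <- data.frame(
  date  = format(seq(as.Date("01-01-1947", format = "%m-%d-%Y"),
                     as.Date("02-01-1947", format = "%m-%d-%Y"), by = 1),
                 "%m/%d/%Y"),
  value = sample(1800:2400, 32, replace = FALSE)
)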

    

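# Flag: TRUE where the first difference is below -100 or above 200.
# diff() returns one fewer element than its input, so row 1 is padded with TRUE.
d <- diff(dat1$value)
indx <- c(TRUE, d < -100 | d > 200)

# Row 1 plus the rows whose difference falls outside [-100, 200]
dat1[indx, ]

# Row numbers whose difference falls within [-100, 200]
which(!indx)
# [1]  3  4  5  6  7  8 14 15 16 19 20 21 22 23 24 25 29 31

# Dates and values for those rows
dat1[which(!indx), ]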
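If the aim is to keep the rows whose difference is within the bounds and to list the ones that were dropped, an explicit flag may be easier to read. The following is only a sketch: the toy values are invented for illustration, and treating the first row (which has no preceding value) as a non-outlier is an assumption.

```r
# Toy data: values are invented for illustration, not real GDP figures.
gdp <- data.frame(
  date  = c("01/01/1947", "04/01/1947", "07/01/1947", "10/01/1947", "01/01/1948"),
  value = c(1900, 1950, 2200, 2050, 2100),
  stringsAsFactors = FALSE
)

# First differences; row 1 has no predecessor, so pad with NA.
d <- c(NA, diff(gdp$value))

# Flag rows whose difference is above 200 or below -100.
# The NA for row 1 is treated as "not an outlier" (an assumption).
is_outlier <- !is.na(d) & (d > 200 | d < -100)

kept    <- gdp[!is_outlier, ]  # transformed data set
removed <- gdp[is_outlier, ]   # dates and values that were eliminated

print(removed)
```

Note that the differences here are computed once on the original series; dropping rows changes the differences between the remaining neighbours, so the filter may need to be reapplied if that matters.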
A.K.


I am reading a data file consisting of date and GDP as follows 

gdpdata <- read.table("C:/R-working/R-data/gdp-data-1947-87.txt", header=TRUE) 

Which results in 
          date  value 
1   01/01/1947 1932.6 
2   04/01/1947 1930.4 
Etc. 

I then first difference the data using the command 

diff(gdpdata$value) 

I would like to create a transformed dataset with outliers 
eliminated, i.e. any value of ‘diff’ that is greater than 200 or less 
than -100. 
Further, I would like R to tell me which dates and GDP values were eliminated. 

Any suggestions with how to do that would be appreciated. 

Thanks, 
Darrell Bosch


