[R] eliminating outliers

arun smartpink111 at yahoo.com
Mon Aug 5 21:37:03 CEST 2013


Dear Darrell,

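Regarding the error: it is a spacing issue. Without a space, R reads the "<-" in "<-100" as the assignment operator, so the code tries to assign 100 to diff(dat1$value). That would need a replacement function "diff<-", which does not exist.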

dat1[c(TRUE,(diff(dat1$value)<-100)|(diff(dat1$value)>200)),]
Error in diff(dat1$value) <- 100 : could not find function "diff<-"
res <- dat1[c(TRUE, (diff(dat1$value) < -100) | (diff(dat1$value) > 200)), ]  # note the space in "< -100"
#or
dat1[c(TRUE, (diff(dat1$value) < (-100)) | (diff(dat1$value) > 200)), ]


?`<-`
# help page for the assignment operator: "x<-100" assigns 100 to x,
# while "x < -100" tests whether x is less than -100
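To see how R reads the two spellings, one can parse them without evaluating anything and inspect the top-level function of each call. This is a small sketch; the vector v and its values are made up for illustration.

```r
# Parse both spellings as text, without evaluating them.
no_space   <- parse(text = "diff(v)<-100")[[1]]
with_space <- parse(text = "diff(v)< -100")[[1]]

# The first element of a call is the function being called.
as.character(no_space[[1]])    # "<-" : an assignment, which would need "diff<-"
as.character(with_space[[1]])  # "<"  : a comparison with -100

# With the space (or parentheses) the comparison works as intended.
v <- c(2180, 1990, 2250)       # hypothetical values; diff(v) is -190, 260
diff(v) < -100                 # TRUE FALSE
diff(v) < (-100)               # same result
```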


res  

       date value
1  01/01/1947  2180
2  01/02/1947  1990
9  01/09/1947  1909
10 01/10/1947  1803
11 01/11/1947  2018
12 01/12/1947  2319
13 01/13/1947  1981
17 01/17/1947  2364
18 01/18/1947  1882
26 01/26/1947  1839
27 01/27/1947  2344
28 01/28/1947  2229
30 01/30/1947  1923
32 02/01/1947  2379




----- Original Message -----
From: "Bosch, Darrell" <bosch at vt.edu>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Monday, August 5, 2013 3:26 PM
Subject: RE: [R] eliminating outliers

Thanks, Arun.  When I entered the second line of code, 
dat1[c(TRUE,(diff(dat1$value)<-100)|(diff(dat1$value)>200)),]

I got the following
Error in diff(dat1$value) <- 100 : could not find function "diff<-"

Do I need to name 'dat1' as a time series dataset in order to invoke the difference operator?

I appreciate your help.

Darrell

Darrell Bosch
Professor
Department of Agricultural and Applied Economics
Virginia Tech
Blacksburg, VA 24061
tel. 540/231-5265
fax 540/231-7417
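On the time-series question: diff() works directly on an ordinary numeric vector such as a data frame column, so converting to a "ts" object is not needed for this task. The sketch below, using made-up values, compares the two.

```r
# Hypothetical values stored in an ordinary data frame column.
df <- data.frame(value = c(10, 12, 9, 15))

d_plain <- diff(df$value)        # differences of a plain numeric vector
d_ts    <- diff(ts(df$value))    # same differences, returned as a "ts" object

d_plain                          # 2 -3 6
all.equal(d_plain, as.numeric(d_ts))   # TRUE once the ts attributes are dropped
```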



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
Sent: Monday, August 05, 2013 12:07 PM
To: R help
Subject: Re: [R] eliminating outliers



Hi,
Please use dput() to provide a reproducible example (see ?dput).
set.seed(45)
dat1 <- data.frame(
  date  = format(seq(as.Date("01-01-1947", format = "%m-%d-%Y"),
                     as.Date("02-01-1947", format = "%m-%d-%Y"), by = 1),
                 "%m/%d/%Y"),
  value = sample(1800:2400, 32, replace = FALSE))


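# row 1 plus every row whose change from the previous row is < -100 or > 200
dat1[c(TRUE, (diff(dat1$value) < -100) | (diff(dat1$value) > 200)), ]

# indices of the remaining rows
which(!c(TRUE, (diff(dat1$value) < -100) | (diff(dat1$value) > 200)))
# [1]  3  4  5  6  7  8 14 15 16 19 20 21 22 23 24 25 29 31

# data with the flagged rows removed
dat1[which(!c(TRUE, (diff(dat1$value) < -100) | (diff(dat1$value) > 200))), ]

A.K.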
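A side effect of padding with c(TRUE, ...) above: row 1 always counts as flagged, so it appears in the flagged output and is always excluded from the which(!...) subset (row 1 is in res but not in the index list). If the first observation should be kept, since it has no previous value to differ from, padding with FALSE is one alternative. A sketch with hypothetical values:

```r
# Hypothetical series: only the step from 2050 to 2300 exceeds +200.
value <- c(2000, 2050, 2300, 2280)
jump  <- (diff(value) < -100) | (diff(value) > 200)   # length(value) - 1 entries

flag_true  <- c(TRUE,  jump)   # padding used above: row 1 counts as flagged
flag_false <- c(FALSE, jump)   # alternative: row 1 is never flagged

which(!flag_true)    # 2 4   : row 1 is dropped from the remaining rows
which(!flag_false)   # 1 2 4 : row 1 is kept
```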


I am reading a data file consisting of date and GDP as follows 

gdpdata <- read.table("C:/R-working/R-data/gdp-data-1947-87.txt", header=TRUE) 

Which results in
          date  value
1   01/01/1947 1932.6
2   04/01/1947 1930.4
Etc. 
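As a sketch of the dput() request, here are the two rows shown above rebuilt as a data frame (the remaining rows are not shown in the post, so they are left out). dput() prints R code that recreates the object, which is what can be pasted into a message; evaluating that text gives back the same data frame. stringsAsFactors = FALSE is set explicitly because older R versions would otherwise turn the date strings into a factor.

```r
# The two rows printed in the question; the rest of the file is elided.
gdp_head <- data.frame(date  = c("01/01/1947", "04/01/1947"),
                       value = c(1932.6, 1930.4),
                       stringsAsFactors = FALSE)

# Prints code that rebuilds the object; paste this output into the message.
dput(gdp_head)

# Check: evaluating the dput() text reproduces the data frame exactly.
rebuilt <- eval(parse(text = capture.output(dput(gdp_head))))
identical(rebuilt, gdp_head)   # TRUE
```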

I then first difference the data using the command 

diff(gdpdata$value) 
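Note that diff() returns one value fewer than its input: element i is x[i + 1] - x[i], the change into row i + 1. That is why the replies pad the comparison with one extra element before using it to index rows. Another way to map differences back to rows is to shift the indices by one, as in this sketch with hypothetical values:

```r
# Hypothetical values for illustration.
x <- c(100, 98, 320, 310)

d <- diff(x)                          # -2 222 -10
length(d) == length(x) - 1            # TRUE: one value fewer than x
all.equal(d, x[-1] - x[-length(x)])   # TRUE: the same computation by hand

# d[i] is the change into row i + 1, so add 1 to get row numbers.
jump_rows <- which(d > 200 | d < -100) + 1
jump_rows                             # 3
```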

I would like to create a transformed dataset with outliers eliminated, i.e., observations where the value of ‘diff’ is greater than 200 or less than -100.
Further, I would like R to tell me which dates and GDP values were eliminated.

Any suggestions on how to do that would be appreciated.
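Both requests (a cleaned dataset and a list of the eliminated dates and values) can be met by one logical mask. The sketch below wraps the idea from the replies in a hypothetical helper, split_by_jump(), that returns both parts. It assumes a numeric column named value, keeps row 1 because it has no previous value, and uses a small made-up data frame in the same layout as gdpdata.

```r
# Hypothetical helper: rows whose change from the previous row is below
# `lower` or above `upper` are removed; row 1 is always kept.
split_by_jump <- function(df, col = "value", lower = -100, upper = 200) {
  d <- diff(df[[col]])                      # one value fewer than nrow(df)
  removed <- c(FALSE, d < lower | d > upper)
  list(kept    = df[!removed, , drop = FALSE],
       removed = df[removed, , drop = FALSE])
}

# Made-up quarterly data; only the first two rows match the question.
gdp <- data.frame(date  = c("01/01/1947", "04/01/1947", "07/01/1947", "10/01/1947"),
                  value = c(1932.6, 1930.4, 2200.0, 2180.0),
                  stringsAsFactors = FALSE)

res <- split_by_jump(gdp)
res$kept      # rows 1, 2 and 4
res$removed   # row 3: the jump from 1930.4 to 2200.0 exceeds 200
```

Design note: the differences are computed once on the original series, so after row 3 is dropped, row 4 is still judged against row 3 rather than row 2. Recomputing after each removal would be a different, iterative rule.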

Thanks,
Darrell Bosch

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



