[R] I think you misunderstood my explantation.

arun smartpink111 at yahoo.com
Wed Jan 30 12:53:45 CET 2013


Hi,

Your dataset already had some missing values, so I first subset only the values that are not missing.
!is.na(temp$ACTIVE_KWH)
# [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
#[13]  TRUE  TRUE  TRUE
temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)]
#[1] 1201.9 1202.2 1202.8 1203.9   12.0 1206.0 1206.3 1206.5 1207.3 1207.9
#[11] 1208.4

diff() gives the differences between successive values (see ?diff):
diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])
 #[1]     0.3     0.6     1.1 -1191.9  1194.0     0.3     0.2     0.8     0.6
#[10]     0.5

#Here, the length is 1 less than the subset above, because diff() returns one value fewer than its input.
 diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])<0
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

#Prepend `FALSE` so the index has the same length as the subset data
indx<- c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])<0)
indx 
#[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

#Using this index, further subset the already subset data for differences of values <0
 temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][indx]
#[1] 12
temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][indx]<- NA #changed to NA

#Similarly for REACTIVE_KWH
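
The steps above can be wrapped in a small helper so that the same logic applies to either column. This is only a minimal sketch: the name `mask_drops` is hypothetical (it does not appear in the thread), and it assumes the column is numeric and should never decrease.

```r
# Hypothetical helper (not from the original thread): set to NA every
# non-missing value that is smaller than the previous non-missing value.
mask_drops <- function(x) {
  ok <- !is.na(x)                     # positions that hold real readings
  drop <- c(FALSE, diff(x[ok]) < 0)   # TRUE where a reading fell below its predecessor
  x[ok][drop] <- NA                   # mask only those readings
  x
}

# ACTIVE_KWH column of the first dataset, typed in by hand
active <- c(1201.9, 1202.2, 1202.8, NA, 1203.9, NA, NA, 12.0,
            1206.0, 1206.3, 1206.5, NA, 1207.3, 1207.9, 1208.4)
cleaned <- mask_drops(active)
which(is.na(cleaned) & !is.na(active))   # position of the newly masked reading
#[1] 8
```

With a data frame this would be used as `temp$ACTIVE_KWH <- mask_drops(temp$ACTIVE_KWH)`, and likewise for REACTIVE_KWH.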
Hope this helps.
A.K.









________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Wednesday, January 30, 2013 12:51 AM
Subject: Re: I think you misunderstood my explantation.


Oh, I forgot to ask about this code.
Can you explain what it means?

Using the first dataset temp:
temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])< 0)]<-NA
temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)][c(FALSE,diff(temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)])< 0)]<-NA 
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com> 
To: "남윤주"<jamansymptom at naver.com>; 
Cc: "R help"<r-help at r-project.org>; 
Sent: 2013-01-30 (Wed) 10:37:18
Subject: Re: I think you misunderstood my explantation.

Hi,
Sorry, I didn't check your code earlier.

I hope this works for you (especially the <0).
Using the first dataset temp:
temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])< 0)]<-NA
temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)][c(FALSE,diff(temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)])< 0)]<-NA
temp
#      ID        CTIME ACTIVE_KWH REACTIVE_KWH
#1  HM001 201212121301     1201.9       1115.5
#2  HM001 201212121302     1202.2       1115.8
#3  HM001 201212121303     1202.8       1115.8
#4  HM001 201212121304         NA       1116.1
#5  HM001 201212121305     1203.9       1116.7
#6  HM001 201212121306         NA       1116.7
#7  HM001 201212121307         NA       1116.7
#8  HM001 201212121308         NA           NA
#9  HM001 201212121309     1206.0       1118.2
#10 HM001 201212121310     1206.3       1118.6
#11 HM001 201212121311     1206.5       1118.8
#12 HM001 201212121312         NA           NA
#13 HM001 201212121313     1207.3           NA
#14 HM001 201212121314     1207.9       1121.1
#15 HM001 201212121315     1208.4       1121.3

#Similarly with the second dataset:
temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)][c(FALSE,diff(temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)])< 0)]<-NA
temp1$REACTIVE_KWH[!is.na(temp1$REACTIVE_KWH)][c(FALSE,diff(temp1$REACTIVE_KWH[!is.na(temp1$REACTIVE_KWH)])< 0)]<-NA


A.K.






________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Tuesday, January 29, 2013 7:42 PM
Subject: I think you misunderstood my explantation.


Hi,

Assume that the first CTIME value is '201201010000'. ACTIVE_KWH is then the amount measured from '201201010000' to the present.
See the example row below.

1  HM001 201212121301 1201.9 1115.5

In row 1, ACTIVE_KWH is the accumulated value measured from '201201010000' to '201212121301'.
When CTIME is '201212121301', ACTIVE_KWH is '1201.9', and when CTIME is '201212121302', ACTIVE_KWH is '1202.2'.
This means that 0.3 was measured during 1 minute, and ACTIVE_KWH is an accumulated value.
Thus, ACTIVE_KWH must increase as CTIME increases.
Do you see? So I have to define a strange value by the difference between successive values, like (temp$ACTIVE_KWH[i] - temp$ACTIVE_KWH[i-1]) > 50. The '50' can be changed.
---------------------------------------------------------------------
> for(i in 2:m){
 temp$ACTIVE_KWH[i]<- ifelse(temp$ACTIVE_KWH[i]- temp$ACTIVE_KWH[i-1]<0,NA, temp$ACTIVE_KWH[i])
}
----------------------------------------------------------------------
But in this case a critical error occurred: if temp$ACTIVE_KWH[3] is NA, all later values (temp$ACTIVE_KWH[4], [5], [6], ...) are also set to NA.
My last mail contains the detailed source code and result.
Can you recommend a better way to avoid filling the dataset with successive NAs?
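
The cascade described above comes from NA arithmetic: `NA - x < 0` is NA, and `ifelse()` returns NA when its test is NA, so once one element is NA every later comparison also yields NA. One way to keep the loop form is to compare each reading with the last accepted reading rather than with the element directly before it. The following is a rough sketch under that assumption; `mask_drops_loop` is a hypothetical name, not code from this thread.

```r
# Why the original loop cascades: a comparison against NA is itself NA.
ifelse(NA - 1202.8 < 0, NA, 1203.9)
#[1] NA

# Sketch: remember the last accepted reading so that one NA does not
# propagate to every later row.
mask_drops_loop <- function(x) {
  last_valid <- NA_real_
  for (i in seq_along(x)) {
    if (is.na(x[i])) next                        # leave existing gaps untouched
    if (!is.na(last_valid) && x[i] < last_valid) {
      x[i] <- NA                                 # reading fell below the running total
    } else {
      last_valid <- x[i]                         # accept it as the new reference
    }
  }
  x
}

x <- c(1201.9, 1202.2, 1202.8, NA, 1203.9, NA, NA, 12.0, 1206.0)
mask_drops_loop(x)
#[1] 1201.9 1202.2 1202.8     NA 1203.9     NA     NA     NA 1206.0
```

Note that this compares against the last accepted reading, so a single spuriously high value would cause every later reading to be masked; whether that is acceptable depends on the data.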
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com> 
To: "남윤주"<jamansymptom at naver.com>; 
Cc: "R help"<r-help at r-project.org>; 
Sent: 2013-01-29 (Tue) 23:28:30
Subject: Re: I succeed to get result dataset.

HI,

temp<-read.table(text="
 ID        CTIME   ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308   12.0   31.0
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3
",sep="",header=TRUE,stringsAsFactors=F)

#Here, I assume that you consider values <1000 as too low. You can change the threshold accordingly.
 temp[,3:4][temp[,3]<1000& !is.na(temp[,3]),]<-NA
 temp
#      ID        CTIME ACTIVE_KWH REACTIVE_KWH
#1  HM001 201212121301     1201.9       1115.5
#2  HM001 201212121302     1202.2       1115.8
#3  HM001 201212121303     1202.8       1115.8
#4  HM001 201212121304         NA       1116.1
#5  HM001 201212121305     1203.9       1116.7
#6  HM001 201212121306         NA       1116.7
#7  HM001 201212121307         NA       1116.7
#8  HM001 201212121308         NA           NA
#9  HM001 201212121309     1206.0       1118.2
#10 HM001 201212121310     1206.3       1118.6
#11 HM001 201212121311     1206.5       1118.8
#12 HM001 201212121312         NA           NA
#13 HM001 201212121313     1207.3           NA
#14 HM001 201212121314     1207.9       1121.1
#15 HM001 201212121315     1208.4       1121.3


#Suppose your dataset is like this:
temp1<-read.table(text="
 ID        CTIME   ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308   12.0   31.0
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 21.0 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 22.0
",sep="",header=TRUE,stringsAsFactors=F)
temp1[,3][temp1[,3]<1000&!is.na(temp1[,3])]<-NA
 temp1[,4][temp1[,4]<1000&!is.na(temp1[,4])]<-NA
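
The threshold rule above can also be written once and applied to several columns, which avoids repeating the data frame name in every index. This is just a sketch: `mask_below` and the small data frame `df` are made up for illustration, and the threshold of 1000 is the same assumption as above.

```r
# Hypothetical helper: set values below a plausibility threshold to NA,
# leaving values that are already missing as they are.
mask_below <- function(x, threshold = 1000) {
  x[!is.na(x) & x < threshold] <- NA
  x
}

# Small made-up data frame with the same column names as in the thread
df <- data.frame(ACTIVE_KWH   = c(1201.9, NA, 12.0, 1206.0, 21.0),
                 REACTIVE_KWH = c(1115.5, 1116.1, 31.0, NA, 22.0))
cols <- c("ACTIVE_KWH", "REACTIVE_KWH")
df[cols] <- lapply(df[cols], mask_below)   # apply the rule column by column
df
```

A fixed threshold only works while the meter reading stays in a known range; for an ever-increasing counter, a rule based on differences between successive values is usually the safer choice.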

Hope it helps.

A.K.






________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Tuesday, January 29, 2013 3:36 AM
Subject: Re: I succeed to get result dataset.


Arun, I am having difficulty using R again.
The dataset 'temp' contains NA and strange values (like row 8: 12.0 and 31.0, which are out of the range of the other values).

**What I want is to set strange values to NA.**  
Then I'll impute the dataset 'temp' myself.
Since 'WIDTH' and 'HEIGHT' cannot decrease,
I defined a procedure like the one below.
> for(i in 2:m){
 ex$WIDTH[i]<- ifelse(ex$WIDTH [i]- ex$WIDTH [i-1]<0,NA, ex$WIDTH [i])
 ex$HEIGHT[i]<- ifelse(ex$HEIGHT[i]- ex$HEIGHT [i-1]<0,NA, ex$HEIGHT [i])
}

But the result is wrong. Do you have a better idea for a procedure that works well?

There is a dataset named 'temp'.

      ID        CTIME   ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308   12.0   31.0
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3

> m<- 15
> for(i in 2:m){
 temp$ACTIVE_KWH[i]<- ifelse(temp$ACTIVE_KWH[i]- temp$ACTIVE_KWH[i-1]<0,NA, temp$ACTIVE_KWH[i])
 temp$REACTIVE_KWH[i]<- ifelse(temp$REACTIVE_KWH[i]- temp$REACTIVE_KWH[i-1]<0,NA, temp$REACTIVE_KWH[i])
}

**result of for statement** 

   ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301     1201.9       1115.5
2  HM001 201212121302     1202.2       1115.8
3  HM001 201212121303     1202.8       1115.8
4  HM001 201212121304         NA       1116.1
5  HM001 201212121305         NA       1116.7
6  HM001 201212121306         NA       1116.7
7  HM001 201212121307         NA       1116.7
8  HM001 201212121308         NA           NA
9  HM001 201212121309         NA           NA
10 HM001 201212121310         NA           NA
11 HM001 201212121311         NA           NA
12 HM001 201212121312         NA           NA
13 HM001 201212121313         NA           NA
14 HM001 201212121314         NA           NA
15 HM001 201212121315         NA           NA

**What I expect (row8 WIDTH=NA, HEIGHT=NA)**  
ID        CTIME  WIDTH HEIGHT
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308     NA     NA
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com> 
To: "남윤주"<jamansymptom at naver.com>; 
Cc: 
Sent: 2013-01-29 (Tue) 15:23:56
Subject: Re: I succeed to get result dataset.

HI,

I am glad that it got fixed.

You can ask for help.
Thank you for the kind words.
Good night!
Arun                                                         


