[R] Thank you your help and one more question.

Tue Jan 29 04:20:10 CET 2013

Hi,

I don't have the Amelia package installed.

If you want to get the mean value, you could use either ?aggregate() or ?ddply() from library(plyr).

library(plyr)
imputNew<-do.call(rbind,imput1_2_3)
 res1<-ddply(imputNew,.(ID,CTIME),function(x) mean(x$WEIGHT))
 names(res1)[3]<-"WEIGHT"
 head(res1)
 #    ID CTIME   WEIGHT
#1 HM001  1223 24.90000
#2 HM001  1224 25.20000
#3 HM001  1225 25.50000
#4 HM001  1226 25.41933
#5 HM001  1227 25.70000
#6 HM001  1228 27.10000

#or
res2<-aggregate(.~ID+CTIME,data=imputNew,mean)
#or (note: this splits on CTIME only, so it assumes a single ID, as in this data)
res3<-  do.call(rbind,lapply(split(imputNew,imputNew$CTIME),function(x) {x$WEIGHT<-mean(x[,3]);head(x,1)}))
row.names(res3)<-1:nrow(res3)
identical(res1,res2)
#[1] TRUE
 identical(res1,res3)
#[1] TRUE
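
If every imputed data frame has the same rows in the same order, the mean can also be taken without any grouping step. The following is only a minimal sketch under that assumption: `imps` and `combined` are made-up names, and the numbers are toy values rather than your data.

```r
# Toy stand-in for the list of imputed data frames (illustrative values only)
imps <- list(
  imp1 = data.frame(ID = "HM001", CTIME = 1223:1225, WEIGHT = c(24.9, 25.2, 25.5)),
  imp2 = data.frame(ID = "HM001", CTIME = 1223:1225, WEIGHT = c(24.9, 25.4, 25.5)),
  imp3 = data.frame(ID = "HM001", CTIME = 1223:1225, WEIGHT = c(24.9, 25.6, 25.5))
)

# Check the assumption: all data frames share the same keys in the same order
keys_match <- all(sapply(imps, function(d)
  identical(d$ID, imps[[1]]$ID) && identical(d$CTIME, imps[[1]]$CTIME)))
stopifnot(keys_match)

# Add the WEIGHT vectors element by element, then divide by the number of imputations
combined <- imps[[1]]
combined$WEIGHT <- Reduce(`+`, lapply(imps, `[[`, "WEIGHT")) / length(imps)
combined
```

If the row order could differ between imputations, the grouped versions above are the safer choice.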
A.K.

________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Monday, January 28, 2013 9:47 PM
Subject: Re: Thank you your help and one more question.

Thank you for replying to my question.
What I want is a matrix like the one below.
I have 3 data sets, named weightimp1, weightimp2, and weightimp3.
To get the matrix below, I have to combine these 3 data sets.
I don't know how to combine the 3 data sets. The result could be the mean of the 3 data sets, or there might be a value (temp2$imputations$...) in the Amelia package that provides it.
I would prefer to use a method from the Amelia package, but if one doesn't exist, can you recommend how to set the result to the mean value?

#      ID CTIME WEIGHT (it represents the 3 data sets weightimp1, 2, 3)
#1  HM001  1223   24.90000   
#2  HM001  1224   25.20000 
#3  HM001  1225   25.50000  
#4  HM001  1226   25.24132  
#5  HM001  1227   25.70000   
#6  HM001  1228   27.10000   
#7  HM001  1229   27.30000   
#8  HM001  1230   27.40000  
#9  HM001  1231   28.40000   
#10 HM001  1232   29.20000  
#11 HM001  1233   30.13770   
#12 HM001  1234   31.17251   
#13 HM001  1235   32.40000   
#14 HM001  1236   33.70000   
#15 HM001  1237   34.30000   
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com> 
To: "남윤주"<jamansymptom at naver.com>; 
Cc: "R help"<r-help at r-project.org>; 
Sent: 2013-01-29 (Tue) 11:25:38
Subject: Re: Thank you your help and one more question.

HI,

How do you want to combine the results?
It looks like the 5 datasets are list elements.

If I take the first three list elements,
imput1_2_3<-list(imp1=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9, 
25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377, 
31.17251, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp2=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9, 
25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977, 
31.35045, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp3=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9, 
25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185, 
31.57952, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")))
#It could be combined by:
do.call(rbind, imput1_2_3) # But if you do this, the total number of rows will be the sum of the number of rows of each dataset.

I guess you want something like this:

res<-Reduce(function(...) merge(...,by=c("ID","CTIME")),imput1_2_3)
 names(res)[3:5]<- paste("WEIGHT","IMP",1:3,sep="")
 res
#      ID CTIME WEIGHTIMP1 WEIGHTIMP2 WEIGHTIMP3
#1  HM001  1223   24.90000   24.90000   24.90000
#2  HM001  1224   25.20000   25.20000   25.20000
#3  HM001  1225   25.50000   25.50000   25.50000
#4  HM001  1226   25.24132   25.54828   25.46838
#5  HM001  1227   25.70000   25.70000   25.70000
#6  HM001  1228   27.10000   27.10000   27.10000
#7  HM001  1229   27.30000   27.30000   27.30000
#8  HM001  1230   27.40000   27.40000   27.40000
#9  HM001  1231   28.40000   28.40000   28.40000
#10 HM001  1232   29.20000   29.20000   29.20000
#11 HM001  1233   30.13770   29.89770   30.88185
#12 HM001  1234   31.17251   31.35045   31.57952
#13 HM001  1235   32.40000   32.40000   32.40000
#14 HM001  1236   33.70000   33.70000   33.70000
#15 HM001  1237   34.30000   34.30000   34.30000
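
From a wide table like this, one way to get back to a single WEIGHT column is a row-wise mean over the imputation columns. This is only a sketch: `wide`, `final`, and `WEIGHT_SD` are names made up for illustration, and the numbers are toy values in the same layout as `res`.

```r
# Toy wide table in the same layout as `res` (illustrative values only)
wide <- data.frame(
  ID = "HM001", CTIME = 1223:1225,
  WEIGHTIMP1 = c(24.9, 25.2, 30.1),
  WEIGHTIMP2 = c(24.9, 25.2, 29.9),
  WEIGHTIMP3 = c(24.9, 25.2, 30.9)
)

# Pick out the imputation columns by name
imp_cols <- grep("^WEIGHTIMP", names(wide), value = TRUE)

final <- wide[, c("ID", "CTIME")]
final$WEIGHT <- rowMeans(wide[, imp_cols])            # mean across imputations
final$WEIGHT_SD <- apply(wide[, imp_cols], 1, sd)     # spread across imputations
final
```

Rows that were observed rather than imputed have the same value in every column, so their spread is essentially zero; a clearly positive `WEIGHT_SD` marks a row whose value was imputed.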
A.K.

________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 7:35 PM
Subject: Thank you your help and one more question.

I deeply appreciate your help. To answer your question, I am a software engineer, and I am developing a system that accumulates data to draw charts and tables.
For higher performance, I have to handle missing values, so I use the Amelia package. Below is the result of following your answer.
----------------------------------------------------------------
> temp2    # original data
      ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226     NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233 1221.0
12 HM001  1234     NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3 
> temp2$WEIGHT<- ifelse(temp2$WEIGHT>50,NA,temp2$WEIGHT)
> temp2    # after eliminating the strange value
      ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226     NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233     NA
12 HM001  1234     NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3
-------------------------------------------------------------- 
I have one more question. Below are the code and the results.
--------------------------------------------------------------
> a.out2<-amelia(temp2, m=5, ts="CTIME", cs="ID", polytime=1)
-- Imputation 1 --
 1  2  3  4 
-- Imputation 2 --
 1  2  3 
-- Imputation 3 --
 1  2  3  4 
-- Imputation 4 --
 1  2  3 
-- Imputation 5 --
 1  2  3 

> a.out2$imputations
$imp1
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.24132
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.13770
12 HM001  1234 31.17251
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
$imp2
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.54828
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 29.89770
12 HM001  1234 31.35045
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
$imp3
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.46838
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.88185
12 HM001  1234 31.57952
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
$imp4
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.86703
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.61241
12 HM001  1234 30.17042
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
$imp5
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 26.05747
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 31.03894
12 HM001  1234 30.90960
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
----------------------------------------
I got 5 datasets including imputed values. But what I want is not five datasets, only one data set that combines those 5 imputed datasets.
I want to combine $imp1, $imp2, ... $imp5 to get a final result set. This result set is also a matrix with 3 columns and 15 rows.
Would you help me once more, please?

-----Original Message-----
From: "arun"<smartpink111 at yahoo.com> 
To: "남윤주"<jamansymptom at naver.com>; 
Cc: "R help"<r-help at r-project.org>; 
Sent: 2013-01-28 (Mon) 23:48:51
Subject: Re: Thank you your help.

Hi,
temp3<- read.table(text="
ID CTIME WEIGHT
HM001 1223 24.0
HM001 1224 25.2
HM001 1225 23.1
HM001 1226 NA
HM001 1227 32.1
HM001 1228 32.4
HM001 1229 1323.2
HM001 1230 27.4
HM001 1231 22.4236 #changed here to test the previous solution
",sep="",header=TRUE,stringsAsFactors=FALSE)
 tempnew<- na.omit(temp3)

 grep("\\d{4}",temp3$WEIGHT) 
#[1] 7 9 #not correct

temp3[,3][grep("^\\d{4}(\\.|$)",temp3$WEIGHT)]<-NA #match exactly 4 digits before the decimal point (or a 4-digit whole number)
tail(temp3)
#     ID CTIME  WEIGHT
#4 HM001  1226      NA
#5 HM001  1227 32.1000
#6 HM001  1228 32.4000
#7 HM001  1229      NA
#8 HM001  1230 27.4000
#9 HM001  1231 22.4236
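
Since you asked for a grep example: the sketch below shows why the unanchored pattern also catches the decimals of 22.4236, and how an anchored pattern or a plain numeric comparison avoids that. The vector `w` and the other names are made up for illustration.

```r
# Toy weights: a normal value, two 4-digit misreads, a long decimal, and an NA
w <- c(24.0, 1323.2, 22.4236, 1221, NA)
s <- as.character(w)   # grep works on character strings

# Unanchored: four digits anywhere, so the decimals of 22.4236 match too
hits_loose <- grepl("\\d{4}", s)

# Anchored: exactly four digits at the start, then a decimal point or the end
hits_anchored <- grepl("^\\d{4}(\\.|$)", s)

# Numeric alternative that avoids regular expressions entirely
hits_numeric <- !is.na(w) & w >= 1000 & w < 10000

which(hits_loose)      # positions 2, 3 and 4
which(hits_anchored)   # positions 2 and 4
which(hits_numeric)    # positions 2 and 4
```

For numeric data the comparison in `hits_numeric` is usually easier to read than a regular expression.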

Based on the variance, you could set up some limit, for example 50, and use:
tempnew$WEIGHT<- ifelse(tempnew$WEIGHT>50,NA,tempnew$WEIGHT)
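
If you also want a lower limit, for example the 10 to 50 range you mentioned, the same idea can be wrapped in a small function. This is a sketch only: `out_of_range_to_na` is a hypothetical helper name, and the default bounds are an assumption taken from your message.

```r
# Hypothetical helper: values outside [lower, upper] become NA.
# Existing NAs are left as they are.
out_of_range_to_na <- function(x, lower = 10, upper = 50) {
  bad <- !is.na(x) & (x < lower | x > upper)
  x[bad] <- NA
  x
}

w <- c(24.9, NA, 1221, 2.5, 33.7)
cleaned <- out_of_range_to_na(w)
cleaned
```

The function is vectorised, so it can be applied to a whole column at once, e.g. `tempnew$WEIGHT <- out_of_range_to_na(tempnew$WEIGHT)`.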
A.K.

________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 2:20 AM
Subject: Re: Thank you your help.

Thank you for your reply again. Your understanding is exactly right.
I attached a picture that shows the dataset.
'weight' is the dependent variable, and CTIME means hour/minute. This data will be accumulated for years.
As for the accepted variance range, it would be from 10 to 50.
Actually, I am a Java programmer, so I am new to the R language.
Can you give me an example of how to use the grep function?
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com> 
To: "jamansymptom at naver.com"<jamansymptom at naver.com>; 
Cc: 
Sent: 2013-01-28 (Mon) 15:27:12
Subject: Re: Thank you your help.

Hi,
Your original post said that
"...it was evaluated from 20kg -40kg. But By some errors, it is evaluated 2000 kg".

So, my understanding was that you occasionally get readings of 2000, or 2000-4000, in place of 20-40 due to some misreading.

If your dataset contains observed values, strange values, and NA, and you want to replace the strange values with NA, could you mention the range of the strange values? If the strange values range anywhere between 1000 and 9999, they should get replaced with the ?grep() solution. But if it depends upon something else, you need to specify. Also, regarding the variance, what is your accepted range of variance?
A.K.

----- Original Message -----
From: "jamansymptom at naver.com" <jamansymptom at naver.com>
To: smartpink111 at yahoo.com
Cc: 
Sent: Monday, January 28, 2013 1:15 AM
Subject: Thank you your help.

Thank you for answering my question.
It is not exactly what I want. I should have described the situation in detail.
A sensor collects data every minute, and that data is accumulated and becomes part of the dataset.
The dataset contains observed values, strange values, and NA.
That is, I am not sure where a strange value will occur,
and I can't predict when a strange value will occur.

I need a procedure like the one below.
1. Using some method, set the range of variance.
2. Using a for(i) statement, check whether variance(weight) is in the range.
3. When the variance is out of range, set weight[i] to NA.
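
One possible reading of the three steps above, sketched in R: rather than fixing the range by hand, derive it from the data using the median and the median absolute deviation (`mad()` from the stats package), then set every value that falls too far from the median to NA. This is a swapped-in technique, not something specified in the thread; the function name `flag_outliers` and the cutoff `k = 5` are assumptions chosen for illustration. No explicit for(i) loop is needed because the comparison is vectorised.

```r
# Hypothetical helper: set values further than k * MAD from the median to NA
flag_outliers <- function(x, k = 5) {
  centre <- median(x, na.rm = TRUE)    # step 1: robust centre of the data
  spread <- mad(x, na.rm = TRUE)       # step 1: robust estimate of spread
  too_far <- !is.na(x) & abs(x - centre) > k * spread   # step 2: range check
  x[too_far] <- NA                     # step 3: replace out-of-range values
  x
}

# Toy sensor readings with one misread value (1221) and two missing values
weight <- c(24.9, 25.2, 25.5, NA, 25.7, 27.1, 27.3, 27.4,
            28.4, 29.2, 1221, NA, 32.4, 33.7, 34.3)
cleaned <- flag_outliers(weight)
cleaned
```

Note that if the readings are almost constant the MAD is close to zero and nearly everything would be flagged, so a fixed range may be the more predictable choice for a sensor with known limits.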

Thank you.