[R] Comparison of Date format

arun smartpink111 at yahoo.com
Sat Apr 13 06:41:21 CEST 2013



Hi,
In the example you provided, it looks like the dates in Date2 happen first, so I changed the data a bit.

DataA<- read.table(text="
ID,Status,Date1,Date2                
1,A,3-Feb-01,15-May-01         
1,B,15-May-01,16-May-01         
1,A,16-May-01,3-Sep-01                     
1,B,3-Sep-01,13-Sep-01                     
1,C,13-Sep-01,26-Feb-04                     
2,A,9-Feb-01,24-May-01         
2,B,24-May-01,25-May-01                     
2,A,25-May-01,16-Mar-02                     
2,A,6-Mar-02,18-Mar-02
2,A,14-Sep-01,6-Mar-02         
",sep=",",header=TRUE,stringsAsFactors=FALSE)
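library(stringr)
# remove the blanks that the pasted text leaves around the dates
# (otherwise "3-Sep-01   " would not equal "3-Sep-01")
DataA[,3]<- str_trim(DataA[,3])
DataA[,4]<- str_trim(DataA[,4])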
DataB<- read.table(text="
ID     Date.Accident         
1       3-Sep-01  
1     20-Jan-05 
1       26-Feb-04        
2     6-Mar-02
",sep="",header=TRUE,stringsAsFactors=FALSE)

 
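# For each accident date in DataB, find the rows of DataA where Date1 or Date2
# equals it (note: this version does not restrict the match to the same ID)
lst1 <- lapply(seq_len(nrow(DataB)), function(i) {
  # named vector of matching row numbers, e.g. c(Date1 = 4, Date2 = 3)
  x1 <- unlist(mapply(function(x, y) which(x == y), DataA[, 3:4], DataB[i, 2]))
  if (length(x1) == 2) {
    # both columns matched: keep the earlier row and the column that matched in it
    DataA[x1[which.min(x1)], !names(DataA) %in% names(x1[which.max(x1)])]
  } else if (length(x1) == 1) {
    # only one column matched
    DataA[x1, c("ID", "Status", names(x1))]
  } else NULL   # no match
})

lst2 <- lapply(lst1, data.frame)
lst2 <- lst2[sapply(lst2, nrow) > 0]   # drop the dates that had no match
lst2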
#[[1]]
#  ID Status    Date2
#3  1      A 3-Sep-01

#[[2]]
#  ID Status     Date2
#5  1      C 26-Feb-04

#[[3]]
#  ID Status    Date1
#9  2      A 6-Mar-02
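library(plyr)
# give the matched date column the same name as in DataB, then stack the pieces
dataNew <- do.call(rbind, lapply(lst2, function(x) {colnames(x)[3] <- colnames(DataB)[2]; x}))
# right join: keep every row of DataB; Status is NA where there was no match
res <- join(dataNew, DataB, by = c("Date.Accident", "ID"), type = "right")
res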
#  Date.Accident ID Status
#1      3-Sep-01  1      A
#2     20-Jan-05  1   <NA>
#3     26-Feb-04  1      C
#4      6-Mar-02  2      A
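If you would rather not load plyr just for the last step, base R's merge() with all.x = TRUE should give the same rows as join(..., type = "right"). A minimal sketch; dataNew and DataB are typed in by hand from the results above, and note that merge() sorts the output by the key columns, so the row order differs from join().

```r
# dataNew and DataB re-created by hand from the results above
dataNew <- data.frame(ID = c(1, 1, 2),
                      Status = c("A", "C", "A"),
                      Date.Accident = c("3-Sep-01", "26-Feb-04", "6-Mar-02"),
                      stringsAsFactors = FALSE)
DataB <- data.frame(ID = c(1, 1, 1, 2),
                    Date.Accident = c("3-Sep-01", "20-Jan-05", "26-Feb-04", "6-Mar-02"),
                    stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of DataB (the "right" table of join() above);
# Status is NA where the date in DataB had no match in DataA
res_base <- merge(DataB, dataNew, by = c("ID", "Date.Accident"), all.x = TRUE)
res_base
```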



# Or you can split by ID, so that dates are only matched within the same ID
# (rbind assumes the matched column has the same name for all dates of an ID)
lst1New <- lapply(unique(DataA$ID), function(i) {
  x1 <- DataA[DataA$ID == i, ]
  x2 <- DataB[DataB$ID == i, ]
  do.call(rbind, lapply(seq_len(nrow(x2)), function(j) {
    x3 <- unlist(mapply(function(x, y) which(x == y), x1[, 3:4], x2[j, 2]))
    if (length(x3) == 2) {
      x1[x3[which.min(x3)], !names(x1) %in% names(x3[which.max(x3)])]
    } else if (length(x3) == 1) {
      x1[x3, c("ID", "Status", names(x3))]
    } else NULL
  }))
})

lst1New
#[[1]]
#  ID Status     Date2
#3  1      A  3-Sep-01
#5  1      C 26-Feb-04

#[[2]]
#  ID Status    Date1
#9  2      A 6-Mar-02
dataNew1 <- do.call(rbind, lapply(lst1New, function(x) {colnames(x)[3] <- colnames(DataB)[2]; x}))
res1 <- join(dataNew1, DataB, by = c("Date.Accident", "ID"), type = "right")
res1
#  Date.Accident ID Status
#1      3-Sep-01  1      A
#2     20-Jan-05  1   <NA>
#3     26-Feb-04  1      C
#4      6-Mar-02  2      A
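Another option, sketched in base R only and checked just against this example: read the data with strip.white = TRUE (instead of str_trim), convert the strings to Date objects, stack Date1 and Date2 into one long column, and merge by ID and date. Like the code above, it treats "happened first" as "comes earlier in DataA"; if it should mean the earlier date instead, order by a date column rather than by row. Temporarily switching LC_TIME to "C" is an assumption of this sketch, so that %b reads English month abbreviations whatever the session locale is.

```r
# strip.white = TRUE drops the blanks around each field while reading
DataA <- read.table(text = "
ID,Status,Date1,Date2
1,A,3-Feb-01,  15-May-01
1,B,15-May-01, 16-May-01
1,A,16-May-01, 3-Sep-01
1,B,3-Sep-01,  13-Sep-01
1,C,13-Sep-01, 26-Feb-04
2,A,9-Feb-01,  24-May-01
2,B,24-May-01, 25-May-01
2,A,25-May-01, 16-Mar-02
2,A,6-Mar-02,  18-Mar-02
2,A,14-Sep-01, 6-Mar-02
", sep = ",", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
DataB <- read.table(text = "
ID Date.Accident
1  3-Sep-01
1  20-Jan-05
1  26-Feb-04
2  6-Mar-02
", header = TRUE, stringsAsFactors = FALSE)

# Parse "3-Sep-01"-style strings into Date objects. %b (month abbreviation)
# is locale-dependent, so LC_TIME is switched to "C" and restored afterwards.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
fmt <- "%d-%b-%y"
DataA$Date1 <- as.Date(DataA$Date1, format = fmt)
DataA$Date2 <- as.Date(DataA$Date2, format = fmt)
DataB$Date.Accident <- as.Date(DataB$Date.Accident, format = fmt)
invisible(Sys.setlocale("LC_TIME", old_locale))

# Long format: each row of DataA contributes one entry per date column;
# 'row' remembers the original position so that ties can be broken by it
long <- data.frame(row    = rep(seq_len(nrow(DataA)), 2),
                   ID     = rep(DataA$ID, 2),
                   Status = rep(DataA$Status, 2),
                   Date.Accident = c(DataA$Date1, DataA$Date2),
                   stringsAsFactors = FALSE)

# Inner join on ID and date: only matches within the same ID survive
hits <- merge(long, DataB, by = c("ID", "Date.Accident"))

# If Date1 and Date2 both match, keep the earlier row of DataA
hits <- hits[order(hits$ID, hits$Date.Accident, hits$row), ]
first <- hits[!duplicated(hits[, c("ID", "Date.Accident")]),
              c("ID", "Date.Accident", "Status")]

# Keep every row of DataB; Status is NA where nothing matched
res2 <- merge(DataB, first, by = c("ID", "Date.Accident"), all.x = TRUE)
res2
```

Because the keys are real Date objects here, "3-Sep-01" and "03-Sep-01" would also match, which plain string comparison would miss.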
A.K.


________________________________
 From: farnoosh sheikhi <farnoosh_81 at yahoo.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
Sent: Friday, April 12, 2013 5:40 PM
Subject: Comparison of Date format 
 




 Hi there,

Hope all is well.
I have complicated data, and I need to create a new variable based on the dates and descriptions in the data.
I would really appreciate it if you could help me.
Here is what the data look like:

DataA
ID Status Date1     Date2
1  A      3-Feb-01  15-May-01
1  B      15-May-01 16-May-01
1  A      16-May-01 3-Sep-01
1  B      3-Sep-01  13-Sep-01
1  C      13-Sep-01 26-Feb-04
2  A      9-Feb-01  24-May-01
2  B      24-May-01 25-May-01
2  A      25-May-01 6-Mar-02
2  A      6-Mar-02  18-Mar-02

DataB
ID Date.Accident Result
1  3-Sep-01      A
1  20-Jan-05     NA
2  6-Mar-02      A

I want to compare DataA to DataB for each ID. If Date1 or Date2 matches Date.Accident, return the corresponding Status from DataA as a new Result column in DataB.
The trick here is that there are two matching dates, but I want the status of the one that happened first. The two data sets do not have the same number of rows.

I really appreciate your time and help.
Thanks.



More information about the R-help mailing list