[R] Fuzzy merge using timestamps

blurg ian.jhsph at gmail.com
Mon Nov 8 23:04:17 CET 2010


Greetings Supreme Council of R Masters,

Like toddler, I have gotten my head stuck in the banisters of R ... again. 
Let it be know I am still a neophyte in the R-community forum world, so
please don't flame me too bad.  

I have two sets of data, each with a set of timestamps.  I would like to
somehow merge the datasets based on the timestamps and an individual
identifier.  That is there are several individuals all with timestamps, with
times that could overlap.  By browsing through some of the older posts, I
got the idea to create a third data frame of both sets of timestamps,
individual identifiers, and a key to determine which dataset they have come
from, then find the breaks to determine which of each dataset should be
paired.  the code I have written so far look something like this.

gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps)
urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur)

gpsdata$ID1 <- row.names(gpsdata) 
urdata$ID2 <- row.names(urdata) 

gpsdata$key1 <- rep(0, nrow(gpsdata))
urdata$key2 <- rep(1, nrow(urdata))

checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
	ARC=c(gpsdata$gpsARC, urdata$urARC),
	times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
	key=c(gpsdata$key1, urdata$key2))

checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times, decreasing =
FALSE),]

breaks <- which(diff(checkTime$key) == 1)

match <- data.frame(ID1=checkTime$ID[breaks], 
	gpsARC = checkTime$ARC[breaks],
	urARC = checkTime$ARC[breaks + 1], 
	t_datetimegps=checkTime$times[breaks], 
	t_datetimeur=checkTime$times[breaks + 1])

#Then I merge the 'match' data frame with the gpsdata data frame and the
product with the urdata data frame.  The problem is that when I create the
checkTime data frame and sort it, it sorts the urdata portion first then the
gpsdata portion.   So my key column looks like
1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc. even though I am not sorting on key. 
S.O.S!!!!  Why is it doing this?  Shouldn't it just order the timestamps of
both data frames together?

Thanks for all your enlightenment.






-- 
View this message in context: http://r.789695.n4.nabble.com/Fuzzy-merge-using-timestamps-tp3032745p3032745.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list