[R] Fuzzy merge using timestamps

Sarah Goslee sarah.goslee at gmail.com
Wed Nov 10 20:09:16 CET 2010


Hi Ian,

Did you see the second part of my request, for a simple
reproducible example? At the very least, we need your
merge and sort code, since it sounds like that's where
the problem truly lies.

Sarah

On Wed, Nov 10, 2010 at 2:02 PM, Ian Craig <ian.jhsph at gmail.com> wrote:
> So here is a sample of the datasets.  I have modified them for the purposes
> of circulating to solve this problem.  I have already added the ID and key
> as seen in the code below.  Any ideas?
> gpsdata
>    gpsARC Protocol Track    UTCDate  UTCTime  LocalDate LocalTime        y
>       x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec DistAft_m
>     t_datetimegps ID1 key1
> 1  118259    38-C     1 07/21/2009 21:29:05 07/21/2009  17:29:05 39.29393
> -76.54961   0.547843   0.162998           0       0.00          36     27.25
> 2009-07-21 17:09:00   1    0
> 2  118259    38-C     1 07/21/2009 21:29:41 07/21/2009  17:29:41 39.29397
> -76.54930   6.615943  27.425990          36      27.25           4     32.53
> 2009-07-21 17:19:00   2    0
> 3  118259    38-C     1 07/21/2009 21:29:45 07/21/2009  17:29:45 39.29383
> -76.54897   6.357120  30.192910           4      32.53           3     26.39
> 2009-07-21 17:29:00   3    0
> 4  118259    38-C     1 07/21/2009 21:29:48 07/21/2009  17:29:48 39.29363
> -76.54881   6.134487  30.489670           3      26.39           5     27.27
> 2009-07-21 17:39:00   4    0
> 5  118259    38-C     1 07/21/2009 21:29:53 07/21/2009  17:29:53 39.29341
> -76.54868   6.451584  13.394810           5      27.27          10     27.47
> 2009-07-21 17:49:00   5    0
> 6  118259    38-C     1 07/21/2009 21:30:03 07/21/2009  17:30:03 39.29352
> -76.54839   4.324665  13.323760          10      27.47           5     26.25
> 2009-07-22 18:00:00   6    0
> 7  118259    38-C     1 07/21/2009 21:30:08 07/21/2009  17:30:08 39.29370
> -76.54819   3.662347  12.518040           5      26.25           8     30.68
> 2009-07-22 18:20:00   7    0
> 8  118259    38-C     1 07/21/2009 21:30:16 07/21/2009  17:30:16 39.29394
> -76.54837   2.378734  36.134190           8      30.68           3     31.95
> 2009-07-22 18:30:00   8    0
> 9  118259    38-C     1 07/21/2009 21:30:19 07/21/2009  17:30:19 39.29408
> -76.54870   2.222491  38.962510           3      31.95           3     31.20
> 2009-07-22 18:40:00   9    0
> 10 118259    38-C     1 07/21/2009 21:30:22 07/21/2009  17:30:22 39.29419
> -76.54903   2.024886  31.771920           3      31.20           6     25.91
> 2009-07-22 18:50:00  10    0
> 11 118259    38-C     1 07/21/2009 21:30:28 07/21/2009  17:30:28 39.29430
> -76.54929   2.831168   4.424288           6      25.91          51     25.10
> 2009-07-22 19:00:00  11    0
> 12 118259    38-C     1 07/21/2009 21:31:19 07/21/2009  17:31:19 39.29432
> -76.54958   1.545548  26.467430          51      25.10           3     26.97
> 2009-07-22 19:01:00  12    0
> 13 118259    38-C     1 07/21/2009 21:31:22 07/21/2009  17:31:22 39.29415
> -76.54980   2.062366  37.769800           3      26.97           3     34.60
> 2009-07-22 19:05:00  13    0
> 14 217276    38-C     1 07/21/2009 21:31:25 07/21/2009  17:31:25 39.29388
> -76.55000   2.120106  45.099640           3      34.60           2     25.38
> 2009-07-21 17:11:00  14    0
> 15 217276    38-C     1 07/21/2009 21:31:27 07/21/2009  17:31:27 39.29367
> -76.55014   2.107622  47.953690           2      25.38           2     26.39
> 2009-07-21 17:21:00  15    0
> 16 217276    38-C     1 07/21/2009 21:31:29 07/21/2009  17:31:29 39.29347
> -76.55029   1.942083  47.656870           2      26.39           2     26.72
> 2009-07-21 17:31:00  16    0
> 17 217276    38-C     1 07/21/2009 21:31:31 07/21/2009  17:31:31 39.29326
> -76.55045   2.023492  46.045600           2      26.72           3     36.03
> 2009-07-21 17:41:00  17    0
> 18 217276    38-C     1 07/21/2009 21:31:34 07/21/2009  17:31:34 39.29298
> -76.55066   2.211649  39.604580           3      36.03           3     27.07
> 2009-07-21 17:51:00  18    0
> 19 217276    38-C     1 07/21/2009 21:31:37 07/21/2009  17:31:37 39.29277
> -76.55083   1.479468  25.079980           3      27.07          40     29.64
> 2009-07-22 19:31:00  19    0
> 20 217276    38-C     1 07/21/2009 21:32:17 07/21/2009  17:32:17 39.29256
> -76.55103   2.041623  23.242470          40      29.64           4     26.38
> 2009-07-22 19:32:00  20    0
> 21 217276    38-C     1 07/21/2009 21:32:21 07/21/2009  17:32:21 39.29255
> -76.55134   2.122359  32.137710           4      26.38           3     34.66
> 2009-07-22 19:34:00  21    0
> 22 217276    38-C     1 07/21/2009 21:32:24 07/21/2009  17:32:24 39.29276
> -76.55164   0.776128  48.219300           3      34.66           2     28.80
> 2009-07-22 19:36:00  22    0
> 23 217276    38-C     1 07/21/2009 21:32:26 07/21/2009  17:32:26 39.29294
> -76.55187  -0.542242  53.586880           2      28.80           2     31.85
> 2009-07-22 19:38:00  23    0
> 24 217276    38-C     1 07/21/2009 21:32:28 07/21/2009  17:32:28 39.29316
> -76.55210  -0.383976  56.380370           2      31.85           2     32.48
> 2009-07-22 19:40:00  24    0
> 25 217276    38-C     1 07/21/2009 21:32:30 07/21/2009  17:32:30 39.29344
> -76.55223  -1.343364  58.055140           2      32.48           2     33.62
> 2009-07-22 19:41:00  25    0
> 26 217276    38-C     1 07/21/2009 21:32:32 07/21/2009  17:32:32 39.29374
> -76.55227  -2.229877  61.005420           2      33.62           2     35.23
> 2009-07-22 19:45:00  26    0
> 27 217276    38-C     1 07/21/2009 21:32:34 07/21/2009  17:32:34 39.29405
> -76.55220  -2.645037  65.670400           2      35.23           2     38.70
> 2009-07-23 11:02:00  27    0
> 28 217276    38-C     1 07/21/2009 21:32:36 07/21/2009  17:32:36 39.29438
> -76.55206  -3.099854  70.528790           2      38.70           2     40.63
> 2009-07-23 11:12:00  28    0
> 29 217276    38-C     1 07/21/2009 21:32:38 07/21/2009  17:32:38 39.29472
> -76.55186  -3.309293  73.246460           2      40.63           2     41.87
> 2009-07-23 11:22:00  29    0
> 30 217276    38-C     1 07/21/2009 21:32:40 07/21/2009  17:32:40 39.29505
> -76.55164  -2.752104  75.466320           2      41.87           2     42.81
> 2009-07-23 11:32:00  30    0
>
> urdata
>    Study  urARC   Begin_DateTime       t_datetime       Begin_Date COC_POS
> COD_POS TOT_POS        t_datetimeur ID2 key2
> 1  38-C 118259  11/3/2008 22:05  11/5/2008 10:05  11/3/2008 22:05       0
>     1       1 2008-11-05 10:05:00   1    1
> 2  38-C 118259 11/19/2008 23:33 11/21/2008 11:33 11/19/2008 23:33       0
>     1       1 2009-07-21 17:21:00   2    1
> 3  38-C 118259  11/23/2008 0:37 11/24/2008 12:37  11/23/2008 0:37       0
>     1       1 2008-11-24 12:37:00   3    1
> 4  38-C 118259  11/25/2008 5:18 11/26/2008 17:18  11/25/2008 5:18       0
>     1       1 31.95 2009-07-22 18:28:0   4    1
> 5  38-C 217276 11/26/2008 23:20 11/28/2008 11:20 11/26/2008 23:20       0
>     1       1 2008-11-28 11:20:00   5    1
> 6  38-C 217276  11/30/2008 0:35  12/1/2008 12:35  11/30/2008 0:35       0
>     1       1 2009-07-21 17:26:00   6    1
> 7  38-C 217276   12/2/2008 5:05  12/3/2008 17:05   12/2/2008 5:05       0
>     1       1 2008-12-03 17:05:00   7    1
> 8  38-C 217276   12/4/2008 5:47  12/5/2008 17:47   12/4/2008 5:47       0
>     1       1 2009-07-22 19:39:00   8    1
> 9  38-C 118259   12/7/2008 4:50  12/8/2008 16:50   12/7/2008 4:50       0
>     0       0 2008-12-08 16:50:00   9    1
> 10 38-C 118259   12/9/2008 4:29 12/10/2008 16:29   12/9/2008 4:29       0
>     0       0 2009-07-22 17:58:0  10    1
> 11 38-C 118259  12/11/2008 5:46 12/12/2008 17:46  12/11/2008 5:46       0
>     0       0 2008-12-12 17:46:00  11    1
> 12 38-C 217276  12/14/2008 0:21 12/15/2008 12:21  12/14/2008 0:21       0
>     0       0 2008-12-15 12:21:00  12    1
> 13 38-C 217276  12/16/2008 5:40 12/17/2008 17:40  12/16/2008 5:40       0
>     0       0 2008-12-17 17:40:00  13    1
> 14 38-C 217276  12/18/2008 5:47 12/19/2008 17:47  12/18/2008 5:47       0
>     0       0 2009-07-23 11:15:00  14    1
> 15 38-C 217276  12/21/2008 4:31 12/22/2008 16:31  12/21/2008 4:31       0
>     0       0 2008-12-22 16:31:00  15    1
>
> On Wed, Nov 10, 2010 at 1:12 PM, Sarah Goslee <sarah.goslee at gmail.com>
> wrote:
>>
>> On Wed, Nov 10, 2010 at 12:57 PM, Ian Craig <ian.jhsph at gmail.com> wrote:
>> > Greetings Supreme Council of R Masters,
>>
>> Nice. :)
>>
>> > I have two sets of data, each with a set of timestamps.  I would like to
>> > somehow merge the datasets based on the timestamps and an individual
>> > identifier.  That is there are several individuals all with timestamps,
>> > with
>> > times that could overlap.  By browsing through some of the older posts,
>> > I
>> > got the idea to create a third data frame of both sets of timestamps,
>> > individual identifiers, and a key to determine which dataset they have
>> > come
>> > from, then find the breaks to determine which of each dataset should be
>> > paired.  the code I have written so far look something like this.
>>
>> This would be easier to sort through if you included a toy example with
>> data so that we could try it. As it is, I have no idea what your data
>> actually look like.
>>
>> > gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps)
>> > urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur)
>> >
>> > gpsdata$ID1 <- row.names(gpsdata)
>> > urdata$ID2 <- row.names(urdata)
>> >
>> > gpsdata$key1 <- rep(0, nrow(gpsdata))
>> > urdata$key2 <- rep(1, nrow(urdata))
>> >
>> > checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
>> >        ARC=c(gpsdata$gpsARC, urdata$urARC),
>> >        times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
>> >        key=c(gpsdata$key1, urdata$key2))
>> >
>> > checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times,
>> > decreasing =
>> > FALSE),]
>> >
>> > breaks <- which(diff(checkTime$key) == 1)
>> >
>> > match <- data.frame(ID1=checkTime$ID[breaks],
>> >        gpsARC = checkTime$ARC[breaks],
>> >        urARC = checkTime$ARC[breaks + 1],
>> >        t_datetimegps=checkTime$times[breaks],
>> >        t_datetimeur=checkTime$times[breaks + 1])
>> >
>> > #Then I merge the 'match' data frame with the gpsdata data frame and the
>> > product with the urdata data frame.  The problem is that when I create
>> > the
>> > checkTime data frame and sort it, it sorts the urdata portion first then
>> > the
>> > gpsdata portion.   So my key column looks like
>> > 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
>> > 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc. even though I am not sorting on key.
>> >  S.O.S!!!!  Why is it doing this?  Shouldn't it just order the
>> > timestamps of
>> > both data frames together?
>>
>> So really this is a sorting problem, not a merging problem? Is the merging
>> part working correctly?
>>
>> What exactly are you doing to merge? To sort?
>>
>> Here again a worked functional example would be really useful. Without
>> knowing what you're doing, I can't offer suggestions.
>>
>> Sarah
>>
>> --

>

-- 
Sarah Goslee
http://www.functionaldiversity.org



More information about the R-help mailing list