[R] How to select a row from one dataframe that is "close" to a row in another dataframe

James Rome jamesrome at gmail.com
Sat Mar 20 18:39:36 CET 2010


On 3/20/2010 11:52 AM, Daniel Malter wrote:

If the flight identifiers runway$Flight and oooi$Flight are unique (i.e.
only one observation has a given identifier in each dataset), you could use
merge() to join the two datasets by matching on that identifier. See,

?merge

Also, I see an OnDate variable in both datasets. So if Flight does not
provide unique identification, maybe Flight and OnDate together do, which
can also be handled in merge().
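
A minimal sketch of the two-key merge suggested above. The column names Flight, OnDate, Runway and ArrivalGate come from the thread; the values are invented toy data, and all.x = TRUE is an assumption about wanting to keep every oooi row.

```r
# Toy data: column names are from the thread, values are made up.
runway <- data.frame(
  Flight = c("AA100", "AA100", "DL200"),
  OnDate = as.Date(c("2010-03-01", "2010-03-02", "2010-03-01")),
  Runway = c("27L", "09R", "27R"),
  stringsAsFactors = FALSE
)
oooi <- data.frame(
  Flight = c("AA100", "AA100", "DL200", "UA300"),
  OnDate = as.Date(c("2010-03-01", "2010-03-02", "2010-03-01", "2010-03-01")),
  ArrivalGate = c("B4", "B6", "C1", "D9"),
  stringsAsFactors = FALSE
)

# Join on both keys. all.x = TRUE keeps every oooi row; rows without a
# runway match get NA in the Runway column.
merged <- merge(oooi, runway, by = c("Flight", "OnDate"), all.x = TRUE)
print(merged)
```

The two data frames here have different numbers of rows; merge() matches on key values, not on row position.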

Let us know if that solves the problem.

Best,
Daniel 
-----------------------------------
Alas, the flight names are not unique (they fly each day). You would think that the OnDate would be the same, but flights arriving at midnight could appear on different days, which is why I am using seconds past 1/1/1970.

Will merge work with different length dataframes? Perhaps I could do it in multiple steps, assuming that the dates were the same, and then fixing the errors?

And I found out that abs() will not take difftime as an argument. I hope I can multiply a difftime by itself and check that way.
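
One way around the difftime issue, sketched with made-up timestamps: ask difftime() for seconds explicitly and convert the result to a plain number before taking the absolute value. The variable names t1, t2, gap_secs and within_hour are hypothetical.

```r
# Two invented arrival times that straddle midnight.
t1 <- as.POSIXct("2010-03-20 23:50:00", tz = "UTC")
t2 <- as.POSIXct("2010-03-21 00:20:00", tz = "UTC")

# Request seconds explicitly so the units do not depend on the size of
# the gap, then drop the difftime class before calling abs().
gap_secs <- abs(as.numeric(difftime(t1, t2, units = "secs")))

# Compare against one hour, expressed in seconds.
within_hour <- gap_secs < 3600
```

Fixing the units matters because subtracting two POSIXct values directly lets R choose the units, so a bare comparison against 3600 may not mean seconds.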

As for sqldf, it looks as if I would have to read the source data files directly into it, since it has to build a database. In that case, wouldn't I be better off doing the whole thing in a database?

Jim

> names(oooi)
 [1] "FltOrigDt"               "MkdCrrCd"              
 [3] "MkdFltNbr"               "DprtTrpnStnCd"         
 [5] "ArrTrpnStnCd"            "ActualOutLocalTimestamp"
 [7] "ActualOffLocal"          "ActualOnLocal"         
 [9] "ActualInLocal"           "ArrivalGate"           
[11] "DepartureGate"           "Flight"                
[13] "OnDate"                  "MinutesIntoDay"        
[15] "OnHour"                  "pt"  


> names(runway)
 [1] "OnDateTime"     "IATA"           "ICAO"           "Flight"       
 [5] "AircraftType"   "Tail"           "Arrived"        "STA"          
 [9] "Runway"         "From.To"        "Delay"          "OnDate"       
[13] "MinutesIntoDay" "pt"   

These sets have several hundred thousand rows.

In both sets, pt is a POSIXct for the arrival time (from different
sources). The two values are not identical, but they surely should be
within an hour of each other (hopefully a lot less), and the Flight
fields must be the same. So a condition like
(abs(runway$pt - oooi$pt) < 3600) & (runway$Flight == oooi$Flight)
applied to each pair of rows, one from each data set, should pick out
the corresponding rows (if there is a match).

What I need to do is to take the Runway from runway and insert it into
the oooi df for the correct flight.

What is the best way to do this in R?
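
One possible approach, sketched on invented toy data: pair the rows by Flight with merge(), keep only pairs whose arrival times are within an hour, and take the closest runway row for each oooi row. The columns Flight, pt and Runway are from the post; row_id, cand, gap and best are hypothetical helper names, and the one-hour threshold is the one proposed above.

```r
# Toy data: column names are from the post, values are made up.
runway <- data.frame(
  Flight = c("AA100", "AA100", "DL200"),
  pt = as.POSIXct(c("2010-03-01 23:58:00", "2010-03-02 23:55:00",
                    "2010-03-01 12:00:00"), tz = "UTC"),
  Runway = c("27L", "09R", "27R"),
  stringsAsFactors = FALSE
)
oooi <- data.frame(
  Flight = c("AA100", "AA100", "DL200", "UA300"),
  pt = as.POSIXct(c("2010-03-02 00:03:00", "2010-03-02 23:50:00",
                    "2010-03-01 15:00:00", "2010-03-01 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column so each oooi row can be found again later.
oooi$row_id <- seq_len(nrow(oooi))

# Pair every oooi row with every runway row that has the same Flight.
cand <- merge(oooi[, c("row_id", "Flight", "pt")],
              runway[, c("Flight", "pt", "Runway")],
              by = "Flight", suffixes = c(".oooi", ".runway"))

# Absolute time gap in seconds; keep only pairs less than an hour apart.
cand$gap <- abs(as.numeric(difftime(cand$pt.oooi, cand$pt.runway,
                                    units = "secs")))
cand <- cand[cand$gap < 3600, ]

# For each oooi row keep the single closest runway row.
cand <- cand[order(cand$row_id, cand$gap), ]
best <- cand[!duplicated(cand$row_id), ]

# Copy Runway back into oooi; rows with no match within an hour get NA.
oooi$Runway <- best$Runway[match(oooi$row_id, best$row_id)]
oooi$row_id <- NULL
print(oooi)
```

Matching on pt rather than OnDate lets the flight that lands at 23:58 in one source and 00:03 in the other still pair up. With several hundred thousand rows the intermediate pairing by Flight can get large, so it may be worth processing the data in chunks (for example, by Flight).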


