[R] Comparing dates in dataframes

David Winsemius dwinsemius at comcast.net
Sun Jan 17 19:06:39 CET 2010


On Jan 17, 2010, at 12:37 PM, James Rome wrote:

> I don't think it is that simple because it is not a one-to-one
> match. In the arr data frame, there are many arrivals in a quarter
> hour with good weather on a given day. So I need to match the date
> and the quarter hour.
>
> And all of the rows in the weather data frame are times with good
> weather (unique date + quarter hour). That is why I needed the loop.
> For each date and quarter hour in weather, I want to mark all the
> entries with the corresponding date and quarter hour as TRUE in the
> arr$gw column.
>
> I did convert the dates to POSIXlt dates and rewrote my function as
> gooddates = function(all, good) {
>   la = length(all)   # All the arrivals
>   lw = length(good)  # The good 15-minute periods
>   for(j in 1:lw) {
>     d=good$Date[j]
>     q=good$quarter[j]
>     all$gw[all$Date==d && all$quarter==q]=TRUE


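You are attempting a vectorized test and assignment with "&&", which
seems unlikely to succeed: "&&" is not vectorized and yields a single
TRUE or FALSE rather than one value per row, whereas "&" compares
element by element. Even with "&", I am not sure your problems would
be over: length() of a data frame counts its columns, not its rows,
and the function changes only its local copy of 'all' without
returning it, so arr itself never changes. (I'm also guessing that
you might not have reported a warning.)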
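
A minimal sketch of the loop with three fixes: "&" instead of "&&"
for the elementwise test, nrow() instead of length() to count rows,
and returning the modified copy so it can be assigned back to arr. It
assumes, for illustration only, that both data frames have a Date
column of class Date (used here as a simpler choice than POSIXlt) and
an integer quarter column; the toy rows are invented.

```r
# Toy data (invented for illustration): arrivals and good-weather periods
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(1L, 1L, 2L, 1L),
  gw      = FALSE
)
gw <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-02")),
  quarter = c(1L, 3L)
)

gooddates <- function(all, good) {
  # nrow(), not length(): length() of a data frame counts columns
  for (j in seq_len(nrow(good))) {
    d <- good$Date[j]
    q <- good$quarter[j]
    # "&" compares element by element, giving one TRUE/FALSE per arrival
    all$gw[all$Date == d & all$quarter == q] <- TRUE
  }
  all  # return the modified copy; R does not change 'arr' in place
}

arr <- gooddates(arr, gw)
arr$gw   # TRUE TRUE FALSE FALSE
```

The loop works, but it makes one pass over arr per good period; the
merge() and %in% approaches do the matching in a single vectorized
step.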

Why not merge arr to gw by date and quarter?
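
A sketch of the merge() route, assuming (hypothetically) that Date
and quarter have the same class in both data frames; the toy rows
and the Flight values are invented. The good (Date, quarter) pairs
are reduced to unique rows, tagged with a flag, and left-joined onto
the arrivals.

```r
# Toy data (invented for illustration)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(1L, 1L, 2L, 1L),
  Flight  = c("A1", "B2", "C3", "D4")
)
gw <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-02")),
  quarter = c(1L, 3L)
)

# One row per good (Date, quarter) pair, tagged with a flag column
good <- unique(gw[, c("Date", "quarter")])
good$gw <- TRUE

# Left join: keep every arrival; unmatched arrivals get NA in 'gw'
merged <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
merged$gw[is.na(merged$gw)] <- FALSE
merged
```

Two things to watch: merge() returns rows sorted by the key columns
rather than in the original order, and if arr already has a gw
column it should be dropped first, otherwise the result gets gw.x
and gw.y columns.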

Answering these questions would be greatly speeded up with a small  
sample dataset. Are you aware of the virtues of the dput function?
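
For reference, a short sketch of what dput() gives you, using a few
invented rows in place of the real data: it prints R code that
recreates the object, factor levels included, so readers can paste
it straight into their own session.

```r
# A few rows standing in for the real data (invented for illustration)
small <- data.frame(
  Date    = c("1/1/09", "1/1/09", "1/2/09"),
  quarter = c(1L, 2L, 1L),
  stringsAsFactors = TRUE  # mimic read.csv() defaults before R 4.0.0
)

# Prints a structure(...) expression that rebuilds 'small'
dput(small)

# Pasting that output back into R recreates the same data frame
txt  <- capture.output(dput(small))
copy <- eval(parse(text = txt))
```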

-- 
David

>   }
> }
>
> Now it runs with no errors, but none of the 0s (FALSE) in arr$gw get
> replaced with 1s. So I am still doing something wrong.
>
> Thanks,
> Jim
>
> On 1/16/10 6:11 PM, jim holtman wrote:
>> If you have a vector of the quarter hours of good weather (gw), then
>> to create the column in the arr dataframe you would do
>>
>> arr$GoodWeather <- arr$quarter %in% gw
>>
>> This says that if the quarter hour of the arrival is in the 'gw'
>> vector, set the value TRUE; otherwise FALSE.
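
Because quarter alone repeats every day, one way to extend this idea
to the date-plus-quarter case is to paste the two columns into a
single key. This is a sketch with invented toy data; arr_key and
gw_key are hypothetical names.

```r
# Toy data (invented for illustration)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-02")),
  quarter = c(1L, 2L, 1L, 2L)
)
gw <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-02")),
  quarter = c(2L, 1L)
)

# Build a "date quarter" string key in both frames, e.g. "2009-01-01 2"
arr_key <- paste(arr$Date, arr$quarter)
gw_key  <- paste(gw$Date, gw$quarter)

# TRUE where the arrival's (date, quarter) pair is one of the good periods
arr$GoodWeather <- arr_key %in% gw_key
arr$GoodWeather   # FALSE TRUE TRUE FALSE
```

This only works if both Date columns are formatted identically (here
both are class Date); a factor of "1/1/09" strings in one frame and
Dates in the other would never match.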
>>
>>
>> On 1/16/10 4:26 PM, Stephan Kolassa wrote:
>>> Hi,
>>>
>>> it looks like when you read in your data.frames, you didn't tell R to
>>> expect dates, so it treats the Date columns as factors. Judicious use
>>> of something along these lines before doing your comparisons may help:
>>>
>>> arr$Date <- as.Date(as.character(arr$Date),format=something)
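
A sketch of that conversion. The format string is an assumption:
the printed levels (1/1/09, 1/12/09, 9/9/09) suggest
month/day/two-digit-year, i.e. format = "%m/%d/%y", but that guess
should be checked against the real data. The factor values are
invented.

```r
# A factor column like the one read.csv() produced (invented values)
f <- factor(c("1/1/09", "1/12/09", "9/9/09"))

# Parse the factor's labels as month/day/two-digit year (assumed format)
d <- as.Date(as.character(f), format = "%m/%d/%y")

class(d)   # "Date"
d          # "2009-01-01" "2009-01-12" "2009-09-09"
```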
>>>
>>> Then again, it may be possible to do the actual merging using merge().
>>>
>>> HTH
>>> Stephan
>>>
>>>
>>> James Rome schrieb:
>>>> I have two data frames. One (arr) has all arrivals to an airport for a
>>>> year, and the other (gw) has the dates and quarter hour of the day when
>>>> the weather is good. arr has a Date and quarter hour column.
>>>>
>>>>> names(arr)
>>>>  [1] "Date"         "weekday"      "hour"         "month"        "minute"
>>>>  [6] "quarter"      "ICAO"         "Flight"       "AircraftType" "Tail"
>>>> [11] "Arrived"      "STA"          "Runway"       "FromTo"       "Delay"
>>>> [16] "Operator"     "gw"
>>>> I added the gw column to arr and initialized it to all FALSE.
>>>>
>>>>> names(gw)
>>>>  [1] "Date"           "minute"         "hour"           "quarter"
>>>>  [5] "Efficiency.Val" "Weekly.Avg"     "Arrival.Val"    "Weekly.Avg.1"
>>>>  [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold"    "Runway"
>>>> [13] "Weather"
>>>> First point of confusion:
>>>>> gw[1,1]
>>>> [1] 1/1/09
>>>> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
>>>> Why do I get 353 levels?
>>>>
>>>> I am trying to identify the quarter hours with good weather in the arr
>>>> data frame. What I want to do is to go through the rows in gw, and to
>>>> set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw
>>>> row.
>>>>
>>>> So I tried
>>>> gooddates = function(all, good) {
>>>>   la = length(all)   # All the flights
>>>>   lw = length(good)  # The good 15-minute periods
>>>>   for(j in 1:lw) {
>>>>     d=good$Date[j]
>>>>     q=good$quarter[j]
>>>>     all[all$DateTime==d && all$quarter==q,17]=TRUE
>>>>   }
>>>> }
>>>>
>>>> but when I run this, I get
>>>> "Error in Ops.factor(all$DateTime, d) :
>>>>  level sets of factors are different"
>>>>
>>>> I know the level sets are different, that is what I am trying to find.
>>>> But I think I am comparing single elements from the data frames.
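
A small sketch, with invented date strings, of why single elements
still clash: a factor stores one level per distinct string in the
whole column (hence the 353 levels), and subsetting one element keeps
that full level set, so == between elements of two factors with
different level sets stops with the error above.

```r
# Two factors built from different sets of date strings (invented values)
a <- factor(c("1/1/09", "1/2/09", "1/3/09"))
b <- factor(c("1/1/09", "1/5/09"))

nlevels(a)     # 3: one level per distinct string
levels(a[1])   # "1/1/09" "1/2/09" "1/3/09": a single element keeps them all

# Comparing elements of factors with different level sets fails
res <- tryCatch(a[1] == b[1], error = function(e) conditionMessage(e))
res            # "level sets of factors are different"

# Comparing the labels as character strings works
as.character(a[1]) == as.character(b[1])   # TRUE
```

Levels are sorted as character strings, which is why the printed
list runs 1/1/09, 1/1/10, 1/10/09 rather than in calendar order.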
>>>>
>>>> So what am I doing wrong? And there ought to be a better way to do this.
>>>>
>>>> Thanks in advance,
>>>> Jim Rome
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> -- 
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


