[R] Merge with closest (not equal) time stamps

Naresh Gurbuxani n@re@h_gurbux@n| @end|ng |rom hotm@||@com
Tue Aug 8 13:12:32 CEST 2023


I was able to adapt your solution using packages with which I am more familiar. 

myres2 <- merge(option.trades, stock.trades, by = "timestamp", all =
TRUE)

myres2[,"stock.timestamp"] <- ifelse(is.na(myres2$stock.price), NA,
myres2$timestamp)

myres2$stock.timestamp <- as.POSIXct(myres2$stock.timestamp, origin =
"1970-01-01")

library(zoo)
myres2$stock.price <- na.locf(myres2$stock.price)
myres2$stock.timestamp <- na.locf(myres2$stock.timestamp)
myres2 <- myres2[!is.na(myres2$option.price),]
row.names(myres2) <- NULL

all.equal(myres, myres2[,c(1, 2, 4, 3)]) # TRUE

This calculation is indeed faster.  

Thanks for your help,
Naresh

> On Aug 8, 2023, at 5:39 AM, Eric Berger <ericjberger using gmail.com> wrote:
> 
> Hi Naresh,
> Perhaps the below is faster than your approach
> 
> library(dplyr)
> library(tidyr)
> merge(option.trades, stock.trades, by="timestamp", all=TRUE) |>
>  dplyr::arrange(timestamp) |>
>  dplyr::mutate(stock.timestamp =
> as.POSIXct(ifelse(is.na(option.price), timestamp, NA))) |>
>  tidyr::fill(stock.price, stock.timestamp) |>
>  dplyr::filter(!is.na(option.price)) |>
>  dplyr::select(1,2,4,3) ->
>  myres2
> 
> identical(myres, myres2)  ## TRUE
> 
> 
> On Tue, Aug 8, 2023 at 5:32 AM Naresh Gurbuxani
> <naresh_gurbuxani using hotmail.com> wrote:
>> 
>> 
>> I have two dataframes, each with a column for timestamp.  I want to
>> merge the two dataframes such that each row from first dataframe
>> is matched with the row in the second dataframe with most recent but
>> preceding timestamp. Here is an example.
>> 
>> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 1.8))
>> 
>> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>> 
>> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>> 
>> library(plyr)
>> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) tail(subset(stock.trades, timestamp <= tstamp), 1))
>> names(mystock.prices)[1] <- "stock.timestamp"
>> myres <- cbind(option.trades, mystock.prices)
>> 
>> This method works. But for large dataframes, it is very slow.  Is there
>> a way to speed up the merge?
>> 
>> Thanks,
>> Naresh
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list