[R] Data Table Merge Help
jdnewmil at dcn.davis.ca.us
Thu Feb 1 23:24:11 CET 2018
I rarely use data.table, but I think the vignette for the package discusses rolling joins. Also, Google popped up .
Sent from my phone. Please excuse my brevity.
On February 1, 2018 9:45:53 AM PST, "Graeve, Nick" <Graeve.Nick at principal.com> wrote:
>I'm not sure if this is an appropriate use of this mailing list or not,
>please let me know if it isn't. I'm struggling to figure out how to
>merge two data tables based on max effective date logic compared to
>when a payment occurred. My dtDistributions DT is a transactional
>dataset while dtDepartments is a domain data set containing all
>department names and the effective date of when department name changes
>have occurred. For the Bob example below, there was a payment on
>2016-01-01 which occurred in H229000. In 2012, this department was
>named "Modified Name", in 2019 the department will be named "Final
>Name". When I merge these two tables, I'd like it to pull the
>transactional data and match it up to department name "Modified Name"
>since that was the active department name at the time of that
>transaction. I've read documentation on foverlaps, but I'm not sure if
>this problem is considered a range of dates or not. At the bottom of
>this post is a temporarily solution that is working but it runs for a
>long time due to the amount of data in my actual source.
>Here is some sample data to get started:
>dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
> Department = factor(c("H229000", "H135000",
> Amount = c(5, 34, 87),
> PaymentDT = as.Date(c("2016-01-01",
>dtDepartments <- data.table(Department = factor(c("H229000", "H229000",
>"H229000", "H135000", "H047800")),
> EffDT = as.Date(c("2019-01-01", "2012-01-01",
>"1901-01-01", "1901-01-01", "1901-01-01")),
> Descr = c("Final Name","Modified
>Here is the output I would like to see:
>PayeeName Department PaymentDT Amount
>Bob Modified Name 2016-01-01 5
>Tracy Payables 2015-01-01 34
>Tom Postal 2015-01-01 87
>I was able to get this working by using the sqldf library, but it runs
>for a very long time in my actual dataset and I'd like to use
>data.table if at all possible.
>joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
> FROM dtDistributions A, dtDepartments B
> WHERE A.DEPARTMENT = B.Department
> AND B.EffDT = (SELECT MAX(ED.EffDT)
> FROM dtDepartments ED
> WHERE B.Department = ED.Department
> AND ED.EffDT <= A.PaymentDT)"
>finalDT <- data.table(sqldf(joinString))
>This e-mail message is intended only for the use of the individual or
>entity to which it is addressed, and may contain information that is
>privileged, confidential and exempt from disclosure under applicable
>law. If you are not the intended recipient, any dissemination,
>distribution or copying of this communication is strictly prohibited.
>If you have received this communication in error, please notify us
>immediately by reply email to Connect at principal.com and delete or
>destroy all copies of the original message and attachments thereto.
>Email sent to or from the Principal Financial Group or any of its
>member companies may be retained as required by law or regulation.
>Nothing in this message is intended to constitute an Electronic
>signature for purposes of the Uniform Electronic Transactions Act
>(UETA) or the Electronic Signatures in Global and National Commerce Act
>("E-Sign") unless a specific statement to the contrary is included in
>If you no longer wish to receive any further solicitation from the
>Principal Financial Group you may unsubscribe at
>https://www.principal.com/do-not-contact-form any time.
>If you are a Canadian resident and no longer wish to receive commercial
>electronic messages you may unsubscribe at
>This message was secured by Zix(R).
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>PLEASE do read the posting guide
>and provide commented, minimal, self-contained, reproducible code.
More information about the R-help