[R] Data transformation problem

phii m@iii@g oii phiiipsmith@c@
Thu Nov 12 04:25:27 CET 2020


I am stuck on a data transformation problem. I have a data frame, df1 in 
my example, with some original "levels" data. The data pertain to some 
variable, such as GDP, in various reference periods, REF, as estimated 
and released in various release periods, REL. The release periods follow 
after the reference periods by two months or more, sometimes by several 
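years. I want to build a second data frame, called df2 in my example, 
with the month-to-month growth rate for each reference period as it 
stood in each release period, so that the revisions to those growth 
rates across successive releases can be seen. Each growth rate is the 
percentage change from the previous reference month's value within the 
same release.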

REF1 <- 
c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
   "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
   "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL1 <- 
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
   "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
   "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL1 <- 
c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
   19900,18765,13467)
df1 <- data.frame(REF1,REL1,VAL1)
REF2 <- 
c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
   "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL2 <- 
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
   "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL2 <- c(0.0,-3.9,4.3,NA,2.7,-4.3,-17.9,42.1,44.4,1.7)
df2 <- data.frame(REF2,REL2,VAL2)

In my example I have provided some sample data pertaining to three 
reference months, 2017-01-01 through 2017-03-01, and five release 
periods, "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and 
"2019-05-01". In my actual problem I have millions of REF-REL 
combinations, so my data frame is quite large. I am using data.table for 
faster processing, though I am more familiar with the tidyverse. I am 
providing df2 as the target data frame for my example, so you can see 
what I am trying to achieve.
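
To make the target concrete, here is a small check of how two of the 
VAL2 entries arise, using the 2020-08-01 release only. The variable 
names jan, feb and mar are just illustrative labels for the three 
levels copied from df1, and rounding to one decimal is assumed.

```r
# Levels for the 2020-08-01 release, by reference month (copied from df1)
jan <- 14567
feb <- 14000
mar <- 11500

# Percentage change from the previous reference month, same release
growth_feb <- round((feb / jan - 1) * 100, 1)
growth_mar <- round((mar / feb - 1) * 100, 1)

print(c(growth_feb, growth_mar))
```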

I have not been able to find an efficient way to do these calculations. 
I have tried "for" loops with "if" statements, without success so far, 
and anyway this approach would be too slow, I fear. Suggestions as to 
how I might proceed would be much appreciated.
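
One direction that might avoid explicit loops, offered only as a sketch: 
sort by release and reference period, then use data.table's shift() 
within each release to divide each level by the previous reference 
month's level. The column names REF, REL, VAL and GROWTH and the object 
names dt and res are chosen here for illustration. The sketch assumes 
that no reference month is missing within a release; if months can be 
missing, shift() would compare non-adjacent months and a check on the 
gap between dates would be needed.

```r
library(data.table)

# Sample levels data, same values and row layout as df1
dt <- data.table(
  REF = as.Date(rep(c("2017-01-01", "2017-02-01", "2017-03-01"), each = 5)),
  REL = as.Date(rep(c("2020-09-01", "2020-08-01", "2020-07-01",
                      "2020-06-01", "2019-05-01"), times = 3)),
  VAL = c(17974, 14567, 13425, NA, 12900,
          17974, 14000, 14000, 12999, 13245,
          17197, 11500, 19900, 18765, 13467)
)

# Sort so that, within each release, reference months are in time order
setorder(dt, REL, REF)

# Within each release, compare each level with the previous reference month
# (assumption: consecutive rows in a release are consecutive months)
dt[, GROWTH := round((VAL / shift(VAL) - 1) * 100, 1), by = REL]

# Drop the first reference month of each release, which has no predecessor
res <- dt[rowid(REL) > 1, .(REF, REL, GROWTH)]

# Arrange as in df2: by reference month, newest release first
setorder(res, REF, -REL)
print(res)
```

Because the grouping and the lagged division are both done inside 
data.table, this should scale much better than row-by-row loops, though 
that has not been measured on the full data set here.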

Philip


