[R] Data transformation problem

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Thu Nov 12 08:20:32 CET 2020


I am not a data.table afficiando, but here is how I would do it with 
dplyr/tidyr:

library(dplyr)
library(tidyr)

do_per_REL <- function( DF ) {
   rng <- range( DF$REF1 ) # watch out for missing months?
   DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by = "month" ) )
         %>% left_join( DF, by = "REF1" )
         %>% arrange( REF1 )
         )
   with( DF
       , data.frame( REF2 = REF1[ -1 ]
                   , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
                   )
       )
}

df2a <- (   df1
         %>% mutate( REF1 = as.Date( REF1 )
                   , REL1 = as.Date( REL1 )
                   )
         %>% nest( data = -REL1 )
         %>% rename( REL2 = REL1 )
         %>% rowwise()
         %>% mutate( data = list( do_per_REL( data ) ) )
         %>% ungroup()
         %>% unnest( cols = "data" )
         %>% select( REF2, REL2, VAL2 )
         %>% arrange( REF2, desc( REL2 ), VAL2 )
         )
df2a

On Wed, 11 Nov 2020, phil using philipsmith.ca wrote:

> I am stuck on a data transformation problem. I have a data frame, df1 in my 
> example, with some original "levels" data. The data pertain to some variable, 
> such as GDP, in various reference periods, REF, as estimated and released in 
> various release periods, REL. The release periods follow after the reference 
> periods by two months or more, sometimes by several years. I want to build a 
> second data frame, called df2 in my example, with the month-to-month growth 
> rates that existed in each reference period, revealing the revisions to those 
> growth rates in subsequent periods.
>
> REF1 <- c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
>  "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>  "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
> REL1 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>  "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>  "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
> VAL1 <- 
> c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
>  19900,18765,13467)
> df1 <- data.frame(REF1,REL1,VAL1)
> REF2 <- c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>  "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
> REL2 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>  "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
> VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
> df2 <- data.frame(REF2,REL2,VAL2)
>
> In my example I have provided some sample data pertaining to three reference 
> months, 2017-01-01 through 2017-03-01, and five release periods, 
> "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and "2019-05-01". In my 
> actual problem I have millions of REF-REL combinations, so my data frame is 
> quite large. I am using data.table for faster processing, though I am more 
> familiar with the tidyverse. I am providing df2 as the target data frame for 
> my example, so you can see what I am trying to achieve.
>
> I have not been able to find an efficient way to do these calculations. I 
> have tried "for" loops with "if" statements, without success so far, and 
> anyway this approach would be too slow, I fear. Suggestions as to how I might 
> proceed would be much appreciated.
>
> Philip
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k



More information about the R-help mailing list