[R] Data transformation problem

phii m@iii@g oii phiiipsmith@c@ phii m@iii@g oii phiiipsmith@c@
Thu Nov 12 13:23:34 CET 2020


Thank you so much for this elegant solution, Jeff.

Philip

On 2020-11-12 02:20, Jeff Newmiller wrote:
> I am not a data.table afficiando, but here is how I would do it with
> dplyr/tidyr:
> 
> library(dplyr)
> library(tidyr)
> 
> do_per_REL <- function( DF ) {
>   rng <- range( DF$REF1 ) # watch out for missing months?
>   DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by = "month" ) 
> )
>         %>% left_join( DF, by = "REF1" )
>         %>% arrange( REF1 )
>         )
>   with( DF
>       , data.frame( REF2 = REF1[ -1 ]
>                   , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
>                   )
>       )
> }
> 
> df2a <- (   df1
>         %>% mutate( REF1 = as.Date( REF1 )
>                   , REL1 = as.Date( REL1 )
>                   )
>         %>% nest( data = -REL1 )
>         %>% rename( REL2 = REL1 )
>         %>% rowwise()
>         %>% mutate( data = list( do_per_REL( data ) ) )
>         %>% ungroup()
>         %>% unnest( cols = "data" )
>         %>% select( REF2, REL2, VAL2 )
>         %>% arrange( REF2, desc( REL2 ), VAL2 )
>         )
> df2a
> 
> On Wed, 11 Nov 2020, phil using philipsmith.ca wrote:
> 
>> I am stuck on a data transformation problem. I have a data frame, df1 
>> in my example, with some original "levels" data. The data pertain to 
>> some variable, such as GDP, in various reference periods, REF, as 
>> estimated and released in various release periods, REL. The release 
>> periods follow after the reference periods by two months or more, 
>> sometimes by several years. I want to build a second data frame, 
>> called df2 in my example, with the month-to-month growth rates that 
>> existed in each reference period, revealing the revisions to those 
>> growth rates in subsequent periods.
>> 
>> REF1 <- 
>> c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
>>  "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>>  "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
>> REL1 <- 
>> c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>>  "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>>  "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
>> VAL1 <- 
>> c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
>>  19900,18765,13467)
>> df1 <- data.frame(REF1,REL1,VAL1)
>> REF2 <- 
>> c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>>  "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
>> REL2 <- 
>> c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>>  "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
>> VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
>> df2 <- data.frame(REF2,REL2,VAL2)
>> 
>> In my example I have provided some sample data pertaining to three 
>> reference months, 2017-01-01 through 2017-03-01, and five release 
>> periods, "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and 
>> "2019-05-01". In my actual problem I have millions of REF-REL 
>> combinations, so my data frame is quite large. I am using data.table 
>> for faster processing, though I am more familiar with the tidyverse. I 
>> am providing df2 as the target data frame for my example, so you can 
>> see what I am trying to achieve.
>> 
>> I have not been able to find an efficient way to do these 
>> calculations. I have tried "for" loops with "if" statements, without 
>> success so far, and anyway this approach would be too slow, I fear. 
>> Suggestions as to how I might proceed would be much appreciated.
>> 
>> Philip
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go 
> Live...
> DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live 
> Go...
>                                       Live:   OO#.. Dead: OO#..  
> Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  
> rocks...1k
> ---------------------------------------------------------------------------



More information about the R-help mailing list