[R] Computing growth rate

Thu Dec 15 13:34:39 CET 2016

Dear Mr Hasselman,

I missed your mail while I was typing my own reply to Mr. Barradas'
suggestion. In fact, I implemented your suggestion even before
reading it. But I have a concern that I have noted (though it is only
hypothetical; such a scenario is very unlikely to occur). Is there a
way to restrict such calculations to within each co_code1?

Many thanks,

Brijesh

On Thu, Dec 15, 2016 at 5:48 PM, Berend Hasselman <bhh at xs4all.nl> wrote:
>
>> On 15 Dec 2016, at 04:40, Brijesh Mishra <brijeshkmishra at gmail.com> wrote:
>>
>> Hi,
>>
>> I am trying to calculate growth rates (say, of sales, though they are
>> to be computed for many variables) in a panel data set. The problem is
>> that I have missing data for many firms for many years. To put it
>> simply, I have created this short data frame (the original df is much bigger):
>>
>> df1<-data.frame(co_code1=rep(c(1100, 1200, 1300), each=7),
>> fyear1=rep(1990:1996, 3), sales1=rep(seq(1000,1600, by=100),3))
>>
>> # this gives me
>>   co_code1 fyear1 sales1
>> 1      1100   1990   1000
>> 2      1100   1991   1100
>> 3      1100   1992   1200
>> 4      1100   1993   1300
>> 5      1100   1994   1400
>> 6      1100   1995   1500
>> 7      1100   1996   1600
>> 8      1200   1990   1000
>> 9      1200   1991   1100
>> 10     1200   1992   1200
>> 11     1200   1993   1300
>> 12     1200   1994   1400
>> 13     1200   1995   1500
>> 14     1200   1996   1600
>> 15     1300   1990   1000
>> 16     1300   1991   1100
>> 17     1300   1992   1200
>> 18     1300   1993   1300
>> 19     1300   1994   1400
>> 20     1300   1995   1500
>> 21     1300   1996   1600
>>
>> # I am now removing a couple of rows
>> df1<-df1[-c(5, 8), ]
>> # the result is
>>   co_code1 fyear1 sales1
>> 1      1100   1990   1000
>> 2      1100   1991   1100
>> 3      1100   1992   1200
>> 4      1100   1993   1300
>> 6      1100   1995   1500
>> 7      1100   1996   1600
>> 9      1200   1991   1100
>> 10     1200   1992   1200
>> 11     1200   1993   1300
>> 12     1200   1994   1400
>> 13     1200   1995   1500
>> 14     1200   1996   1600
>> 15     1300   1990   1000
>> 16     1300   1991   1100
>> 17     1300   1992   1200
>> 18     1300   1993   1300
>> 19     1300   1994   1400
>> 20     1300   1995   1500
>> 21     1300   1996   1600
>> # so 1994 for co_code1 1100 and 1990 for co_code1 1200 have been
>> removed. If I try,
>> library(plyr)  # provides ddply()
>> d<-ddply(df1,"co_code1",transform, growth=c(NA,exp(diff(log(sales1)))-1)*100)
>>
>> # this gives a wrong result for the year 1995 (as shown below): growth
>> # is computed against 1993, the previous available row, as if the
>> # increment were one year.
>>
>>   co_code1 fyear1 sales1    growth
>> 1      1100   1990   1000        NA
>> 2      1100   1991   1100 10.000000
>> 3      1100   1992   1200  9.090909
>> 4      1100   1993   1300  8.333333
>> 5      1100   1995   1500 15.384615
>> 6      1100   1996   1600  6.666667
>> 7      1200   1991   1100        NA
>> 8      1200   1992   1200  9.090909
>> 9      1200   1993   1300  8.333333
>> 10     1200   1994   1400  7.692308
>> 11     1200   1995   1500  7.142857
>> 12     1200   1996   1600  6.666667
>> 13     1300   1990   1000        NA
>> 14     1300   1991   1100 10.000000
>> 15     1300   1992   1200  9.090909
>> 16     1300   1993   1300  8.333333
>> 17     1300   1994   1400  7.692308
>> 18     1300   1995   1500  7.142857
>> 19     1300   1996   1600  6.666667
>> # I thought of applying the formula only where fyear1 increases by
>> # exactly 1 within a co_code1, using this code
>>
>> d<-ddply(df1,
>>         "co_code1",
>>         transform,
>>         if(diff(fyear1)==1){
>>           growth=(exp(diff(log(df1$sales1)))-1)*100
>>         } else{
>>           growth=NA
>>         })
>>
>> But this doesn't work. I am getting the following warning (repeated
>> a few times):
>>
>> In if (diff(fyear1) == 1) { :
>>  the condition has length > 1 and only the first element will be used
>>
>> # I have searched for a solution, but somehow couldn't get one. Hope
>> that some kind soul will guide me here.
>>
>
> In your case use ifelse() as explained by Rui.
> But it can be done more easily since the fyear1 and co_code1 are synchronized.
> Add a new column to df1 like this:
>
> df1$growth <- c(NA,
>                 ifelse(diff(df1$fyear1) == 1,
>                        (exp(diff(log(df1$sales1))) - 1) * 100,
>                        NA))
>
> and display df1. From your request I cannot determine if this is what you want.
>
> regards,
>
> Berend Hasselman
>
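
On the question at the top of this message, about restricting the calculation to each co_code1: what follows is only a minimal base-R sketch, assuming the rows are sorted by fyear1 within each company. The helper name growth_by_year is hypothetical and does not come from the thread. It applies the same ifelse() formula one company at a time, so a difference is never taken across two companies, even if the last year of one and the first year of the next happen to be consecutive.

```r
# Toy data from the thread, with the same two rows removed
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Hypothetical helper: growth in percent for one company.
# The first row is NA, and so is any row whose previous row is not
# exactly one year earlier. Assumes 'year' is sorted ascending.
growth_by_year <- function(year, sales) {
  c(NA, ifelse(diff(year) == 1,
               (exp(diff(log(sales))) - 1) * 100,
               NA))
}

# Apply the helper separately to the row positions of each company
df1$growth <- NA_real_
for (idx in split(seq_len(nrow(df1)), df1$co_code1)) {
  df1$growth[idx] <- growth_by_year(df1$fyear1[idx], df1$sales1[idx])
}

df1
```

With the toy data, growth should be NA for the first year of each company and for 1995 of company 1100 (since 1994 is missing), and computed as usual elsewhere. If the real data are not sorted, ordering by co_code1 and fyear1 first would be needed.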