[R] Panel data handling (lags, growth rates)

Ajay Narottam Shah ajayshah at mayin.org
Sun Aug 14 12:33:36 CEST 2005


I have written two functions which do useful things with panel data
a.k.a. longitudinal data, where one unit of observation (a firm or a
person or an animal) is observed on a uniform time grid:
   - The first function makes lagged values of variables of your choice.
   - The second function makes growth rates w.r.t. q observations ago,
      for variables of your choice.

These strike me as bread-and-butter tasks in dealing with panel
data. I couldn't find these functions in the standard R
libraries. They are presented in this email for two reasons. First,
it'll be great if R gurus can look at the code and propose
improvements. Second, it'll be great if some package-owner can adopt
these orphans :-) and make them available to the R community.

The two functions follow:

library(Hmisc)                          # Am using Lag() in this.

# Task: For a supplied list of variables (the list `lagvars'),
#       make new columns in a dataset denoting lagged values.
#       You must supply `unitvar' which identifies the unit that's
#           repeatedly observed.
#       You must supply the name of the time variable `timevar'
#       and you must tell a list of the lags that interest you (`lags')
# Example:
#  paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:4)
paneldata.lags <- function(X, unitvar, timevar, lagvars, lags=1) {
  stopifnot(length(lagvars)>=1)
  X <- X[order(X[,timevar]),]           # just in case it's not sorted.

  innertask <- function(Y, lagvars, lags) {
    E <- labels <- NULL
    for (v in lagvars) {
      for (i in lags) {
        E <- cbind(E, Lag(Y[,v], i))
      }
      labels <- c(labels, paste(v, ".l", lags, sep=""))
    }
    colnames(E) <- labels
    cbind(Y, E)
  }

  do.call("rbind", by(X, X[,unitvar], innertask, lagvars, lags))
}

# Task: For a supplied list of variables (the list `gvars'),
#       make new columns in a dataset denoting growth rates.
#       You must supply `unitvar' which identifies the unit that's
#           repeatedly observed.
#       You must supply the name of the time variable `timevar'
#       and you must tell the time periods Q (vector is ok) over which
#       the growth rates are computed.
paneldata.growthrates <- function(X, unitvar, timevar, gvars, Q=1) {
  stopifnot(length(gvars)>=1)
  X <- X[order(X[,timevar]),]

  makegrowths <- function(x, q) {
    new <- rep(NA, length(x))
    for (t in (1+q):length(x)) {
      new[t] <- 100*((x[t]/x[t-q])-1)
    }
    return(new)
  }

  innertask <- function(Y, gvars, Q) {
    E <- labels <- NULL
    for (v in gvars) {
      for (q in Q) {
        E <- cbind(E, makegrowths(Y[,v], q))
      }
      labels <- c(labels, paste(v, ".g", Q, sep=""))
    }
    colnames(E) <- labels
    cbind(Y, E)
  }

  do.call("rbind", by(X, X[,unitvar], innertask, gvars, Q))
}

Here's a demo of using them:

# A simple panel dataset
A <- data.frame(year=rep(1980:1982,4),
                person=factor(sort(rep(1:4,3))),
                v1=round(rnorm(12),digits=2), v2=round(rnorm(12),digits=2))

# Demo of creating lags for both variables v1 and v2 --
paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:2)
# Demo of creating growth rates for v2 w.r.t. 1 & 2 years ago --
paneldata.growthrates(A, "person", "year", "v2", Q=1:2)




Finally, I have a question. In a previous posting on this subject,
Gabor showed me that my code:

# Blast this function for all the values that A$person takes --
new <- NULL
for (f in levels(A$person)) {
  new <- rbind(new,
               make.additional.variables(subset(A, A$person==f),
                                         nlags=2, Q=1))
}
A <- new; rm(new)

can be replaced by one do.call() (as used above). It's awesome, but I
don't understand it! :-) Could someone please explain how and why this
works? I know by() and I see that when I do by(A,A$x), it gives me a
list containing as many entries as are levels of A$x. I couldn't think
of a way to force all this into one data frame; the best I could do
was to do for (f in levels (A$person)) {} as shown here. The two
functions above are using do.call() as Gabor used them, and they're
awesome, but I don't understand why they work! The man page ?do.call
was a bit too cryptic and I couldn't comprehend it. Where can I learn
this stuff?

-- 
Ajay Shah                                                   Consultant
ajayshah at mayin.org                      Department of Economic Affairs
http://www.mayin.org/ajayshah           Ministry of Finance, New Delhi




More information about the R-help mailing list