[Rd] creating lagged variables

Gabor Grothendieck ggrothendieck at gmail.com
Thu Dec 13 21:19:20 CET 2007


The problem is the representation.

If we transform the data into a zoo time series, z, with one
series per column and one time point per row, then we
can just merge the series with its lag.

> DF <- data.frame(id = c(1, 1, 1, 2, 2, 2), time = c(1, 2,
+ 3, 1, 2, 3), value = c(-0.56047565, -0.23017749, 1.55870831,
+ 0.07050839, 0.12928774, 1.71506499))
>
> library(zoo)
> z <- do.call(merge, by(DF, DF$id, function(x) zoo(x$value, x$time)))
> merge(z, lag(z, -1))
         1.z        2.z 1.lag(z, -1) 2.lag(z, -1)
1 -0.5604756 0.07050839           NA           NA
2 -0.2301775 0.12928774   -0.5604756   0.07050839
3  1.5587083 1.71506499   -0.2301775   0.12928774
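
If the result has to stay in the original long layout, with the rows
in their original order, one possible base R sketch is to look up,
for each row, the row whose (id, time) pair equals (id, time - k).
The helper name lag_within_id and its arguments are hypothetical, not
something from zoo or from the original post, and the sketch assumes
that each (id, time) pair occurs at most once, that time values
compare exactly (e.g. integers), and that ids do not contain the "_"
separator.

```r
# Hypothetical helper (an illustration, not part of zoo or the original post):
# lag a column within each id by matching on a combined (id, time) key.
lag_within_id <- function(dt, varname, idvarname = "id",
                          timevarname = "time", k = 1) {
  # Key identifying each row, and the key of the row k time steps earlier
  key     <- paste(dt[[idvarname]], dt[[timevarname]], sep = "_")
  lag_key <- paste(dt[[idvarname]], dt[[timevarname]] - k, sep = "_")
  # match() returns NA where no earlier row exists, which yields NA values
  dt[[paste(varname, k, sep = ".")]] <- dt[[varname]][match(lag_key, key)]
  dt
}

DF <- data.frame(id = c(1, 1, 1, 2, 2, 2), time = c(1, 2, 3, 1, 2, 3),
                 value = c(-0.56047565, -0.23017749, 1.55870831,
                           0.07050839, 0.12928774, 1.71506499))
res <- lag_within_id(DF, "value")
print(res)
```

Because the lookup is done on the key rather than on row position,
the rows need not be sorted, and a missing time point within an id
gives NA instead of silently picking up the wrong row.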


On Dec 13, 2007 1:21 PM, Antonio, Fabio Di Narzo
<antonio.fabio at gmail.com> wrote:
> Hi all.
> I'm looking for robust ways of building lagged variables in a dataset
> with multiple individuals.
>
> Consider a dataset with variables like the following:
> ##
> set.seed(123)
> d <- data.frame(id = rep(1:2, each=3), time=rep(1:3, 2), value=rnorm(6))
> ##
> >d
>  id time       value
> 1  1    1 -0.56047565
> 2  1    2 -0.23017749
> 3  1    3  1.55870831
> 4  2    1  0.07050839
> 5  2    2  0.12928774
> 6  2    3  1.71506499
>
> I want to compute the lagged variable 'value(t-1)', taking subject id
> into account.
> My current effort produced the following:
> ##
> my_lag <- function(dt, varname, timevarname='time', lag=1) {
>        vname <- paste(varname, if(lag>0) '.' else '', lag, sep='')
>        timevar <- dt[[timevarname]]
>        dt[[vname]] <- dt[[varname]][match(timevar, timevar + lag)]
>        dt
> }
> lag_by <- function(dt, idvarname='id', ...)
>  do.call(rbind, by(dt, dt[[idvarname]], my_lag, ...))
> ##
> With the previous data I get:
>
> > lag_by(d, varname='value')
>    id time       value     value.1
> 1.1  1    1 -0.56047565          NA
> 1.2  1    2 -0.23017749 -0.56047565
> 1.3  1    3  1.55870831 -0.23017749
> 2.4  2    1  0.07050839          NA
> 2.5  2    2  0.12928774  0.07050839
> 2.6  2    3  1.71506499  0.12928774
>
> So that seems to work. However, I was wondering whether there is a
> smarter/cleaner/more robust way to do the job. For instance, with the
> above function I get the data frame rows re-ordered as a side effect
> (although this is of no concern in my current analysis)...
> Any suggestions?
>
> All the best,
> Fabio.
> --
> Antonio, Fabio Di Narzo
> Ph.D. student at
> Department of Statistical Sciences
> University of Bologna, Italy


