[R] lags of a variable, with a factor

Charles Berry ccberry at ucsd.edu
Sat Aug 24 18:54:17 CEST 2013


Jim Lemon <jim <at> bitwrit.com.au> writes:

> 
> On 08/24/2013 04:16 AM, Michael Friendly wrote:
> > For sequential analysis of sequences of events, I want to calculate a
> > series of lagged versions of a (numeric or character) variable. The
> > simple function below does this, but I can't see how to generalize this
> > to the case where there is also a factor variable and I want to
> > calculate lags separately for each level of the factor (by). Can anyone
> > help?
> > ...

[snip]

> Hi Michael,
> Maybe this will do it.
> 
> lags <- function(x, k=1, prefix='lag', by) {
>    if(missing(by)) {
>    n <- length(x)
>    res <- data.frame(lag0=x)
>    for (i in 1:k) {
>      res <- cbind(res, c(rep(NA, i), x[1:(n-i)]))
>    }
>    colnames(res) <- paste0(prefix, 0:k)
>    return(res)
>    }
>    else {
>     for(levl in levels(by)) {
>      nextlags<-lags(x[by==levl,],prefix=prefix)
>      rownames(nextlags)<-paste(levl,rownames(nextlags),sep=".")
>      if(exist(res)) res<-rbind(res,nextlags)
>      else res<-nextlags
>     }
>    }
> }
> 
> Jim


Untested? I get

> lags(mtcars$mpg,2)
   lag0 lag1 lag2
1  21.0   NA   NA
2  21.0 21.0   NA
3  22.8 21.0 21.0
4  21.4 22.8 21.0
5  18.7 21.4 22.8
6  18.1 18.7 21.4
7  14.3 18.1 18.7
[ ... ]

which looks OK, but the by= case fails:


> lags(mtcars$mpg,2,by=factor(mtcars$cyl))
Error in x[by == levl, ] : incorrect number of dimensions
> 

Michael, try this:


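lagframe <- function(x,k=1,prefix='lag',by){
    # shift a vector down one position, padding the front with NA
    lag.one <- function(x) c(NA,head(x,-1))
    # indx[j] is the row holding the previous value for row j
    # (within the same level of 'by' when 'by' is given)
    indx <- if (missing(by))
        lag.one(seq_along(x))
    else {
        spl.by <- split(seq_along(by),by)
        lag.spl.by <-
            lapply(spl.by, lag.one )
        unsplit(lag.spl.by,by)
    }
    res <- setNames(data.frame(x), paste0(prefix,"0") )
    # each lag is the previous lag re-indexed; seq_len() copes with k = 0
    for (i in seq_len(k)) res[[ paste0(prefix,i) ]] <-
        res[[ paste0(prefix,i-1) ]][ indx ]
    
    res
}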
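
To see why re-indexing with indx gives within-group lags, here is a small
sketch on toy data. The vectors x and g are made up for illustration and
are not from the original question; lag.one is the same helper used inside
lagframe.

```r
# Toy data (hypothetical, for illustration only)
x <- c(10, 20, 30, 40, 50)
g <- factor(c("a", "b", "a", "b", "a"))

# Same helper as in lagframe: shift down one, pad with NA
lag.one <- function(x) c(NA, head(x, -1))

# Row positions of each group: a -> 1 3 5, b -> 2 4
spl <- split(seq_along(g), g)

# Shift the positions within each group, then put them back in row order
indx <- unsplit(lapply(spl, lag.one), g)
indx            # NA NA 1 2 3

x[indx]         # lag 1 within group: NA NA 10 20 30
x[indx][indx]   # lag 2 within group: NA NA NA NA 10
```

Because indx[j] points at the previous row in the same group, applying it
repeatedly walks back one more step each time, which is what the loop in
lagframe relies on.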


> lagframe(mtcars$mpg,2)
   lag0 lag1 lag2
1  21.0   NA   NA
2  21.0 21.0   NA
3  22.8 21.0 21.0
4  21.4 22.8 21.0
5  18.7 21.4 22.8
[...]

> cbind( lagframe(mtcars$mpg,2,by=mtcars$cyl), cyl=mtcars$cyl)
   lag0 lag1 lag2 cyl
1  21.0   NA   NA   6
2  21.0 21.0   NA   6
3  22.8   NA   NA   4
4  21.4 21.0 21.0   6
5  18.7   NA   NA   8
6  18.1 21.4 21.0   6
7  14.3 18.7   NA   8
8  24.4 22.8   NA   4
9  22.8 24.4 22.8   4
10 19.2 18.1 21.4   6
11 17.8 19.2 18.1   6
12 16.4 14.3 18.7   8
[...]
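
As a cross-check on the by-group result, the same lags can probably be had
from ave(), which applies a function within each level of a grouping
variable and returns the values in the original row order. This is a
different route from the index approach above, shown only as a sanity
check; lag.one is redefined here so the snippet stands alone.

```r
# Shift down one position, padding the front with NA
lag.one <- function(x) c(NA, head(x, -1))

# Lag within each level of cyl; lag 2 is just lag 1 lagged again
lag1 <- ave(mtcars$mpg, mtcars$cyl, FUN = lag.one)
lag2 <- ave(lag1, mtcars$cyl, FUN = lag.one)

# First rows, laid out like the by-group table
head(data.frame(lag0 = mtcars$mpg, lag1, lag2, cyl = mtcars$cyl), 12)
```

The first 12 rows should agree with the by-group table, e.g. row 9 has
lag1 = 24.4 and lag2 = 22.8, both taken from earlier 4-cylinder rows.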


