[R] lag a data.frame column?

Achim Zeileis Achim.Zeileis at wu-wien.ac.at
Wed Sep 9 20:09:40 CEST 2009


On Wed, 9 Sep 2009, Mark Knecht wrote:

> Sometimes it's the simple things...
>
> Why doesn't this lag X$x by 3 and place it in X$x1?

It does.

> (i.e., NAs in the first 3 rows and then values showing up...)

Because this is not how the "ts" class handles lags.

What happens is that X$x is transformed to "ts":
   as.ts(X$x)
which is now a regular series with frequency 1, starting at 1 and ending at 
10. If you apply lag(), the data is not modified at all; only the time 
index is shifted:
   lag(as.ts(X$x), 3)
Thus it neither creates any NAs nor (even worse) throws away observations, 
which is not necessary because the frequency of the time series is known 
and the time index can be extended. Once the result is stored in a 
data.frame column, however, the shifted time index plays no role: 
data.frame columns are matched by row position, so X$x1 ends up holding 
the same values as X$x.

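BTW: You almost surely wanted lag(..., -3): with a positive k the lagged 
series starts earlier, so delaying it by 3 observations needs k = -3. 
Personally, I don't find this intuitive either, but it's how things are 
(as documented on the man page).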
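To see the sign convention and to get the NA-padded column the question asked for, one option is to let cbind() on "ts" objects align the original and the delayed series on the union of their time indexes (ts.union() does the same) and then cut the result back to the original time range with window(). A sketch with illustrative values; the object names are arbitrary:

```r
x  <- ts(c(5, 3, 8, 1, 9, 2, 7, 4, 6, 10))  # illustrative values, times 1..10

# Negative k delays the series: it now runs from time 4 to time 13
xl <- lag(x, -3)
tsp(xl)

# cbind() on "ts" objects aligns by time and fills the gaps with NA
both <- cbind(x = x, x1 = xl)

# Keep only the original time points 1..10
both <- window(both, start = 1, end = 10)
as.numeric(both[, "x1"])   # NA NA NA 5 3 8 1 9 2 7

# Back to an ordinary data.frame if needed
as.data.frame(both)
```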

> The help page does talk about time series. If lag doesn't work on
> data.frame columns then what would be the right function to use to lag
> by a variable amount?

That depends on what you want to do. If your data really is a time series, 
then using a time series class (such as "ts" or "zoo") would probably be 
preferable. This would also bring further benefits for data processing, 
for example aligning several series by their time index.
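As a sketch of the time-series-class route: the zoo package (an add-on from CRAN, so the code first checks that it is installed) provides a lag() method with the same sign convention as for "ts", plus an na.pad argument that keeps the original index and fills the vacated positions with NA. The values are illustrative.

```r
# zoo is an add-on package; skip gracefully if it is not installed
if (requireNamespace("zoo", quietly = TRUE)) {
  z <- zoo::zoo(c(5, 3, 8, 1, 9, 2, 7, 4, 6, 10), order.by = 1:10)

  # k = -3 delays by three observations; na.pad = TRUE keeps index 1..10
  z1 <- stats::lag(z, k = -3, na.pad = TRUE)

  print(zoo::coredata(z1))   # NA NA NA 5 3 8 1 9 2 7
  print(zoo::index(z1))      # 1 .. 10
}
```

Calling stats::lag() explicitly makes sure the generic from base R is used; it dispatches to zoo's method because the zoo namespace is loaded.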

If for some reason you can't do that, it shouldn't be too difficult to 
write a function that does what you want for your personal use:
   mylag <- function(x, k) c(rep(NA, k), x[1:(length(x)-k)])
which assumes that k is a positive integer and length(x) > k.
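A short usage example of mylag() on a data.frame column, with illustrative values. The second function, mylag_safe(), is a hypothetical variant (not from the original post) that also handles k >= length(x): it uses seq_len() instead of 1:(length(x) - k), because 1:0 is c(1, 0) and would wrongly select x[1] when length(x) == k.

```r
# mylag() as given in the post: assumes k is a positive integer and length(x) > k
mylag <- function(x, k) c(rep(NA, k), x[1:(length(x) - k)])

X <- data.frame(x = c(5, 3, 8, 1, 9, 2, 7, 4, 6, 10))  # illustrative values
X$x1 <- mylag(X$x, 3)
X   # x1 is NA in rows 1-3, then 5, 3, 8, ...

# Hypothetical variant: k is capped at length(x) and seq_len() yields an
# empty index when k == length(x), so the result always has length(x) elements
mylag_safe <- function(x, k) {
  stopifnot(length(k) == 1, k >= 0, k == round(k))
  n <- length(x)
  k <- min(k, n)
  c(rep(NA, k), x[seq_len(n - k)])
}

mylag_safe(1:3, 0)   # 1 2 3
mylag_safe(1:3, 5)   # NA NA NA
```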

Best,
Z




More information about the R-help mailing list