[R] Lag based on Date objects with non-consecutive values

Sam Albers tonightsthenight at gmail.com
Tue Mar 20 01:03:20 CET 2012


Hello R-ers,

I just wanted to update this post. I've made some progress on this but
am still not quite where I need to be. I feel like I am close, so I
just wanted to share my work so far.

Thanks in advance!

Sam

On Mon, Mar 19, 2012 at 1:10 PM, Sam Albers <tonightsthenight at gmail.com> wrote:
> Hello all,
>
> I need to figure out a way to lag a variable by a number of days
> without using the zoo package. I have to work on a remote R connection
> that doesn't have zoo installed, and installing it there is not an
> option. That is, I want a function where I can specify the number of
> days by which to lag a variable, relative to a Date-formatted column.
> That part is relatively easy. The problem arises when the dates are not
> consecutive: I can't figure out a way to insert an NA where there is a
> gap in the dates. For example:
>
>
> ## A dataframe with non-consecutive dates
> set.seed(32)
> df1<-data.frame(
>           Date=seq(as.Date("1967-06-05","%Y-%m-%d"),by="day", length=5),
>           Dis1=rnorm(5, 1,10)
>           )
> df2<-data.frame(
>           Date=seq(as.Date("1967-06-15","%Y-%m-%d"),by="day", length=5),
>           Dis1=rnorm(5, 1,10)
>           )
>
> df <- rbind(df1,df2); df
>
> ## A function to lag the variable by a specified number of days
> lag.day <- function (lag.by, data) {
>  c(rep(NA,lag.by), head(data$Dis1, -lag.by))
> }
>
> ## Using the function
> df$lag1 <- lag.day(lag.by=1, data=df); df
> ## returns this data frame
>
>         Date      Dis1      lag1
> 1  1967-06-05  1.146405        NA
> 2  1967-06-06  9.732887  1.146405
> 3  1967-06-07 -9.279462  9.732887
> 4  1967-06-08  7.856646 -9.279462
> 5  1967-06-09  5.494370  7.856646
> 6  1967-06-15  5.070176  5.494370
> 7  1967-06-16  3.847314  5.070176
> 8  1967-06-17 -5.243094  3.847314
> 9  1967-06-18  9.396560 -5.243094
> 10 1967-06-19  4.112792  9.396560
>
>
> ## When really what I would like is something like this:
>
>         Date      Dis1      lag1
> 1  1967-06-05  1.146405        NA
> 2  1967-06-06  9.732887  1.146405
> 3  1967-06-07 -9.279462  9.732887
> 4  1967-06-08  7.856646 -9.279462
> 5  1967-06-09  5.494370  7.856646
> 6  1967-06-15  5.070176  NA
> 7  1967-06-16  3.847314  5.070176
> 8  1967-06-17 -5.243094  3.847314
> 9  1967-06-18  9.396560 -5.243094
> 10 1967-06-19  4.112792  9.396560
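A minimal base-R sketch of the behaviour shown in the desired output, assuming "lag by n days" means n calendar days and that each date appears at most once: for every row, `match()` looks up the row dated exactly `lag.by` days earlier, and indexing with the resulting NA gives NA wherever that day is missing. The function name `lag_days` and the example data (placeholder values, not the post's `rnorm()` draws) are hypothetical.

```r
## Sketch: lag by calendar days with a date lookup (base R only, no zoo).
## Assumes `date` is of class Date and contains no duplicate dates.
lag_days <- function(x, date, lag.by = 1) {
  ## Row position of the date exactly `lag.by` days earlier (NA if absent)
  idx <- match(date - lag.by, date)
  ## Indexing with NA returns NA, which marks the missing days
  x[idx]
}

## Hypothetical example data: two runs of five consecutive days
df_ex <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = (1:10) * 1.5
)
df_ex$lag1 <- lag_days(df_ex$Dis1, df_ex$Date, lag.by = 1)
df_ex$lag2 <- lag_days(df_ex$Dis1, df_ex$Date, lag.by = 2)
df_ex
```

On the post's data frame this would be called as `df$lag2 <- lag_days(df$Dis1, df$Date, lag.by = 2)`. Because the lookup is by date rather than by row position, the rows do not need to be sorted.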

I've now gotten this far, but I've realized that my approach is flawed:
if I increase lag.by to anything greater than 1, the NA is no longer
placed in the correct position. Here is my updated effort:

lag.by <- function (data, lag.by) {
  tmp <- data.frame(
    ## Difference in days between dates
    diff = c(diff(data$Date), NA),
    lag.tmp = c(rep(NA, lag.by), head(data$Dis1, -lag.by))
  )
  ## diff() calculates the difference to the next row, so all the
  ## difference values need to be lagged
  ifelse(c(rep(NA, lag.by), head(tmp$diff, -lag.by)) <= 1, tmp$lag.tmp, NA)
}


df$lag <- lag.by(df, lag.by=1)
df$lag2 <- lag.by(df, lag.by=2); df

         Date      Dis1       lag      lag2
1  1967-06-05  1.146405        NA        NA
2  1967-06-06  9.732887  1.146405        NA
3  1967-06-07 -9.279462  9.732887  1.146405
4  1967-06-08  7.856646 -9.279462  9.732887
5  1967-06-09  5.494370  7.856646 -9.279462
6  1967-06-15  5.070176        NA  7.856646  <- this should be NA
7  1967-06-16  3.847314  5.070176        NA
8  1967-06-17 -5.243094  3.847314  5.070176
9  1967-06-18  9.396560 -5.243094  3.847314
10 1967-06-19  4.112792  9.396560 -5.243094

So I should have NAs in the lag2 column at rows 6 and 7. Any help or
thoughts would be much appreciated.
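The updated function checks only one day-to-day difference per row, the one between rows i - lag.by and i - lag.by + 1, so with lag.by = 2 the gap just before row 6 is never looked at. A sketch that keeps the row-shift idea but instead compares each date with the date lag.by rows earlier, assuming the rows are sorted by Date, dates are unique, and 1 <= lag.by < nrow(data). The name `lag_shift_checked` and the example data are hypothetical.

```r
## Sketch: keep the row shift, but validate the whole lag window.
## Assumes rows are sorted by Date, dates are unique, and
## 1 <= lag.by < nrow(data).
lag_shift_checked <- function(data, lag.by) {
  ## Values and dates taken from `lag.by` rows earlier (NA-padded at the top)
  shifted   <- c(rep(NA, lag.by), head(data$Dis1, -lag.by))
  prev_date <- c(rep(NA, lag.by), head(as.numeric(data$Date), -lag.by))
  ## Days between each row's date and the date `lag.by` rows earlier
  gap <- as.numeric(data$Date) - prev_date
  ## Keep the shifted value only if those rows are exactly `lag.by` days apart
  ifelse(!is.na(gap) & gap == lag.by, shifted, NA)
}

## Hypothetical example data: two runs of five consecutive days
df_ex <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = (1:10) * 1.5
)
df_ex$lag1 <- lag_shift_checked(df_ex, lag.by = 1)
df_ex$lag2 <- lag_shift_checked(df_ex, lag.by = 2)
df_ex
```

One design consequence: this version returns NA whenever any day inside the window is missing, even if the day exactly lag.by days back is present (for example, days 1, 4 and 6 of a month with lag.by = 2 give NA for day 6), whereas a lookup by date, e.g. with `match()`, would return the day-4 value. For the data in this post the two interpretations agree.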



>
> So can anyone recommend a way (either using my function or some other
> approach) to consistently lag values by lag.by days while accounting
> for gaps in the dates?
>
> Thanks so much in advance!
>
> Sam
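Another base-R sketch, closer to the original idea of inserting NAs at the gaps: expand the data to one row per calendar day with `merge(..., all.x = TRUE)`, so the missing days become NA rows, apply the plain row shift, and then keep only the original dates. It assumes a Date column without duplicates and a numeric Dis1 column as in the post, with 1 <= lag.by < number of calendar days spanned; the name `lag_via_full_calendar` and the example data are hypothetical.

```r
## Sketch: fill in the missing days as NA rows, shift by rows, drop the fillers.
## Assumes `data` has a Date column "Date" (no duplicates) and a numeric
## column "Dis1", and that 1 <= lag.by < number of calendar days spanned.
lag_via_full_calendar <- function(data, lag.by) {
  ## One row per calendar day from the first to the last date
  full <- data.frame(Date = seq(min(data$Date), max(data$Date), by = "day"))
  ## Left join: days missing from `data` get Dis1 = NA
  full <- merge(full, data[, c("Date", "Dis1")], by = "Date", all.x = TRUE)
  ## On a gap-free daily series, shifting by rows is shifting by days
  full$lag <- c(rep(NA, lag.by), head(full$Dis1, -lag.by))
  ## Pick out the lagged values for the original dates, in original row order
  full$lag[match(data$Date, full$Date)]
}

## Hypothetical example data: two runs of five consecutive days
df_ex <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = (1:10) * 1.5
)
df_ex$lag2 <- lag_via_full_calendar(df_ex, lag.by = 2)
df_ex
```

The expanded data frame has one row per day between the first and last date, so for long, sparse series a direct lookup by date avoids building it.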



More information about the R-help mailing list