[R] Imputing missing values in time series

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 7 21:22:22 CET 2007


Here are timings for na.locf, applied both to the plain vector x and
to a zoo object, compared with the other functions:

> set.seed(1)
> x = 1:1e5
> x[sample(1:1e5, 10000)] = NA
> system.time(z2<-locf.iverson2(x))
   user  system elapsed
   0.05    0.00    0.05
> system.time(z1<-locf.iverson(x))
   user  system elapsed
   0.11    0.00    0.11
> system.time(z3<-locf.sfear(x))
   user  system elapsed
   1.31    0.00    1.33
>
> library(zoo)
> system.time(z4 <- na.locf(x))
   user  system elapsed
   0.03    0.00    0.03
>
> z <- zoo(x)
> system.time(z5 <- na.locf(z))
   user  system elapsed
   0.04    0.00    0.05
>
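
For comparison, a loop-free version can also be written in base R by
computing, for every position, the index of the most recent non-NA
element. This is only a sketch: locf.index is a made-up name, and I am
not claiming this is how na.locf works internally.

```r
# Hypothetical helper (not from zoo): vectorised LOCF in base R.
locf.index <- function(x) {
  ok <- !is.na(x)
  # position of the most recent non-NA element (0 if none seen yet)
  idx <- cummax(seq_along(x) * ok)
  # nothing to carry forward before the first observation
  idx[idx == 0] <- NA
  x[idx]
}

locf.index(c(1, 2, NA, NA, 5, NA))
```

Leading NAs are left as NA, since there is no earlier value to copy.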


On Jun 22, 2007 4:29 PM, Horace Tso <Horace.Tso at pgn.com> wrote:
> Thanks to Mark and Erik for the different versions of locf, and to Erik for the pointer to the archive, where I found another function due to Simon Fear. I haven't tested the zoo locf function. The following shows their performance. Interestingly, Erik's use of a while loop is the fastest.
>
> HT.
>
> x = 1:1e5
> x[sample(1:1e5, 10000)] = NA
>
> > system.time(z2<-locf.iverson2(x))
>   user  system elapsed
>   0.07    0.00    0.06
> > system.time(z1<-locf.iverson(x))
>   user  system elapsed
>   0.11    0.00    0.11
> > system.time(z3<-locf.sfear(x))
>   user  system elapsed
>   1.13    0.00    1.12
>
> ==================================================
> # Due to Erik Iverson
> locf.iverson2 = function(x) {
>  while(any(is.na(x))) {
>    x[is.na(x)] <- x[which(is.na(x))-1]
>  }
>  x
> }
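
One caveat with the while-loop version: if x starts with NA, the
index which(is.na(x))-1 contains a 0, which R drops, so the
replacement is shorter than the target. Depending on the data that
can give an error, a recycling warning, or silently misaligned
values. A guarded sketch, assuming leading NAs should simply stay NA
(locf.guarded is a made-up name):

```r
# Hypothetical guarded variant of the while-loop approach.
locf.guarded <- function(x) {
  first <- which(!is.na(x))[1]
  if (is.na(first)) return(x)      # empty or all NA: nothing to carry
  tail.idx <- first:length(x)      # skip leading NAs
  y <- x[tail.idx]                 # y[1] is non-NA by construction
  while (any(is.na(y))) {
    na <- which(is.na(y))
    y[na] <- y[na - 1]             # na - 1 is always >= 1 here
  }
  x[tail.idx] <- y
  x
}

locf.guarded(c(NA, NA, 1, NA, NA, 2, NA))
```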
>
> # Due to Simon Fear (Fri Nov 14 17:28:57 2003)
> locf.sfear = function(x) {
>  assign("stored.value", x[1], envir=.GlobalEnv)
>  sapply(x, function(x) {
>    if(is.na(x))
>      stored.value
>    else {
>      assign("stored.value", x, envir=.GlobalEnv)
>      x
>    }})
> }
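
The element-by-element idea does not need to write to .GlobalEnv. A
plain for loop with a local variable does the same bookkeeping
without leaving stored.value behind in the workspace. A minimal
sketch (locf.loop is a made-up name, and I have not timed it):

```r
# Hypothetical variant that keeps its state in a local variable.
locf.loop <- function(x) {
  last <- x[1]
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- last     # fill from the most recent observed value
    } else {
      last <- x[i]     # remember this observation
    }
  }
  x
}

locf.loop(c(1, NA, NA, 4, NA))
```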
>
> # Due to Erik Iverson
> locf.iverson = function(x, unkn=-1) {
>  x[is.na(x)] = unkn  #something that is not a possible price
>  run = rle(x)
>  run$values[run$values==unkn] = run$values[which(run$values==unkn)-1]
>  inverse.rle(run)
> }
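
The rle version depends on unkn being a value that never occurs in
the data, and it has the same zero-index problem when x starts with
NA. One way around both, sketched under the assumption that x is
numeric with finite values of moderate size (so min - 1 really is a
new value); locf.rle is a made-up name:

```r
# Hypothetical variant of the rle approach with a data-derived sentinel.
locf.rle <- function(x) {
  if (all(is.na(x))) return(x)
  unkn <- min(x, na.rm = TRUE) - 1   # below every observed value
  x[is.na(x)] <- unkn                # note: integer x becomes double
  run <- rle(x)
  hit <- which(run$values == unkn)
  hit <- hit[hit > 1]                # a leading run has nothing to copy
  run$values[hit] <- run$values[hit - 1]
  out <- inverse.rle(run)
  out[out == unkn] <- NA             # put leading NAs back
  out
}

locf.rle(c(NA, NA, 3, NA, 5, 5, NA))
```

Because rle merges adjacent equal values, two sentinel runs are never
neighbours, so the run before a sentinel run is always real data.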
>
>
> >>> "Horace Tso" <Horace.Tso at pgn.com> 6/22/2007 12:21 PM >>>
>
> Mark, thanks for the tips. I thought you financial folks must have run into things like these before. Just wonder why this problem wasn't asked more often on this list.
>
> H.
>
>
> >>> "Leeds, Mark (IED)" <Mark.Leeds at morganstanley.com> 6/22/2007 12:16 PM >>>
> I have a function that does this type of thing but it works off a pure
> vector so it would have to be modified.
> If you make your object a zoo object, then that object has many functions
> associated with it and na.locf would
> do what you need, I think.
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Erik Iverson
> Sent: Friday, June 22, 2007 3:02 PM
> To: Horace Tso
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Imputing missing values in time series
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an iterative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
>  test
>  [1] 1 2 3 3 4 4 4 5 5 6 7 7
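
A note on why the loop is needed: each pass copies from one position
to the left, so it fills at most one more NA in every run of NAs, and
the number of passes equals the length of the longest run. The sketch
below counts the passes for the same test vector and compares that
with the longest NA run found by rle (the variable names are mine):

```r
orig <- c(1, 2, 3, NA, 4, NA, NA, 5, NA, 6, 7, NA)
test <- orig
passes <- 0
while (any(is.na(test))) {
  na <- which(is.na(test))
  test[na] <- test[na - 1]   # fills the first NA of every remaining run
  passes <- passes + 1
}

# longest run of consecutive NAs in the original vector
r <- rle(is.na(orig))
longest <- max(r$lengths[r$values])

c(passes = passes, longest = longest)
```

Both come out as 2 here, which is why the loop is cheap when the gaps
are short, as with weekends and holidays.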
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is
> > not traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006     6.28
> > 11/2/2006     6.58
> > 11/3/2006     6.586
> > 11/4/2006     6.716
> > 11/5/2006     NA
> > 11/6/2006     NA
> > 11/7/2006     6.262
> > 11/8/2006     6.27
> > 11/9/2006     6.696
> > 11/10/2006    6.729
> > 11/11/2006    6.487
> > 11/12/2006    NA
> > 11/13/2006    NA
> > 11/14/2006    6.725
> > 11/15/2006    6.844
> > 11/16/2006    6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date, since gas used during holidays is purchased from the week
> > before. Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (e.g. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
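
For date-indexed data like this, base R's approx with
method = "constant" is one more way to carry the last price forward.
The sketch below uses five rows from the table above; the data frame
and column name Filled are my own choices, and rule = c(1, 2) is
chosen so that dates before the first observation would stay NA while
dates after the last one take the last price.

```r
prices <- data.frame(
  FDate = as.Date(c("11/3/2006", "11/4/2006", "11/5/2006",
                    "11/6/2006", "11/7/2006"), format = "%m/%d/%Y"),
  Price = c(6.586, 6.716, NA, NA, 6.262)
)

obs <- !is.na(prices$Price)
# step function through the observed points; f = 0 takes the left value
prices$Filled <- approx(x = as.numeric(prices$FDate[obs]),
                        y = prices$Price[obs],
                        xout = as.numeric(prices$FDate),
                        method = "constant", f = 0,
                        rule = c(1, 2))$y
prices
```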
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.


