[R] fill data forward in data frame.

Petr Savicky savicky at cs.cas.cz
Thu Mar 1 22:57:00 CET 2012


On Thu, Mar 01, 2012 at 02:31:01PM -0700, Ben quant wrote:
> Hello,
> 
> My direct desire is a good (fast) way to fill values forward until there is
> another value, then fill that value forward, in the data xx (at the bottom of
> this email).  For example, rows 1 to 45 should be NA (no change),
> but from row 46 to row 136 the value should be 12649, and from row 137 to the
> next value it should be 13039.00.  The last line of code is all you need for
> this part.
> 
> If you are so inclined, my goal is this: I want to create a weekly time
> series out of some data based on the report date. The report date is 'rd'
> below, and is the correct date for the time series. My idea (in part seen
> below) is to align rd and ua via the incorrect date (the time series date),
> then merge the result, by report date (rd), with a daily series of dates
> (dt) so that I capture all of the dates. That gets the data in the right
> start period. I've done all of this so far below and it looks fine. Then I
> plan to roll all of those values forward to the next value (see question
> above), then I'll do something like this:
> 
> xx[weekdays(xx[,1]) == "Friday",]
> 
> ...to get a weekly series of Friday values. I'm thinking someone probably
> has a faster way of doing this. I have to do this many times, so speed is
> important. Thanks!
> 
> Here is what I have done so far:
> 
> dt <- seq(from =as.Date("2009-06-01"), to = Sys.Date(), by = "day")
> 
> > nms
>  [1] "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30"
>  [6] "2010-09-30" "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30"
> [11] "2011-12-31"
> 
> > rd
>   2009-06-30   2009-09-30   2009-12-31   2010-03-31   2010-06-30
> "2009-07-16" "2009-10-15" "2010-01-19" "2010-04-19" "2010-07-19"
>   2010-09-30   2010-12-31   2011-03-31   2011-06-30   2011-09-30
> "2010-10-18" "2011-01-18" "2011-04-19" "2011-07-18" "2011-10-17"
>   2011-12-31
> "2012-01-19"
> 
> > ua
> 2009-06-30 2009-09-30 2009-12-31 2010-03-31 2010-06-30 2010-09-30
>   12649.00   13039.00   13425.00   13731.00   14014.00   14389.00
> 2010-12-31 2011-03-31 2011-06-30 2011-09-30 2011-12-31
>   14833.00   15095.00   15481.43   15846.43   16186.43
> 
> > x = merge(ua,rd,by='row.names')
> > names(x) = c('z.date','val','rt_date')
> > xx = merge(dt,x,by.y= 'rt_date',by.x=1,all.x=T)
> > xx
>              x     z.date   val
> 1   2009-06-01       <NA>    NA
> 2   2009-06-02       <NA>    NA
> 3   2009-06-03       <NA>    NA
> 4   2009-06-04       <NA>    NA
> 5   2009-06-05       <NA>    NA
> 
> ...etc....
> 
> 36  2009-07-06       <NA>    NA
> 37  2009-07-07       <NA>    NA
> 38  2009-07-08       <NA>    NA
> 39  2009-07-09       <NA>    NA
> 40  2009-07-10       <NA>    NA
> 41  2009-07-11       <NA>    NA
> 42  2009-07-12       <NA>    NA
> 43  2009-07-13       <NA>    NA
> 44  2009-07-14       <NA>    NA
> 45  2009-07-15       <NA>    NA
> 46  2009-07-16 2009-06-30 12649
> 47  2009-07-17       <NA>    NA
> 48  2009-07-18       <NA>    NA
> 49  2009-07-19       <NA>    NA
> 50  2009-07-20       <NA>    NA
> 51  2009-07-21       <NA>    NA
> 52  2009-07-22       <NA>    NA
> 53  2009-07-23       <NA>    NA
> 54  2009-07-24       <NA>    NA
> 55  2009-07-25       <NA>    NA
> 56  2009-07-26       <NA>    NA
> 57  2009-07-27       <NA>    NA
> 58  2009-07-28       <NA>    NA
> 
> ...etc....
> 
> 129  2009-10-07       <NA>       NA
> 130  2009-10-08       <NA>       NA
> 131  2009-10-09       <NA>       NA
> 132  2009-10-10       <NA>       NA
> 133  2009-10-11       <NA>       NA
> 134  2009-10-12       <NA>       NA
> 135  2009-10-13       <NA>       NA
> 136  2009-10-14       <NA>       NA
> 137  2009-10-15 2009-09-30 13039.00
> 138  2009-10-16       <NA>       NA
> 139  2009-10-17       <NA>       NA
> 140  2009-10-18       <NA>       NA
> 141  2009-10-19       <NA>       NA
> 142  2009-10-20       <NA>       NA
> 143  2009-10-21       <NA>       NA

Hi.

Try first the following simpler version.

  # an input vector
  x <- rep(NA, times=20)
  x[4] <- "A"
  x[9] <- "B"
  x[17] <- "C"

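  # extending the values forward
  # values[1] is NA, so positions before the first non-NA entry stay NA
  values <- c(NA, x[!is.na(x)])
  # ind counts the non-NA entries seen so far; +1 skips the leading NA
  ind <- cumsum(!is.na(x)) + 1
  y <- values[ind]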

  # compare with the original
  cbind(x, y)

        x   y  
   [1,] NA  NA 
   [2,] NA  NA 
   [3,] NA  NA 
   [4,] "A" "A"
   [5,] NA  "A"
   [6,] NA  "A"
   [7,] NA  "A"
   [8,] NA  "A"
   [9,] "B" "B"
  [10,] NA  "B"
  [11,] NA  "B"
  [12,] NA  "B"
  [13,] NA  "B"
  [14,] NA  "B"
  [15,] NA  "B"
  [16,] NA  "B"
  [17,] "C" "C"
  [18,] NA  "C"
  [19,] NA  "C"
  [20,] NA  "C"
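
For repeated use, the same steps could be wrapped in a small function.
This is only a sketch: the name fill_forward is made up here and is not
a base R function. Note that c(NA, ...) drops the class of vectors such
as Date, so this form is best suited to plain character or numeric
vectors.

```r
# fill_forward is a hypothetical helper name, not part of base R
fill_forward <- function(x) {
  keep <- !is.na(x)                 # TRUE where a value is present
  values <- c(NA, x[keep])          # leading NA covers the rows before the first value
  values[cumsum(keep) + 1]          # count of values seen so far, shifted past the NA
}

# same input vector as in the example above
x <- rep(NA, times = 20)
x[4] <- "A"
x[9] <- "B"
x[17] <- "C"

y <- fill_forward(x)
```

The result y should agree with the second column of the cbind(x, y)
output shown above.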

This can be applied directly to each of the last two columns of your
data frame "xx". However, it may be more natural to obtain the
vector "values" from the original input data (such as "ua") rather
than from their sparse form in the data frame. Also, the logical
vector !is.na(x) is the same for the last two columns of your data
frame, so it needs to be computed only once.
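
A rough sketch of both suggestions follows. The small data frame is a
toy stand-in for "xx" built from a few values in your message; the
six-day window and the names keep, ind, ind2 and val2 are chosen here
only for illustration. The second part swaps in findInterval() in place
of cumsum(), and assumes that "rd" is sorted increasingly and that "ua"
is in the same order as "rd".

```r
# Toy stand-in for the data frame "xx"; only the filled row comes from
# the thread, the rest is NA as in the printed output.
xx <- data.frame(
  x      = seq(as.Date("2009-07-14"), by = "day", length.out = 6),
  z.date = c(NA, NA, "2009-06-30", NA, NA, NA),
  val    = c(NA, NA, 12649, NA, NA, NA),
  stringsAsFactors = FALSE
)

# The NA pattern is shared by both columns, so compute it once.
keep <- !is.na(xx$val)
ind  <- cumsum(keep) + 1

xx$z.date <- c(NA, xx$z.date[keep])[ind]
xx$val    <- c(NA, xx$val[keep])[ind]

# Variant: take the values straight from the input vectors.
# Only the first two entries of "rd" and "ua" are used here.
rd <- c("2009-06-30" = "2009-07-16", "2009-09-30" = "2009-10-15")
ua <- c("2009-06-30" = 12649, "2009-09-30" = 13039)
dt <- seq(as.Date("2009-07-14"), as.Date("2009-10-16"), by = "day")

# findInterval() gives, for each day, the number of report dates <= that day
ind2 <- findInterval(as.numeric(dt), as.numeric(as.Date(rd))) + 1
val2 <- c(NA, unname(ua))[ind2]
```

With the second form, the merge with the daily series is not needed to
get the filled values; data.frame(dt, val2) could then be subset by
weekday as you planned.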

Hope this helps.

Petr Savicky.


