[R] Fill in missing times in a timeseries with NA

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 27 16:57:08 CEST 2010


On Wed, Oct 27, 2010 at 8:57 AM, lglew <l.glew at soton.ac.uk> wrote:
>
> Hi,
>
> I have a irregularly spaced time series dataset, which reads in from a .csv.
> I need to convert this to a regularly spaced time series by filling in
> missing rows of data with NAs.
>
> So my data, called NtuMot, looks like this (I've removed some of the
> additional rows for simplicity)....
> ELEID    date_time                       height                  slope
> 1        2009-06-24 00:00:00          150                      4.0
> 1        2009-06-24 01:00:00          175                      4.0
> 1        2009-06-24 02:00:00          180                      2.3
> 1        2009-06-24 03:00:00          200                      1.0
> 1        2009-06-24 06:00:00          201                      1.0
> 1        2009-06-24 07:00:00          202                      0.0
> 1        2009-06-24 08:00:00          202                      0.0
> 1        2009-06-24 09:00:00          202                      0.0
> 1        2009-06-24 10:00:00          202                      0.0
>
>
> I need to end up with this:
> ELEID    date_time                       height                  slope
>
> 1        2009-06-24 00:00:00         150                      4.0
> 1        2009-06-24 01:00:00         175                      4.0
> 1        2009-06-24 02:00:00         180                      2.3
> 1        2009-06-24 03:00:00         200                      1.0
> 1        2009-06-24 04:00:00          NA                       NA
> 1        2009-06-24 05:00:00          NA                       NA
> 1        2009-06-24 06:00:00          201                      1.0
> 1        2009-06-24 07:00:00          202                      0.0
> 1        2009-06-24 08:00:00          202                      0.0
> 1        2009-06-24 09:00:00          202                      0.0
> 1        2009-06-24 10:00:00          202                      0.0
>
> Any ideas much appreciated!
>

This will do it producing a new data frame:

grid.df <- data.frame(date_time = seq(DF[1, 2], DF[nrow(DF), 2], by = "hour"))
merge(DF, grid.df)

however, if you are dealing with irregular series you might find it
more convenient to use the zoo package:

library(zoo)
z <- zoo(DF[-2], DF[, 2])
g <- seq(start(z), end(z), by = "hour")
m <- merge(z, zoo(, g))
m

In the above we used this for data frame DF:

DF <- structure(list(ELEID = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
date_time = structure(c(1245816000,
1245819600, 1245823200, 1245826800, 1245837600, 1245841200, 1245844800,
1245848400, 1245852000), class = c("POSIXt", "POSIXct"), tzone = ""),
    height = c(150, 175, 180, 200, 201, 202, 202, 202, 202),
    slope = c(4, 4, 2.3, 1, 1, 0, 0, 0, 0)), .Names = c("ELEID",
"date_time", "height", "slope"), row.names = c("2009-06-24 00:00:00",
"2009-06-24 01:00:00", "2009-06-24 02:00:00", "2009-06-24 03:00:00",
"2009-06-24 06:00:00", "2009-06-24 07:00:00", "2009-06-24 08:00:00",
"2009-06-24 09:00:00", "2009-06-24 10:00:00"), class = "data.frame")


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list