[R] moving average with gaps in time series

Carl Witthoft carl at witthoft.com
Thu Dec 16 23:29:01 CET 2010


I'm not completely sure what your test setup does, but wouldn't it be 
simpler to take a moving average at every point, but set the window 
dynamically to be over the previous hour?

In pseudocode, for the j-th sample:

meanval[j] <- mean(Value[ DateTime[(DateTime-DateTime[j])>0 & 
(Datetime-DateTime[j]<3600)] ] )

where I'm just letting DateTime values be in seconds.

If you really want to reject all averages that would have 'looked back' 
over a missing sample,  I'd play with rle() to find the bad points and 
use that to reduce the meanval vector I described above.

Carl


********quote****
I have a time series with interval of 2.5 minutes, or 24 observations 
per hour. I am trying to find a 1 hr moving average, looking backward, 
so that moving average at n = mean(n-23 : n)

The time series has about 1.5 million rows, with occasional gaps due to 
poor data quality. I only want to take a 1 hour moving average for those 
periods that are complete, i.e. have 24 observations in the previous hour.

The data is in 3 columns

Value DateTime interval

For example:
Value <- rnorm (100, 50, 3) #my data has 1.5 million rows; using 100 
here for simple example
DateTime <- seq(from = 915148800, to=915156150, by =150) #time steps 
1:50 at 150 second intervals
DateTime [51] <- 915156450 #skip one time step; DateTime[52:100] <- 
seq(from = 915156600, to =915163800, by = 150) #resume time steps of 150 
seconds

x <- cbind (Value, DateTime)
x <- as.data.frame(x)
x$DateTime <-as.POSIXct(x$DateTime, origin="1970-01-01", tz="GMT") x1 <- 
x[-c(1:23), ] #trimming x to create direct comparison of DateTimes in x 
and x1
x[,3] <- difftime(x1[,2], x[,2], units="mins") #ignore warning message 
colnames(x) [3] <- "interval"
x[24:nrow(x),3] <- x[1:(nrow(x)-23),3] #set interval to be the number of 
minutes between n-23 and n.

         #57.5 indicates no gaps in the previous hour up to and including n.

# >57.5 indicates a gap in the previous n-23 rows.
x[1:23,3] <- 0/0 #NaN assigned to first 23 rows so as not to take 
average of first hour.
#as expected, row 51: 73 indicates a gap, i.e. interval > 57.5

index <- which (x [,3] == 57.5) #which rows have no gaps in previous hour

#loop to calculate 1 hour moving average, only for periods which are 
complete
for (i in 1:length(index)) {

     x [index[i],4] <- mean(x[index[i]-23,1] : x[index[i],1])   }

#This loop works on this simple example; but this takes VERY long time 
to run on x with 1.5 million rows. Over 1 hour running and still not 
complete. I also tried increasing memory.limit to 4095, but still very 
slow.

Any suggestions to make this run faster? I thought about using the lag 
function but could not get it to work



More information about the R-help mailing list