[R] How to speed up grouping time series, help please

Den Alpin den.alpin at gmail.com
Thu Apr 7 15:09:19 CEST 2011


I found an implementation that is faster (by an order of magnitude in my
tests) than the one using xts, split and merge (from Joshua).
I report the two fastest solutions below, with code to generate a test
case; some work remains on column order and naming.
The test case has grown since my previous post to give more realistic timings.

Any comments or ideas on how to further speed up the creation of
multivariate time series of class xts or timeSeries, starting from a
data.frame like the one shown here, are welcome.

Best regards,
Den


An example data.frame (the code below generates it):

   ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

I want to turn it into a timeSeries or xts object like this:

                            1         2         3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01        NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04        NA        NA 0.1029247

# CODE:

set.seed(123)
# set N to 5 to reproduce above data.frame
N <- 1000
# set K to 3 to reproduce above data.frame
K <- 10
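# long format: K series (ID) of N one-second observations each
X <- data.frame(
 ID = rep(1:K, each = N),
 # explicit format so that every DATE string keeps its time part
 DATE = format(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), K),
               "%Y-%m-%d %H:%M:%S"),
 VALUE = runif(N*K), stringsAsFactors = FALSE)
# shuffle the rows, then drop 20% of them to create missing observations
X <- X[sample(1:(N*K), N*K),]
X <- X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),]
str(X)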


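xtsSplit <- function(x)
{
 library(xts)
 # index ID and VALUE by time; DATE was generated in GMT, so parse it in GMT
 x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"], tz = "GMT"))
 # one series per ID, then merge them on the time index
 return(do.call(merge, split(x$VALUE,x$ID)))
}
# median user CPU time over 50 runs ([[1]] is the "user.self" entry)
xtsSplitTime <- replicate(50,
 system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)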

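xtsReshape <- function(x)
{
 library(xts)
 # long to wide: one row per DATE, one column per ID (named VALUE.<ID>)
 x <- reshape(x, idvar = "DATE", timevar = "ID", direction = "wide")
 # the first column is DATE; DATE was generated in GMT, so parse it in GMT
 x <- xts(x[,-1], as.POSIXct(x[,1], tz = "GMT"))
 return(x)
}
# median user CPU time over 50 runs ([[1]] is the "user.self" entry)
xtsReshapeTime <- replicate(50,
 system.time(xtsReshape(X))[[1]])
median(xtsReshapeTime)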
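
For the column order and naming still to be done, one possible approach
is to tidy the wide data.frame returned by reshape() before it is handed
to xts(). The sketch below uses base R only. The helper name tidyWide and
the small data.frame long are made up for illustration, and it assumes
the default "VALUE." prefix that reshape() puts in front of each ID.

```r
# Hypothetical helper (not part of the code above): strip the "VALUE."
# prefix from the reshaped columns and put the ID columns in order.
tidyWide <- function(w, prefix = "VALUE.")
{
 ids <- sub(prefix, "", names(w)[-1], fixed = TRUE)
 # sort IDs numerically when they are all numbers, alphabetically otherwise
 num <- suppressWarnings(as.numeric(ids))
 ord <- if (anyNA(num)) order(ids) else order(num)
 # sort rows by the first column (DATE) and reorder the ID columns
 w <- w[order(w[[1]]), c(1, ord + 1), drop = FALSE]
 names(w) <- c(names(w)[1], ids[ord])
 rownames(w) <- NULL
 w
}

# made-up example in the same long format as X
long <- data.frame(
 ID = c(3, 1, 2, 1, 3),
 DATE = c("2000-01-01 00:00:01", "2000-01-01 00:00:00",
          "2000-01-01 00:00:00", "2000-01-01 00:00:01",
          "2000-01-01 00:00:00"),
 VALUE = c(0.3, 0.1, 0.2, 0.4, 0.5), stringsAsFactors = FALSE)
wide <- reshape(long, idvar = "DATE", timevar = "ID", direction = "wide")
wide <- tidyWide(wide)
print(wide)
```

The tidied data.frame could then go to xts(wide[,-1], as.POSIXct(wide[,1],
tz = "GMT")) as in xtsReshape. I have not timed this extra step on the
full test case.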


