[R] importing and merging many time series

Anton Lebedevich mabrek at gmail.com
Sun Apr 7 14:40:33 CEST 2013


Hello.

I've got many (5-20k) files with time series in a text format like this:

1359635460      2.006747
1359635520      1.886745
1359635580      3.066988
1359635640      3.633578
1359635700      2.140082
1359635760      2.033564
1359635820      1.980123
1359635880      2.060131
1359635940      2.113416
1359636000      2.440172

The first field is a unix timestamp, the second a float. It's a text
export of http://graphite.readthedocs.org/en/latest/whisper.html
databases. The series can have different resolutions, different
start/end times, and possibly gaps inside.

Current way of importing them:

library(zoo)

## read one whisper text export into a one-column zoo object,
## with the column named after the file it came from
read.file <- function(file.name) {
  read.zoo(
    file.name,
    na.strings="None",
    colClasses=c("integer", "numeric"),
    col.names=c("time", basename(file.name)),
    FUN=function(t) as.POSIXct(t, origin="1970-01-01", tz="UTC"),
    drop=FALSE)
}

## read every file under `path` and merge them all at once
## on the union of their time indexes
load.metrics <- function(path=".") {
  do.call(merge.zoo, lapply(list.files(path, full.names=TRUE), read.file))
}
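
Called like this (the directory name is just a placeholder for my
export directory):

## hypothetical example; "whisper-export" stands for the real path
metrics <- load.metrics("whisper-export")
dim(metrics)  # rows: union of all timestamps, cols: one per file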

It works for 6k time series with 2k points each, but fails with an
out-of-memory error on a 16 GB box when I try to import 10k time
series with 10k points each.
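
My guess is that the blow-up comes from merge.zoo taking the union of
all the time indexes: with different resolutions and offsets the
merged matrix has far more rows than any single series. A rough
back-of-the-envelope (the 100k row count is only a guess for my data):

rows <- 100e3           # distinct timestamps after the union (a guess)
cols <- 10e3            # one column per file
rows * cols * 8 / 2^30  # ~7.5 GiB for the doubles alone, before copies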

I've tried to make the merging incremental with Reduce, but the import
speed became unacceptable:

## merge incrementally: fold merge.zoo over the file list so that
## only one new series is merged into the accumulator at a time
load.metrics <- function(path=".") {
  Reduce(
    function(a, b) {
      ## the first element arrives as a file name rather than
      ## a zoo object, so read it on first use
      if (is.character(a)) {
        a <- read.file(a)
      }
      merge.zoo(a, read.file(b))
    },
    list.files(path, full.names=TRUE))
}

Is there a faster and less memory-hungry way to import and merge this
many time series?

Regards,
Anton Lebedevich.
