[R] Data Synchronization -- detecting time differences in multi-source data

Ralf B ralf.bierig at gmail.com
Mon Apr 12 19:50:06 CEST 2010


Hi R enthusiasts,

I am dealing with log data from different sources that record user
activity. Each source has three columns: one containing Epoch time
and two containing data (the x and y coordinates of mouse movements).
I have up to 10 such sources, with hundreds of thousands of log
entries.

Here is the header:

timestamp1, x1, y1, timestamp2, x2, y2, .....
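
To make the layout concrete, here is a minimal sketch in R with
made-up values (not my real logs). It assumes Epoch time in
milliseconds and only two sources; the names `wide` and `long` are
just for illustration. It stacks the wide layout into one row per
source and log entry:

```r
# Toy data in the wide layout described above (all values invented;
# timestamps are assumed to be Epoch milliseconds).
wide <- data.frame(
  timestamp1 = c(1271094606000, 1271094606010, 1271094606020),
  x1 = c(100, 102, 105),
  y1 = c(200, 201, 203),
  timestamp2 = c(1271094606010, 1271094606020, 1271094606030),
  x2 = c(100, 102, 105),
  y2 = c(200, 201, 203)
)

# Three columns per source.
n_sources <- ncol(wide) / 3

# Stack each source's three columns into a long table with a
# source identifier.
long <- do.call(rbind, lapply(seq_len(n_sources), function(i) {
  cols <- paste0(c("timestamp", "x", "y"), i)
  data.frame(source    = i,
             timestamp = wide[[cols[1]]],
             x         = wide[[cols[2]]],
             y         = wide[[cols[3]]])
}))

long
```

The long form is only one possible arrangement; it keeps each source's
timeline separate, so rows are no longer implied to be simultaneous.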

Since the data is recorded by different sources, the timestamps of
source 1 and source 2 differ. Sometimes the difference is constant
(e.g. source 1 is always 10 ms off source 2), but it can also vary
(e.g. because of network latency, it can grow or shrink at any time).
The x and y values always match across sources, but since they are
screen coordinates, the same values may recur at different times. Some
sources also start earlier than others, so the entries on a given row
do not necessarily belong to the same event.
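
The situation can be imitated with simulated data. This is only a toy
sketch under my own assumptions (millisecond timestamps, a 10 ms
constant offset, a random-walk drift, a second source that starts 50
entries late); none of the numbers come from my real logs:

```r
set.seed(1)
n <- 1000

# Source 1: irregularly spaced timestamps (ms) and screen coordinates.
t1 <- cumsum(sample(5:15, n, replace = TRUE))
s1 <- data.frame(timestamp = t1,
                 x = sample(0:1279, n, replace = TRUE),
                 y = sample(0:1023, n, replace = TRUE))

# Source 2 sees the same movements, but its timestamps are shifted.
const_offset <- 10                                # regular difference
drift <- round(cumsum(rnorm(n, sd = 0.5)))        # dynamic difference
s2 <- data.frame(timestamp = t1 + const_offset + drift,
                 x = s1$x, y = s1$y)

# When the rows are known to correspond, the time difference is a
# simple row-wise subtraction.
d_true <- s2$timestamp - s1$timestamp
summary(d_true)

# If source 2 starts later, the rows no longer line up and the
# row-wise difference is meaningless.
s2_late <- s2[-(1:50), ]
naive <- s2_late$timestamp - s1$timestamp[seq_len(nrow(s2_late))]
summary(naive)
```

In the real data I do not know which rows correspond, which is exactly
the part I would like to detect automatically.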

I am looking for pointers to general statistical methods that would
let me detect such time differences automatically: methods that find
matching blocks of measurements across sources, compare their
timelines, and flag the cases where they diverge. Which field of
statistics deals with this? Which R packages specialize in such
problems?

Thanks a lot,
Ralf


