[R] Time series data with dropouts/gaps

Tue Oct 26 06:28:11 CEST 2010

I have time-series data from a pair of inexpensive self-logging 3-axis accelerometers (http://www.gcdataconcepts.com/xlr8r-1.html).  Since I don't yet know the vibration/shock spectrum I'm measuring, for my initial sensor characterization run I mounted the units together and set the sample rate to the maximum of 640 samples/sec.

Unfortunately, at this sample rate there are significant data dropouts on several time scales (a phenomenon not present at sample rates of 160 Hz and below):

1. Approximately every 20 ms, a few samples are dropped (believed to be due to internal buffer wrapping).

2. Approximately every 200 ms, about 50 samples are dropped (believed to be due to flash write times).

3. At seemingly random intervals, a sample appears with an out-of-order timestamp (the vendor is investigating).
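Before any spectral work, it helps to locate these dropouts mechanically from the timestamps.  Here is a minimal numpy sketch.  It assumes the timestamps are already parsed into seconds and that a step of more than 1.5 nominal sample periods means at least one sample was lost.  Both the units and the threshold are assumptions, so adjust them to the logger's CSV format.  The sketch splits the record into gap-free, increasing runs and counts backward steps, so an out-of-order sample ends up isolated in its own tiny run.

```python
import numpy as np

FS_NOMINAL = 640.0              # logger's maximum sample rate, Hz
DT_NOMINAL = 1.0 / FS_NOMINAL   # nominal sample period, s


def find_segments(t, tol=1.5):
    """Split a timestamp vector into gap-free, strictly increasing runs.

    t   : 1-D array of timestamps in seconds (assumed units)
    tol : a step larger than tol * DT_NOMINAL is treated as a dropout
          (hypothetical threshold; tune it to the logger's jitter)

    Returns (segments, n_backward): segments is a list of (start, stop)
    index pairs usable as t[start:stop]; n_backward counts steps where
    the timestamp did not increase (out-of-order or duplicate samples).
    """
    t = np.asarray(t, dtype=float)
    dt = np.diff(t)
    backward = dt <= 0                   # out-of-order or repeated timestamp
    dropout = dt > tol * DT_NOMINAL      # one or more samples missing
    # A new run starts at the sample after every bad step.
    breaks = np.flatnonzero(backward | dropout) + 1
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [len(t)]))
    segments = [(int(a), int(b)) for a, b in zip(starts, stops)]
    return segments, int(backward.sum())
```

You can discard runs shorter than the block length you plan to analyze.  The backward-step count gives a quick measure of how often the out-of-order problem occurs.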

Initially, I'm trying to answer the following questions:

A. How well do the two units compare?  (Calibration, time-base drift, etc.)

B. Can I use a lower sample rate?  (What is the observed spectrum?)

I started attacking the problem in Python (numpy/scipy), which I've used for lots of prior time-series sensor data analysis.  Unfortunately, the gaps made direct use of the data futile, and I found myself spending all my time manipulating Python lists and numpy arrays rather than finding answers.

I hope R can help calm my sea of unruly data.  I'm presently working my way through the abundant R references (tutorials, wiki, etc.), but I was hoping to find pointers here to help me become productive sooner rather than later.

Here's my present brute-force plan of attack:

- Load both data sets (in CSV format).  Each data element is a timestamp + 3-axis acceleration.
- Determine the timebase offset: the unit clocks don't match perfectly, and the units were started at slightly different times, so I expect to align them by correlating common events in the data.
- Find all overlapping data clusters (the intervals where both units have data, i.e., between the union of both units' gaps).
- See if I have enough data to perform spectral analysis.  I'd like to analyze all clusters together, but I suspect I may have to analyze them independently, then combine the results.
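Two pieces of this plan can be sketched in numpy.  The sketch assumes that every gap-free run is uniformly sampled at the nominal 640 Hz, which the dropouts make only approximately true.  `estimate_lag` finds the sample offset between matching stretches of the two units by cross-correlation.  `averaged_psd` is a hand-rolled, Welch-style average of Hann-windowed periodograms over every full block that fits inside a gap-free run, which is one way to analyze the clusters independently and then combine the results.  Its `segments` argument is a hypothetical list of (start, stop) index pairs from whatever gap detection you use.  The default block length of 256 is an arbitrary choice.

```python
import numpy as np

FS_NOMINAL = 640.0  # logger's maximum sample rate, Hz


def estimate_lag(a, b):
    """Return the lag (in samples) that best aligns b with a.

    Uses the full cross-correlation of the mean-removed signals.
    A positive result means events occur `lag` samples later in b than in a.
    Both inputs are assumed gap-free and sampled at the same rate.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    xc = np.correlate(b, a, mode="full")  # xc[i] corresponds to lag i - (len(a) - 1)
    return int(np.argmax(xc)) - (len(a) - 1)


def averaged_psd(x, segments, fs=FS_NOMINAL, nperseg=256):
    """One-sided PSD averaged over non-overlapping, Hann-windowed blocks.

    Only blocks lying entirely inside one (start, stop) segment are used,
    so no block ever spans a dropout. Returns (freqs, psd, n_blocks).
    """
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    scale = 1.0 / (fs * np.sum(win ** 2))  # density scaling, units^2/Hz
    acc = np.zeros(nperseg // 2 + 1)
    n_blocks = 0
    for start, stop in segments:
        for s in range(start, stop - nperseg + 1, nperseg):
            blk = x[s:s + nperseg]
            blk = blk - blk.mean()  # remove DC (e.g. gravity) per block
            spec = np.abs(np.fft.rfft(blk * win)) ** 2 * scale
            # Fold negative frequencies in; DC and (even-length) Nyquist bins are not doubled.
            if nperseg % 2 == 0:
                spec[1:-1] *= 2
            else:
                spec[1:] *= 2
            acc += spec
            n_blocks += 1
    if n_blocks == 0:
        raise ValueError("no segment is at least nperseg samples long")
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), acc / n_blocks, n_blocks
```

Estimating the lag in several windows spread across the recording, rather than once, would also show whether the offset changes over time (time-base drift).  Averaging across runs assumes the vibration is roughly stationary over the recording.  The block length trades frequency resolution (fs/nperseg, 2.5 Hz with the defaults) against how many runs are long enough to contribute.  On the R side, the stats package's `ccf` and `spec.pgram` cover the same two tasks for regularly spaced series.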

Thoughts?  Hints?

I suspect I may need a few good smacks with a clue-by-four to get rolling...  I've been spoiled by 25 years of working with high-quality sensors that had only occasional single-sample dropouts that were easily filled.

TIA,

-BobC