[R] Finding periods, sort of.

Fri Sep 5 03:54:53 CEST 2003

A student of mine has 110 similarly structured multivariate time
series, and we're interested in methods that are practical for
thousands of them.

Basically each series describes a sequence of musical notes, and certain
properties of these notes are recorded.  The same set of properties
is recorded for each series.

The event times are irregular.  The number of events and the event
times are different for each series.

These are non-periodic series.  However, when you plot them, they
*look* repetitive.  That is, you have zig-zaggy bands going up and
down, and while they don't go up and down with a regular period,
the cycles do seem to have vaguely similar shapes, especially if
you smooth them.

R has been a *great* tool for exploring this stuff.  For example,
I concatenated a selection of the series and used
quantile(notes, c(0.34,0.66)) to find suitable cutoff points.
Then

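library(vcd)   # provides ternaryplot()

tern <- function (data, L = 65, U = 73, bw = 20) {
    f <- data$Note      # note values, compared against the cutoffs L and U
    v <- data$Volume    # volumes, used as weights

    nf <- length(f)
    stopifnot(nf > bw)  # need more notes than the window width

    # Sum of x over a sliding window of bw consecutive notes,
    # computed as a difference of cumulative sums.
    wsum <- function (x) {
        cs <- cumsum(x)
        cs[(bw+1):nf] - cs[1:(nf-bw)]
    }

    lo <- wsum(v * (f < L))             # volume in low notes
    md <- wsum(v * (f >= L & f <= U))   # volume in middle notes
    hi <- wsum(v * (f > U))             # volume in high notes
    cv <- wsum(v)                       # total volume in the window

    # One point per window: the low/middle/high shares of the volume.
    ternaryplot(cbind(lo=lo/cv, md=md/cv, hi=hi/cv), lty=1)
}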

produces different-looking ternary diagrams for different pieces,
which is good, because my student and I are looking for ways to
discriminate pieces.  Of course, these diagrams, by design, leave
out the time element entirely.
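
To make the windowing concrete, here is a minimal sketch of the same
computation without the plot.  The data frame `piece` is made-up
synthetic data and the helper name `window_sum` is hypothetical; only
the column names Note and Volume and the quantile() cutoffs come from
the description above.  It checks that each window's low/middle/high
shares sum to 1, which is what a ternary diagram needs.

```r
# Hypothetical synthetic piece: 200 notes with made-up values and volumes.
set.seed(1)
piece <- data.frame(Note   = sample(55:85, 200, replace = TRUE),
                    Volume = runif(200, min = 0.2, max = 1))

# Cutoffs from the note values, as in the quantile() call above.
cuts <- quantile(piece$Note, c(0.34, 0.66))

# Sum of x over each window of bw consecutive values, via cumulative sums.
window_sum <- function(x, bw) {
    n  <- length(x)
    cs <- cumsum(x)
    cs[(bw + 1):n] - cs[1:(n - bw)]
}

bw <- 20
f  <- piece$Note
v  <- piece$Volume
cv <- window_sum(v, bw)          # total volume per window

# Volume-weighted share of low, middle, and high notes per window.
shares <- cbind(lo = window_sum(v * (f <  cuts[1]), bw),
                md = window_sum(v * (f >= cuts[1] & f <= cuts[2]), bw),
                hi = window_sum(v * (f >  cuts[2]), bw)) / cv

# Each row is one point of the ternary diagram; its entries should sum to 1.
range(rowSums(shares))
```

The three conditions partition the notes, so the shares add up to 1
whenever the window's total volume is nonzero.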

(I do hope that nobody has patented the idea of using ternary diagrams
to display the distribution of high, medium, and low notes in music.)

One thing we've done is to smooth fairly heavily and then (in effect)
count median-crossings to determine the number of (half-)"cycles".
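
A minimal sketch of that counting step, under assumptions: the
smoother is not specified above, so lowess() on the irregular event
times stands in for it, and the function name, the span `f = 0.1`, and
the test signal are all made up for illustration.

```r
# Count how often a smoothed series crosses its own median.
# times: irregular event times; x: one recorded property of the notes.
# ASSUMPTION: lowess() with span f is used as the heavy smoother.
count_median_crossings <- function(times, x, f = 0.1) {
    sm <- lowess(times, x, f = f)$y   # smoothed values, ordered by time
    s  <- sign(sm - median(sm))       # +1 above the median, -1 below
    s  <- s[s != 0]                   # ignore points exactly on the median
    sum(diff(s) != 0)                 # each sign change is one crossing
}

# Made-up test signal: a noisy wave observed at irregular times.
set.seed(2)
times <- sort(runif(400, 0, 100))
x     <- cos(2 * pi * times / 25) + rnorm(400, sd = 0.3)
count_median_crossings(times, x)
```

Each crossing marks the boundary between two half-cycles, so the count
depends strongly on the span: a wider span merges small wiggles, and a
narrower one lets noise add spurious crossings.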

But there has to be a better way to estimate the repetitiveness of
a time series than that.  I was wondering if maybe there was a way
to do a maximum likelihood estimate of seasonality using HoltWinters
as a building block, but I'm out of my depth.