[R] Comparison of two time series using R

Tim Churches tchur at optushome.com.au
Tue Jul 30 02:08:45 CEST 2002

We have two time series: the first is a series of weekly counts of 
isolates of RSV (respiratory syncytial virus) by pathology laboratories, 
and the second is a series of weekly counts of cases of bronchiolitis in 
young children presenting to hospital emergency departments. 
Bronchiolitis in young children is usually caused by RSV infection, and 
simple visual inspection reveals a very close correspondence between the 
two series, both of which show strong seasonality and also corresponding 
variation from year to year.

My question is how to approach the analysis of these data using R. Here 
is what we have done so far (guided by Diggle and MASS):

1) Create two time-series (ts) objects from the data, making sure the 
corresponding observations in the two ts are in fact contemporaneous.
2) Decompose each ts into seasonal, trend and remainder components using 
stl() and decompose().
3) Examine the cross-correlogram for the raw ts and the decomposed 
components using ccf(); this revealed that bronchiolitis cases were 
maximally cross-correlated with RSV isolates at a 3-week lag.
4) Examine periodograms of the raw ts and the pre-whitened data (the 
remainders) - most of the energy is in the week-to-week variation.
5) Calculate the cross-correlation between the remainders of the two 
series at a 3-week lag - it is about 0.55.
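
For concreteness, the five steps above can be sketched roughly as follows. 
This is only an illustration on simulated data, not our actual analysis: 
the counts, the start year, the 3-week lead built into the simulation and 
all object names (rsv, bronch, rsv_rem and so on) are assumptions made up 
for the example. Only base R (the stats package) is used.

```r
# Simulated stand-in data; the real counts are not reproduced here.
set.seed(1)
n    <- 52 * 6   # six years of weekly observations (assumed)
lead <- 3        # assumed: RSV isolates lead bronchiolitis by 3 weeks
idx  <- seq_len(n + lead)
activity <- exp(1.5 * sin(2 * pi * idx / 52) +
                as.numeric(arima.sim(list(ar = 0.6), n = n + lead, sd = 0.3)))
rsv_counts    <- rpois(n, 50 * activity[(lead + 1):(n + lead)])
bronch_counts <- rpois(n, 80 * activity[1:n])

# 1) Two contemporaneous weekly ts objects (start year is arbitrary).
rsv    <- ts(rsv_counts,    start = c(1996, 1), frequency = 52)
bronch <- ts(bronch_counts, start = c(1996, 1), frequency = 52)
stopifnot(all(time(rsv) == time(bronch)))

# 2) Seasonal / trend / remainder decomposition.
rsv_stl    <- stl(rsv,    s.window = "periodic")
bronch_stl <- stl(bronch, s.window = "periodic")
rsv_rem    <- rsv_stl$time.series[, "remainder"]
bronch_rem <- bronch_stl$time.series[, "remainder"]
rsv_dec    <- decompose(rsv)   # classical moving-average alternative

# 3) Cross-correlogram of the remainders. For ts input the lags come
#    back in years, so rescale to weeks. ccf(x, y) at lag k estimates
#    cor(x[t + k], y[t]); a negative peak lag means x leads y.
cc         <- ccf(rsv_rem, bronch_rem, lag.max = 12, plot = FALSE)
lag_weeks  <- round(cc$lag[, 1, 1] * frequency(rsv))
peak_weeks <- lag_weeks[which.max(cc$acf[, 1, 1])]

# 4) Raw periodogram of a remainder series.
pg <- spec.pgram(rsv_rem, taper = 0.1, plot = FALSE)

# 5) Correlation of the remainders with RSV shifted back by 3 weeks.
r_lagged <- cor(rsv_rem[1:(n - lead)], bronch_rem[(lead + 1):n])
```

Setting plot = TRUE in ccf() and spec.pgram() gives the usual plots; 
plot(rsv_stl) shows the decomposition.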

OK as far as it goes, but these results only obliquely shed light on the 
question we want to answer: "Can lab RSV isolate counts be used to 
predict the hospital bronchiolitis case-load a few weeks hence, and if 
so, how reliably?"
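
One way the "how reliably" part might be quantified is by out-of-sample 
forecast error: regress the outcome on the predictor lagged by 3 weeks, 
with autocorrelated errors, and score 3-week-ahead forecasts from a 
rolling origin. The sketch below does this on simulated data. The AR(1) 
error structure, the coefficients, the 3-week lag and every name in it 
are assumptions for illustration, not results from our series.

```r
# Simulated stand-in series (assumed structure, not real data).
set.seed(3)
n <- 260
h <- 3                                          # forecast horizon = lag
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))
x_lag <- c(rep(NA, h), head(x, -h))             # predictor from h weeks ago
y <- 2 + 0.8 * x_lag +
  as.numeric(arima.sim(list(ar = 0.4), n = n, sd = 0.5))

# Drop the first h weeks, where the lagged predictor is undefined.
ok    <- !is.na(x_lag)
y     <- y[ok]
x_lag <- x_lag[ok]
m     <- length(y)

# Rolling origin over roughly the last year: at each origin o, fit on
# weeks 1..o and forecast week o + h. The regressor values needed for
# weeks o+1..o+h are predictor values already observed by week o.
origins <- (m - 52):(m - h)
errs <- sapply(origins, function(o) {
  fit <- arima(y[1:o], order = c(1, 0, 0), xreg = x_lag[1:o])
  fc  <- predict(fit, n.ahead = h, newxreg = x_lag[(o + 1):(o + h)])
  c(model = y[o + h] - fc$pred[h],              # regression + AR(1) errors
    naive = y[o + h] - mean(y[1:o]))            # benchmark: training mean
})
rmse <- sqrt(rowMeans(errs^2))
```

Comparing rmse["model"] with rmse["naive"] gives a rough idea of how 
much the lagged predictor improves on a forecast that ignores it.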

Stephen Morrell [Morrell S. Times Series (Box-Jenkins) Analysis. In: 
Kerr C, Taylor R, Heard G. Handbook of Public Health Methods, McGraw 
Hill, Sydney, 1998.] suggests the following approach (direct quote 
observing fair-use copyright provisions follows):

"In the first stage of analysis, the outcome and predictor series are 
pre-analysed to identify the form of the transfer function. In the 
second stage the transfer function is identified and its residuals 
computed. Finally, an ARIMA model is fitted to the residuals to assess 
the adequacy of the overall model. A ratio of U- and S-polynomials, 
U(B)/S(B), called impulse weights, is used to specify the effect of a 
unit change in the predictor series on the outcome series. These weights 
are initially estimated by a cross-correlation function (CCF), which 
assess the relationship between the de-trended predictor series on the 
de-trended outcome series (with autocorrelation influences removed, 
called prewhitening)."
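
As far as I understand it, the prewhitening step in that quote could be 
sketched as below: fit an AR model to the (already de-trended) predictor, 
apply the same filter to the outcome, and read rough impulse weights off 
the cross-correlation of the two filtered series. The data are simulated 
and the names (x, y, fit_x, v and so on) are hypothetical; the use of ar() 
to pick the filter is my assumption, not something the quote specifies.

```r
# Simulated stand-ins for the de-trended series (assumed, not real data):
# the outcome y responds to the predictor x three steps later.
set.seed(2)
n <- 300
d <- 3
x_long <- as.numeric(arima.sim(list(ar = 0.7), n = n + d))
x <- x_long[(d + 1):(n + d)]
y <- 0.8 * x_long[1:n] + rnorm(n, sd = 0.5)     # y[t] = 0.8 * x[t - 3] + noise

# Fit an AR model to the predictor (order chosen by AIC).
fit_x <- ar(x, order.max = 10, aic = TRUE)
p     <- fit_x$order

# Apply the SAME filter (1 - a1*B - ... - ap*B^p) to both series.
flt <- c(1, -fit_x$ar)
x_f <- stats::filter(x - mean(x), flt, method = "convolution", sides = 1)
y_f <- stats::filter(y - mean(y), flt, method = "convolution", sides = 1)
keep <- (p + 1):n                               # first p values are NA
x_w  <- as.numeric(x_f)[keep]
y_w  <- as.numeric(y_f)[keep]

# CCF of the prewhitened series. ccf(x, y) at lag k estimates
# cor(x[t + k], y[t]), so a peak at k = -3 means x leads y by 3 steps.
cc       <- ccf(x_w, y_w, lag.max = 10, plot = FALSE)
lags     <- cc$lag[, 1, 1]
r        <- cc$acf[, 1, 1]
peak_lag <- lags[which.max(abs(r))]

# Rough impulse-weight estimates: cross-correlation rescaled by the
# ratio of standard deviations of the filtered series.
v      <- r * sd(y_w) / sd(x_w)
v_peak <- v[lags == peak_lag]
```

In this simulation the largest weight should sit at the lag where x 
leads y by three steps, with a value near the 0.8 used to generate y.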

Is this a reasonable approach to our question? Hints on how to proceed 
are most welcome, and/or references to papers or texts which might 
render us a bit less clueless wrt this problem.


Tim C
