[R] qcc package & syndromic surveillance (multivar CUSUM?)
spencer.graves at pdf.com
Tue Jul 27 03:07:17 CEST 2004
What do you think is most plausible: an abrupt jump or a gradual
drift? To detect an abrupt jump from a null hypothesis H0 to an
alternative H1, the tool of choice seems to be a cumulative sum (CUSUM)
of log(likelihood ratio). If H0 and H1 are normal distributions with
equal variances, this general rule specializes to a one-sided cumulative
sum of (y[t]-mu.bar), where mu.bar is the average of the means under H0 and H1.
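To see why the general rule specializes this way: for N(mu1, sigma^2) against N(mu0, sigma^2), log f1(y)/f0(y) = ((mu1 - mu0)/sigma^2)*(y - mu.bar), so a CUSUM of the log likelihood ratio is just a positively rescaled CUSUM of (y[t]-mu.bar). The Python sketch below checks this numerically. It assumes the usual one-sided (Page) form S[t] = max(0, S[t-1] + x[t]); the helper names, means, sigma and data are made-up illustrative values.

```python
import math

def normal_llr(y, mu0, mu1, sigma):
    """log f1(y)/f0(y) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return ((y - mu0) ** 2 - (y - mu1) ** 2) / (2 * sigma ** 2)

def cusum(increments):
    """One-sided CUSUM S[t] = max(0, S[t-1] + x[t]); returns the whole path."""
    s, path = 0.0, []
    for x in increments:
        s = max(0.0, s + x)
        path.append(s)
    return path

mu0, mu1, sigma = 10.0, 13.0, 2.0        # illustrative H0 and H1 means
mu_bar = (mu0 + mu1) / 2                 # average of the two means
ys = [9.0, 11.5, 10.2, 14.0, 13.1, 12.7]

llr_path = cusum(normal_llr(y, mu0, mu1, sigma) for y in ys)
dev_path = cusum(y - mu_bar for y in ys)

# The two statistics differ only by the positive factor (mu1 - mu0) / sigma^2.
scale = (mu1 - mu0) / sigma ** 2
for a, b in zip(llr_path, dev_path):
    assert math.isclose(a, scale * b, abs_tol=1e-12)

print([round(s, 3) for s in dev_path])  # prints [0.0, 0.0, 0.0, 2.5, 4.1, 5.3]
```

Because the scale factor is positive, an alarm threshold h on the (y[t]-mu.bar) CUSUM corresponds to the threshold scale*h on the log-likelihood-ratio CUSUM.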
However, to detect a gradual drift modeled as a random walk, the
theory says that the best tool is something like an exponentially
weighted moving average (EWMA).
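One way to see the EWMA claim is through the first-order polynomial (local-level) dynamic model: y[t] = mu[t] + v[t], mu[t] = mu[t-1] + w[t], with var(v) = 1 and var(w) = q. The Kalman filter gain A[t] converges to the constant A solving A^2 + q*A - q = 0. With a constant gain, the filter's mean update m[t] = m[t-1] + A*(y[t] - m[t-1]) is exactly an EWMA with weight A. The sketch below assumes that model. The value of q and the helper names are illustrative.

```python
import math

def local_level_gains(q, n_steps, c0=1e6):
    """Kalman gains for y[t] = mu[t] + v[t], mu[t] = mu[t-1] + w[t],
    with var(v) = 1 and var(w) = q (the signal-to-noise ratio)."""
    c = c0                     # posterior variance of mu; large = vague start
    gains = []
    for _ in range(n_steps):
        r = c + q              # prior variance after the random-walk step
        a = r / (r + 1.0)      # Kalman gain with observation variance 1
        c = a                  # posterior variance A[t] * V, with V = 1
        gains.append(a)
    return gains

def steady_state_gain(q):
    """Positive root of A^2 + q*A - q = 0, the limit of the gains above."""
    return (-q + math.sqrt(q * q + 4.0 * q)) / 2.0

def ewma(ys, lam, start):
    """EWMA z[t] = lam*y[t] + (1 - lam)*z[t-1]; the Kalman mean update
    has this form once the gain has settled at lam = A."""
    z, out = start, []
    for y in ys:
        z = lam * y + (1.0 - lam) * z
        out.append(z)
    return out

q = 0.1
print(local_level_gains(q, 50)[-1], steady_state_gain(q))  # both about 0.27
```

A small q (slow drift relative to the noise) gives a small weight, meaning heavy smoothing. A large q lets the EWMA track recent observations closely.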
For monitor design, I like to write the following:
joint = observation * prior = posterior * predictive
f(y & mu) = f(y | mu)*f(mu) = f(mu | y)*f(y).
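As a concrete instance of this identity, take the standard conjugate normal case. This is an illustrative assumption, not something specified in the post: the prior is mu ~ N(m, C) and the observation is y | mu ~ N(mu, V). Then both the predictive and the posterior are normal:

```latex
\begin{aligned}
f(y, \mu) &= f(y \mid \mu)\, f(\mu) = f(\mu \mid y)\, f(y),
\qquad f(y) = \int f(y \mid \mu)\, f(\mu)\, d\mu, \\
\mu \sim N(m, C),\; y \mid \mu \sim N(\mu, V)
&\;\Longrightarrow\;
y \sim N(m,\, C + V), \qquad
\mu \mid y \sim N\bigl(m + A\,(y - m),\; A V\bigr), \qquad
A = \frac{C}{C + V}.
\end{aligned}
```

The predictive N(m, C + V) is what each new observation is checked against. The posterior mean m + A(y - m) = A y + (1 - A) m has the EWMA form.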
When each observation arrives, I test to see if it is consistent with
the predictive distribution f(y). If it is not consistent, I report a
potential problem. If it is consistent, I incorporate it into the EWMA
[or CUSUM], as described by the posterior f(mu | y). For more
information on this, see "Bayes' Rule of Information" and other
"foundations of monitoring" reports downloadable from
"www.prodsyse.com". This kind of use of the predictive distribution is
discussed in the West and Harrison (1999) book cited in the "Bayes'
Rule" paper, and a Poisson EWMA is derived on p. 5.
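The check-then-update loop described above can be sketched for daily counts as follows. The sketch uses a gamma prior on a Poisson rate, and a discount factor delta inflates the prior's variance before each day. This is one standard way a Poisson EWMA can arise. It is not necessarily the derivation in the report. The names, the starting shape/rate (a, b), delta and the tail probability alpha are all hypothetical choices, and only unusually high counts are flagged. Without alarms, b settles at 1/(1 - delta), so the rate estimate becomes an EWMA with weight 1 - delta on each new count.

```python
import math

def nb_upper_tail(y, a, b):
    """P(Y >= y) when Y | mu ~ Poisson(mu) and mu ~ Gamma(shape=a, rate=b),
    i.e. Y is negative binomial with size a and success probability b/(b+1)."""
    log_p = math.log(b / (b + 1.0))
    log_q = math.log(1.0 / (b + 1.0))
    below = 0.0
    for k in range(y):  # accumulate P(Y = 0), ..., P(Y = y - 1)
        below += math.exp(math.lgamma(k + a) - math.lgamma(a)
                          - math.lgamma(k + 1) + a * log_p + k * log_q)
    return max(0.0, 1.0 - below)

def poisson_monitor(counts, a, b, delta=0.9, alpha=0.001):
    """Hypothetical gamma-Poisson monitor with discount factor delta.
    Returns one (flagged, rate_estimate) pair per count."""
    results = []
    for y in counts:
        a, b = delta * a, delta * b        # discount: same mean, wider prior
        if nb_upper_tail(y, a, b) < alpha:
            results.append((True, a / b))  # inconsistent with f(y): report it
        else:
            a, b = a + y, b + 1.0          # consistent: conjugate update
            results.append((False, a / b))
    return results

out = poisson_monitor([9, 11, 10, 12, 40, 10], a=50.0, b=5.0)
print([flag for flag, _ in out])
```

With these made-up values, only the count of 40 is flagged. The flagged count is kept out of the posterior, so the rate estimate is not dragged upward by the suspected event.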
What you use, of course, depends on the events you hope to
capture. For your applications, I might consider running separate
monitors on each condition-hospital pair plus monitors on the totals for
each hospital and for each condition, plus one for the overall total. I might use
the qcc package to calibrate my thresholds, but do the daily
computations in some database system.
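The monitor layout suggested above can be sketched as a function that turns one day's hospital-by-condition counts into one value per monitor. Each value would then feed its own CUSUM or EWMA. The dictionary layout and the name monitor_values are illustrative assumptions. The counts are the example figures from the quoted question below. With H hospitals and K conditions, this produces H*K + H + K + 1 monitors.

```python
def monitor_values(day_counts):
    """day_counts maps (hospital, condition) -> count for one day.
    Returns monitor key -> value: every pair, hospital totals,
    condition totals and the overall total."""
    out = {}
    for (hospital, condition), n in day_counts.items():
        out[("pair", hospital, condition)] = n
        out[("hospital", hospital)] = out.get(("hospital", hospital), 0) + n
        out[("condition", condition)] = out.get(("condition", condition), 0) + n
        out[("overall",)] = out.get(("overall",), 0) + n
    return out

day = {("A", "Respiratory"): 10, ("A", "Neuro"): 12,
       ("F", "Respiratory"): 7, ("F", "Neuro"): 14}
values = monitor_values(day)
print(values[("hospital", "A")], values[("condition", "Neuro")], values[("overall",)])
# prints 22 26 43
```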
Selecting thresholds is not easy, in part because the assumptions
you make for monitor design will never hold exactly in practice. The
result of this is that any thresholds you compute purely from theory
will be at least somewhat wrong. The theory should get you close to an
optimal arrangement, but if you tune your thresholds on the years of
historical data you have, you should be on safer ground. I think I would use
quite loose thresholds for the hospital-condition (interaction)
monitors, tighter thresholds for the condition and hospital totals, and
the tightest threshold for the overall. Monitors on specific conditions
should be sensitive to epidemics or to an effective biological warfare
terrorist attack; if this is your concern, a CUSUM might be best.
Monitors on specific hospitals should be sensitive to changes in the
competence of local staff (suggesting a preference for an EWMA) or to a
sudden local outbreak of something (suggesting a CUSUM). Monitors on
condition-hospital pairs might be sensitive to local changes in
preferred diagnoses. I would run CUSUMs or EWMAs but not both: either
will catch the conditions most quickly caught by the other, possibly
with a slightly longer delay.
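One simple way to tune thresholds on the historical data assumes that the history is mostly free of real events. Run each monitor over the history for a range of candidate thresholds, and take the smallest one whose false-alarm count stays within a budget you choose. A looser threshold corresponds to a smaller budget. The many condition-hospital monitors would therefore get small budgets, and the overall monitor a larger one. The helper names, the restart-after-alarm rule, the data and the budgets are illustrative assumptions; no qcc functions are used here.

```python
def count_alarms(increments, h):
    """Run a one-sided CUSUM over the increments and count alarms,
    restarting the statistic at 0 after each alarm."""
    s, alarms = 0.0, 0
    for x in increments:
        s = max(0.0, s + x)
        if s > h:
            alarms += 1
            s = 0.0
    return alarms

def tune_threshold(history, mu_bar, candidates, max_alarms):
    """Smallest candidate threshold giving at most max_alarms on the
    historical series, or None if no candidate is loose enough."""
    increments = [y - mu_bar for y in history]
    for h in sorted(candidates):
        if count_alarms(increments, h) <= max_alarms:
            return h
    return None

history = [10, 12, 9, 15, 11, 10, 16, 9]   # made-up daily counts
for budget in (0, 1, 2):
    print(budget, tune_threshold(history, 11, [2, 4.5, 10], budget))
```

In practice the candidate grid would be finer, and the history would span years rather than days.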
hope this helps. spencer graves
adiamond at fas.harvard.edu wrote:
>Dear R Community:
>I am working on a public health early warning system, and
>I see that the qcc package allows for CUSUM and other statistical quality tests
>but I am not sure if my project is a good match for qcc functions as written.
>Any advice you may have is very much appreciated.
>I have four years' worth of daily counts of emergency room admissions for
>different conditions (e.g. respiratory, neurologic, etc.) from several local
>hospitals. The data look like this:
>DAY 1        Respiratory  Neuro  ...
>Hospital A:           10     12
>   . . .
>Hospital F:            7     14
>DAY 2        Respiratory  Neuro  ...
>Hospital A:           10     12
>   . . .
>Hospital F:            7     14
>. . .
>and my goal is to do a kind of multivariate quality control test (without
>fitting a GLM), that would run each day after the data is updated and be able
>to answer the question:
>"Has there been a significant variation in the central tendency of the data?"
>An analogous problem would be detecting the early signs of a shift in global
>trading patterns by examining stock market indexes in different countries
>around the world, updating and testing the data each business day.
>R-help at stat.math.ethz.ch mailing list