[R] qcc package & syndromic surveillance (multivar CUSUM?)

Tue Jul 27 03:07:17 CEST 2004

      What do you think is most plausible:  an abrupt jump or a gradual 
drift?  To detect an abrupt jump from a null hypothesis H0 to an 
alternative H1, the tool of choice seems to be a cumulative sum (CUSUM) 
of log(likelihood ratio).  If H0 and H1 are normal distributions with 
equal variances, this general rule specializes to a one-sided cumulative 
sum of (y[t]-mu.bar), where mu.bar is the average of the means under H0 
and H1. 
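
      A minimal sketch of that one-sided CUSUM in base R follows.  The 
function name cusum_upper, the means mu0 and mu1, and the threshold h 
are illustrative assumptions, not part of qcc.  The constant factor 
(mu1-mu0)/sigma^2 from the log(likelihood ratio) is absorbed into h. 

```r
# One-sided (upper) CUSUM for a jump in a normal mean from mu0 to mu1.
# Assumptions for illustration only: mu0, mu1 and the threshold h are made up.
cusum_upper <- function(y, mu0, mu1, h) {
  mu.bar <- (mu0 + mu1) / 2            # average of the means under H0 and H1
  s <- numeric(length(y))
  prev <- 0
  for (t in seq_along(y)) {
    # accumulate (y[t] - mu.bar); the floor at zero discards evidence for H0
    prev <- max(0, prev + y[t] - mu.bar)
    s[t] <- prev
  }
  list(stat = s, alarm = which(s > h))
}

set.seed(1)
y <- c(rnorm(50, mean = 10), rnorm(20, mean = 12))   # simulated jump after day 50
out <- cusum_upper(y, mu0 = 10, mu1 = 12, h = 5)
head(out$alarm, 1)                     # first day on which the CUSUM exceeds h
```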

      However, to detect a gradual drift modeled as a random walk, the 
theory says that the best tool is something like an exponentially 
weighted moving average (EWMA). 
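
      For comparison, an EWMA can be sketched in a few lines of base R.  
The weight lambda, the starting value, and the simulated random-walk 
data are assumptions chosen only to illustrate the recursion. 

```r
# EWMA of a series.  lambda (the weight on the newest observation) and the
# starting value are illustrative assumptions.
ewma_stat <- function(y, lambda = 0.2, start = y[1]) {
  z <- numeric(length(y))
  prev <- start
  for (t in seq_along(y)) {
    prev <- lambda * y[t] + (1 - lambda) * prev   # geometric down-weighting of the past
    z[t] <- prev
  }
  z
}

set.seed(2)
mu <- 10 + cumsum(rnorm(100, sd = 0.1))   # slowly drifting mean (random walk)
y  <- rnorm(100, mean = mu, sd = 1)       # noisy observations around the drifting mean
z  <- ewma_stat(y, lambda = 0.2, start = 10)
```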

      For monitor design, I like to write the following: 

      joint = observation * prior = posterior * predictive

      f(y & mu) = f(y | mu)*f(mu) = f(mu | y)*f(y). 

When each observation arrives, I test to see if it is consistent with 
the predictive distribution f(y).  If it is not consistent, I report a 
potential problem.  If it is consistent, I incorporate it into the EWMA 
[or CUSUM], as described by the posterior f(mu | y).  For more 
information on this, see "Bayes' Rule of Information" and other 
"foundations of monitoring" reports downloadable from 
"www.prodsyse.com".  This kind of use of the predictive distribution is 
discussed in the West and Harrison (1999) book cited in the "Bayes' 
Rule" paper, and a Poisson EWMA is derived on p. 5. 
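
      The test-then-update loop might look roughly like this for daily 
counts.  This is a sketch under my own assumptions, not necessarily the 
derivation in the paper: a gamma prior on the Poisson mean (so the 
predictive is negative binomial), a discount factor delta that makes 
the update behave like an EWMA, and a tail-probability cutoff alpha.  
The function name poisson_monitor and all default values are 
hypothetical. 

```r
# "Test against the predictive, then update the posterior" for Poisson counts.
# Assumed for illustration: gamma(shape = a, rate = b) prior on the mean,
# discount factor delta, and upper-tail cutoff alpha.
poisson_monitor <- function(y, a = 10, b = 1, delta = 0.9, alpha = 0.001) {
  flagged <- logical(length(y))
  for (t in seq_along(y)) {
    # predictive f(y): a gamma mixture of Poissons is negative binomial
    p.upper <- pnbinom(y[t] - 1, size = a, prob = b / (b + 1),
                       lower.tail = FALSE)        # P(Y >= y[t])
    flagged[t] <- p.upper < alpha
    if (!flagged[t]) {
      # posterior f(mu | y) is gamma(a + y, b + 1); discounting lets old data fade
      a <- delta * (a + y[t])
      b <- delta * (b + 1)
    }
  }
  list(flagged = flagged, mean = a / b)
}

res <- poisson_monitor(c(10, 9, 11, 60, 10))
which(res$flagged)    # days inconsistent with the predictive distribution
```

Flagged observations are left out of the update here, so one extreme 
day does not inflate the running mean;  that is a design choice, not a 
requirement. 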

      What you use, of course, depends on the events you hope to 
capture.  For your applications, I might consider running separate 
monitors on each condition-hospital pair plus monitors on the totals for 
each hospital and for each condition, plus one for overall.  I might use 
the qcc package to calibrate my thresholds, but do the daily 
computations in some database system. 
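
      As a sketch of the series that would be monitored, suppose the 
counts are held in a three-way array indexed by day, hospital, and 
condition.  That layout, the dimension names, and the simulated values 
are assumptions for illustration only. 

```r
# Hypothetical layout: counts[day, hospital, condition], filled with fake data.
set.seed(3)
counts <- array(rpois(5 * 3 * 2, lambda = 10), dim = c(5, 3, 2),
                dimnames = list(day = paste0("day", 1:5),
                                hospital = c("A", "B", "C"),
                                condition = c("Respiratory", "Neuro")))

# The condition-hospital pairs are the array cells themselves, e.g.
pair.A.resp  <- counts[, "A", "Respiratory"]

by.hospital  <- apply(counts, c(1, 2), sum)   # day x hospital totals
by.condition <- apply(counts, c(1, 3), sum)   # day x condition totals
overall      <- apply(counts, 1, sum)         # one overall total per day
```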

      Selecting thresholds is not easy, in part because the assumptions 
you make for monitor design will never hold exactly in practice.  The 
result of this is that any thresholds you compute based purely on theory 
will be wrong.  However, if you tune your thresholds based on the years 
of historical data you have, you should be on safer ground.  This theory 
should get you close to an optimal arrangement.  I think I would use 
quite loose thresholds for the hospital-condition (interaction) 
monitors, tighter thresholds for the condition and hospital totals, and 
the tightest threshold for the overall.  Monitors on specific conditions 
should be sensitive to epidemics or to an effective biological warfare 
terrorist attack;  if this is your concern, a CUSUM might be best.  
Monitors on specific hospitals should be sensitive to changes in the 
competence of local staff (suggesting a preference for an EWMA) or to a 
sudden local outbreak of something (suggesting a CUSUM).  Monitors on 
condition-hospital pairs might be sensitive to local changes in 
preferred diagnoses.  I would run CUSUMs or EWMAs but not both:  Either 
will catch conditions most quickly caught by the other, with possibly a 
slightly longer delay. 
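
      One simple way to tune a threshold on historical data is to run 
the monitor over the past and read the threshold off the empirical 
distribution of the statistic.  In the sketch below, the simulated 
series 'past' stands in for historical counts believed to be free of 
outbreaks, and the reference value k and the target false-alarm rate 
are assumed values, not recommendations. 

```r
# Tune a CUSUM threshold empirically rather than from theory alone.
# Assumptions: 'past' is outbreak-free history; k and 'target' are illustrative.
cusum_path <- function(y, k) {
  s <- numeric(length(y))
  prev <- 0
  for (t in seq_along(y)) {
    prev <- max(0, prev + y[t] - k)   # one-sided CUSUM with reference value k
    s[t] <- prev
  }
  s
}

set.seed(4)
past   <- rpois(4 * 365, lambda = 10)   # stand-in for four years of daily counts
s      <- cusum_path(past, k = 11)
target <- 1 / 365                        # roughly one false-alarm day per year
h      <- as.numeric(quantile(s, probs = 1 - target))
mean(s > h)                              # empirical fraction of days above h
```

The same recipe can be repeated per series with a different target 
rate for pairs, totals, and the overall monitor. 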

      hope this helps.  spencer graves

adiamond at fas.harvard.edu wrote:

>Dear R Community:
>
>I am working on a public health early warning system, and 
>I see that the qcc package allows for CUSUM and other statistical quality tests
>but I am not sure if my project is a good match for qcc functions as written.
>Any advice you may have is very much appreciated.
>
>I have four years' worth of daily counts of emergency room admissions for 
>different conditions (e.g. respiratory, neurologic, etc.) from several local 
>hospitals.  The data look like this:
>
>DAY 1         Respiratory     Neuro   ...
>Hospital A:       10            12
>.                  .             .
>.                  .             .
>Hospital F:        7            14
>
>
>DAY 2         Respiratory     Neuro   ...
>Hospital A:       10            12
>.                  .             .
>.                  .             .
>Hospital F:        7            14    ...
>
>etc.,
>
>and my goal is to do a kind of multivariate quality control test (without 
>fitting a GLM), that would run each day after the data is updated and be able 
>to answer the question: 
>"Has there been a significant variation in the central tendency of the data?"
>
>An analogous problem would be detecting the early signs of a shift in global 
>trading patterns by examining stock market indexes in different countries 
>around the world, updating and testing the data each business day. 
>
>Thank you, 
>
>Alexis Diamond
>