Running Median and Mean

David Brahm brahm@alum.mit.edu
Mon, 16 Sep 2002 11:09:30 -0400


R gurus,

On Aug 20, 2002, I asked in R-help about calculating a running 5-day median on
a large matrix.  Thanks to Martin Maechler <maechler@stat.math.ethz.ch> and Ray
Brownrigg <Ray.Brownrigg@mcs.vuw.ac.nz> for responding.

I ended up writing C code (and an R interface) to do it, which is about 1000x
faster than the naive method!  (72s became .09s on a 223 x 520 matrix).  I
added a running mean function too, which is very simple and is about 15x
faster than apply(..., filter, ...).  I've packaged the code as
  <http://cran.r-project.org/src/contrib/Devel/runStat_1.1.tar.gz>

The key insight was that it *is* in fact a good idea to calculate the ranks of
the 5 data points, because then it is easy to calculate how those ranks
*change* as you step through the column.  For example, suppose the last few
points in a column look like:
   data=  20 25 18 14 78 55 29
   rank=      3  2  1  5  4
where I have ranked the 5 points prior to the last one.  The median is 25 (the
point with rank 3).  Stepping to the left:
   data=  20 25 18 14 78 55 29
   rank=   3  4  2  1  5
For the 4 points that overlap, their ranks remain the same, except they are
incremented by one for points >=20, and decremented by one for points >55.  The
rank of the new point (20) is just 5 minus the number of increments done.

The code is copied below for your amusement.  Suggestions welcome.  Note I have
not dealt with NA's, even values of N, row-wise medians, or non-lagged medians
(i.e. I put the median of points 1:5 into position 6, not position 5).  It
would be great if somebody (Martin?) wanted to incorporate these functions into
an existing package or even "base"; otherwise I'll post the final product on
CRAN/src/contrib.
                              -- David Brahm (brahm@alum.mit.edu)

############### C Code: ##################

#include <R.h>

void runMedian(double *m, int *d, int *N) {
  int i, j, k, a, R=(*N+1)/2, *r=(int*) R_alloc(*N, sizeof(int));
  double old, new, *x;

  for (j=1; j <= d[1]; j++) {	                        /* Loop over columns */
    x = m + j*d[0] - 1 - *N;                         /* x[*N] = m[nrow(m),j] */

    for (i=0; i < *N; i++) {	              /* Calculate initial ranks "r" */
      r[i] = 1;	                              /* r higher if larger or later */
      for (k=0;   k < i;  k++) if (x[i] >= x[k]) r[i]++;
      for (k=i+1; k < *N; k++) if (x[i] >  x[k]) r[i]++;
      if (r[i]==R) x[*N] = x[i];            /* If rank=R, this is the median */
    }

    for (i=d[0]-1; i > *N; i--) {                      /* Now x[*N] = m[i,j] */
      x--;  old=x[*N];  new=x[0];  a=*N;              /* a = rank of new guy */
      for (k=*N-1; k > 0; k--) {	  /* Recalculate each element's rank */
	r[k] = r[k-1];		         /* Shift previous iteration's ranks */
	if (x[k] >= new) {r[k]++; a--;}	    /* Are we adding a lower number? */
	if (x[k] >  old) {r[k]--;}       /* Are we dropping a higher number? */
	if (r[k]==R) x[*N] = x[k];          /* If rank=R, this is the median */
      }
      r[0] = a;
      if (a==R) x[*N] = new;	               /* Is the new guy the median? */
    }
  }
}


void runMean(double *m, int *d, int *N) {
  int i, j, k;
  double new, *x;

  for (j=1; j <= d[1]; j++) {	                        /* Loop over columns */
    x = m + j*d[0] - 1 - *N;
    for (i=d[0]; i > *N; i--) {                            /* x[*N] = m[i,j] */
      for (k=0, new=0; k < *N; k++) new += x[k];
      x[*N] = new / *N;  x--;
    }
  }
}

############### R Code: ##################

runMedian <- function(m, N=5) {
  if (!(N %% 2)) stop("N should be odd.")
  d <- od <- dim(m)
  if (!(nd <- length(d))) d <- c(length(m), 1)               # Vector -> matrix
  if (nd > 2) d <- c(d[1], prod(d[-1]))                      # Array  -> matrix
  if (d[1] <= N) {m[] <- NA; return(m)}
  m <- array(.C("runMedian", m=as.real(m), as.integer(d), as.integer(N))$m, d)
  m[1:N, ] <- NA
  if (nd==2) m else if (!nd) m[ ,1] else array(m, od)     # Matrix/Vector/Array
}

runMean <- function(m, N=21) {
  d <- od <- dim(m)
  if (!(nd <- length(d))) d <- c(length(m), 1)               # Vector -> matrix
  if (nd > 2) d <- c(d[1], prod(d[-1]))                      # Array  -> matrix
  if (d[1] <= N) {m[] <- NA; return(m)}
  m <- array(.C("runMean", m=as.real(m), as.integer(d), as.integer(N))$m, d)
  m[1:N, ] <- NA
  if (nd==2) m else if (!nd) m[ ,1] else array(m, od)     # Matrix/Vector/Array
}
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._