[R] Systematic resampling (in sequential Monte Carlo)

Wed Jul 29 20:32:14 CEST 2009

Dear all,

Here is a little coding problem. It falls in the category of "how can I do
this efficiently?" rather than "how can I do this?" (I know how to do it
inefficiently). So, if you want to take the challenge, keep reading, otherwise
just skip to the next post - I won't be offended by that ;-)

I am coding a particle filter and I want to use systematic resampling to
"regenerate" particles. Basically, what this means is the following.

1. Start with a vector of "labels", x, say, of length N and a vector of
   probabilities, w, say - w[i] is the probability attached to x[i]. 

2. Draw N equally spaced random number in (0,1) as u = (1 : N + runif(1)) / N.

3. Define a new sample of "labels", y, say, by inverting the cdf defined by
   the probabilities w. That is, set 
   y[i] = x[k] if cumsum(w)[k-1] <= u[i] < cumsum(w)[k],  i=1, ..., N.

The following is a piece of R code that implements the procedure.

> ### Systematic resampling
> set.seed(2)
> x <- LETTERS[1 : 6] # labels
> w <- rexp(6)
> w <- w / sum(w) # probabilities
> W <- c(0, cumsum(w)) # cdf
> u <- (1 : 6 + runif(1)) / 6
> indNew <- sapply(seq_along(u),
+                  function(i) (sum(W[i] <= u & u < W[i + 1])))
> (y <- rep(x, indNew))
[1] "A" "B" "D" "D" "F"
> ## graphically...
> plot(stepfun(seq_along(u), W), xaxt = "n")
> axis(1, at = seq_along(u), x)
> abline(h = u, lty = "dotted")

The question is the following. I believe the way I compute "newInd" is of
order N^2. Is there a smart way of doing it which is of order N? (I know there
is one, I just cannot find it...)

Thanks in advance,
Giovanni