[R] matching a given sd range

Ray Brownrigg Ray.Brownrigg at mcs.vuw.ac.nz
Sun Apr 2 07:56:22 CEST 2006


> Date: Fri, 31 Mar 2006 14:44:05 -0800 (PST)
> From: "Fred J." <phddas at yahoo.com>
>  
>  given a numeric array (a sequence of reals), I am interested in
>  finding the subsets of sequences (each with start and end index) which match a given sd range.
>  
>  I read the docs on match and which and the "see also" but could not come up with a way. I could loop with a stepping window over the sequence but that would be limited to a fixed size window, I guess I as well loop through all window sizes from 1:length(data) but that is not elegant.  any hints towards a solution?
>  
Presumably your "array" is a vector, in which case you can use
vectorisation to eliminate one of the loops.  The following function
produces a matrix in upper triangular form for which res[i, j] is the
variance of the sequence dat[i:j] (where dat is your "array").  It is
(approximately) two orders of magnitude faster than the double loop,
but doesn't produce exactly the same numbers because of numerical
rounding because it uses the formula:  variance = (sum(xi^2) -
n*xbar^2)/(n - 1)

Now all you have to do is test this matrix against the range of
variances you want to isolate, and use the matching indices to provide
you with the endpoints of your subsets.

Hope this helps,
Ray Brownrigg
----
allvar <- function(dat) {
  len <- length(dat)
  res <- numeric((len - 1)*len/2)
  mym <- dat[1]
  myv <- 0
  for (j in 2:len) {
    oldm <- mym
    mym <- (((j - 1):1)*mym + dat[j])/(j:2)
    myv <- (((j - 2):0)*myv + ((j - 1):1)*oldm^2 + dat[j]^2 - (j:2)*mym^2)/
      ((j - 1):1)
    res[((j - 2)*(j - 1)/2 + 1):((j - 1)*j/2)] <- myv
    mym <-c(mym, dat[j])
    myv <- c(myv, 0)
  }
return(res)
}
# Now test it
> n <- 500; dat <- runif(n)
> unix.time({res2 <- matrix(0, n, n); res2[upper.tri(res2)] <- allvar(dat)})
[1] 0.64 0.00 0.64 0.00 0.00
> unix.time({res1 <- matrix(0, n, n); for (j in 2:n) for (i in 1:(j - 1)) res1[i, j] <- var(dat[i:j])})
[1] 72.64 13.74 86.38  0.00  0.00
> max(abs(res1 - res2))
[1] 1.249001e-15
> 
----




More information about the R-help mailing list