[R] Survey of "moving window" statistical functions - still looking f or fast mad function

Gabor Grothendieck ggrothendieck at gmail.com
Sat Apr 2 03:40:53 CEST 2005


Jaroslaw's article was great.  In fact, it was used as the basis for 
rapply and some optimized special cases that will be included in
the R 2.1.0 version of zoo (which have been coded but not yet
released).

Regarding numerically stable summation, check out the idea 
behind the following which I coincidentally am also considering 
for the zoo implementation:

   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090

On Apr 1, 2005 8:07 PM, Vadim Ogranovich <vograno at evafunds.com> wrote:
> Hi,
> 
> First, let me thank Jaroslaw for making this survey. I find it quite
> illuminating.
> 
> Now the questions:
> 
> * the #1 solution below (based on cumsum) is numerically unstable.
> Specifically if you do the runmean on a positive vector you can easily
> get negative numbers due to rounding errors. Does anyone see a
> modification which is free of this deficiency?
> * is it possible to optimize the algorithm of the filter function,
> solution #2 below, for the case of the  rep(1/k,k) kernel?
> 
> Thanks,
> Vadim
> 
> [R] Survey of "moving window" statistical functions - still looking f or
> fast mad function
> 
> *       This message: [ Message body
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start>  ] [ More
> options
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#options2>  ]
> *       Related messages: [ Next message
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html>  ] [ Previous
> message <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html>  ] [
> Next in thread <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> ] [ Replies
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#replies>  ]
> 
> From: Tuszynski, Jaroslaw W. <JAROSLAW.W.TUSZYNSKI_at_saic.com
> <mailto:JAROSLAW.W.TUSZYNSKI_at_saic.com?Subject=Re:%20%5BR%5D%20Survey%
> 20of%20&quot;moving%20window&quot;%20statistical%20functions%20-%20still
> %20lookingf%20or%20fast%20mad%20function> >
> Date: Sat 09 Oct 2004 - 06:30:32 EST
> 
> Hi,
> 
> Lately I run into a problem that my code R code is spending hours
> performing simple moving window statistical operations. As a result I
> did searched archives for alternative (faster) ways of performing: mean,
> max, median and mad operation over moving window (size 81) on a vector
> with about 30K points. And performed some timing for several ways that
> were suggested, and few ways I come up with. The purpose of this email
> is to share some of my findings and ask for more suggestions (especially
> about moving mad function).
> 
> Sum over moving window can be done using many different ways. Here are
> some sorted from the fastest to the slowest:
> 
> 1.      runmean = function(x, k) { n = length(x) y = x[ k:n ] - x[
> c(1,1:(n-k)) ] # this is a difference from the previous cell y[1] =
> sum(x[1:k]); # find the first sum y = cumsum(y) # apply precomputed
> differences return(y/k) # return mean not sum }
> 2.      filter(x, rep(1/k,k), sides=2, circular=T) - (stats package)
> 3.      kernapply(x, kernel("daniell", m), circular=T)
> 4.      apply(embed(x,k), 1, mean)
> 5.      mywinfun <- function(x, k, FUN=mean, ...) { # suggested in news
> group n <- length(x) A <- rep(x, length=k*(n+1)) dim(A) <- c(n+1, k)
> sapply(split(A, row(A)), FUN, ...)[1:(n-k+1)] }
> 6.      rollFun(x, k, FUN=mean) - (fSeries package)
> 7.      rollMean(x, k) - (fSeries package)
> 8.      SimpleMeanLoop = function(x, k) { n = length(x) # simple-minded
> loop used as a baseline y = rep(0, n) k = k%/%2; for (i in (1+k):(n-k))
> y[i] = mean(x[(i-k):(i+k)]) }
> 9.      running(x, fun=mean, width=k) - (gtools package)
> 
> Some of above functions return results that are the same length as x and
> some return arrays with length n-k+1. The relative speeds (on Windows
> machine) were as follow: 0.01, 0.09, 1.2, 8.1, 11.2, 13.4, 27.3, 63,
> 345. As one can see there are about 5 orders of magnitude between the
> fastest and the slowest.
> 
> Maximum over moving window can be done as follow, in order of speed
> 
> 1.      runmax = function(x, k) { n = length(x) y = rep(0, n) m = k%/%2;
> a = 0; for (i in (1+m):(n-m)) { if (a==y[i-1]) y[i] =
> max(x[(i-m):(i+m)]) # calculate max of the window else y[i] =
> max(y[i-1], x[i+m]); # max of the window is =y[i-1] a = x[i-m] # point
> that will be removed from the window } return(y) }
> 2.      apply(embed(x,k), 1, max)
> 3.      SimpleMaxLoop(x, k) - similar to SimpleMeanLoop above
> 4.      mywinfun(x, k, FUN=max) - see above
> 5.      rollFun(x, k, FUN=max) - fSeries package
> 6.      rollMax(x, k) - fSeries package
> 7.      running(x, fun=max, width=k) - gtools package The relative
> speeds were: <0.01, 3, 3.4, 5.3, 7.5, 7.7, 15.3
> 
> Median over moving window can be done as follows:
> 
> 1.      runmed(x, k) - from stats package
> 2.      SimpleMedLoop(x, k) - similar to SimpleMeanLoop above
> 3.      apply(embed(x,k), 1, median)
> 4.      mywinfun(x, k, FUN=median) - see above
> 5.      rollFun (x, k, FUN=median) - fSeries package
> 6.      running(x, fun=max, width=k) - gtools package Speeds: <0.01,
> 3.4, 9, 15, 29, 165
> 
> Mad over moving window can be done as follows:
> 
> 1.      runmad = function(x, k) { n = length(x) A = embed(x,k) A = abs(A
> - rep(apply(A, 1, median), k)) dim(A) = c(n-k+1, k) apply(A, 1, median)
> }
> 2.      apply(embed(x,k), 1, mad)
> 3.      mywinfun(x, k, FUN=mad) - see above
> 4.      SimpleMadLoop(x, k) - similar to SimpleMeanLoop above
> 5.      rollFun(x, k, FUN=mad) - fSeries package
> 6.      running(x, fun=mad, width=k) - gtools package Speeds: 11, 18,
> 25, 50, 50, 400
> 
> Some thoughts about those results:
> 
> *       All functions from Stats package (runmed, filter, kernapply) are
> fast and hard to improve on.
> *       In case of Mean and Max a simple un-optimized R codes are much
> faster than specialized functions build for the same purpose.
> *       apply(embed(x,k), 1, fun) - seem to be always faster than any
> functions from fSeries package or "mywinfun"
> *       "running" function from gtools package is horribly slow compared
> to anything else
> *       "mywinfun" proposed as a faster version of "apply(embed(x,k), 1,
> fun)" seems to be always slower
> 
>        Finally a question: I still need to get moving windows mad
> function faster my "runmad" function is not that much faster than
> apply/embed combo, and that I used before, and this is where my code
> spends most of its time. I need something like "runmed" but for a mad
> function. Any suggestions?
> 
> Jarek
> 
> =====================================\====
> Jarek Tuszynski, PhD.                               o / \
> Science Applications International Corporation  <\__,|
> (703) 676-4192                        ">  \
> Jaroslaw.W.Tuszynski at saic.com                   `    \
> 
>        [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
> guide! http://www.R-project.org/posting-guide.html Received on Sat Oct
> 09 06:43:05 2004
> 
> *       This message: [ Message body
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start>  ]
> *       Next message: Emili Tortosa-Ausina: "Re: [R] RWinEdt"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html>
> *       Previous message: Brian S Cade: "Re: [R] reading Systat into R"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html>
> *       Next in thread: Prof Brian Ripley: "Re: [R] Survey of "moving
> window" statistical functions - still looking f or fast mad function"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> *       Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window"
> statistical functions - still looking f or fast mad function"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> *       Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window"
> statistical functions - still looking f or fast mad function"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> 
> *       Contemporary messages sorted: [ By Date
> <http://tolstoy.newcastle.edu.au/R/help/04/10/date.html#5161>  ] [ By
> Thread <http://tolstoy.newcastle.edu.au/R/help/04/10/index.html#5161>  ]
> [ By Subject
> <http://tolstoy.newcastle.edu.au/R/help/04/10/subject.html#5161>  ] [ By
> Author <http://tolstoy.newcastle.edu.au/R/help/04/10/author.html#5161>
> ] [ By messages with attachments
> <http://tolstoy.newcastle.edu.au/R/help/04/10/attachment.html>  ]
> 
> This archive was generated by hypermail 2.1.8
> <http://www.hypermail.org/>  : Fri 18 Mar 2005 - 03:03:35 EST
> 
>        [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>




More information about the R-help mailing list