[R] Need very fast application of 'diff' - ideas?

Sat Jan 28 19:19:27 CET 2012

On 28 January 2012 at 11:46, Dirk Eddelbuettel wrote:
| 
| On 28 January 2012 at 16:20, Hans W Borchers wrote:
| | R. Michael Weylandt <michael.weylandt <at> gmail.com> writes:
| | > 
| | > I'd write your own diff() that eliminates the method dispatch and
| | > argument checking that diff -> diff.default does.
| | > 
| | > x[-1] - x[-len(x)] # is all you really need.
| | > (# you could also try something like c(x[-1], NA) - x which may be
| | > marginally faster as it only subsets x once but you should profile to
| | > find out)
| | > 
| | > is probably about as fast as you can get within pure R code (the
| | > function overhead will add a little bit of time as well, so if speed
| | > is truly the only thing that matters, best not to use it. If you wanna
| | > go for even more speed, you'll have to go to compiled code; I'd
| | > suggest inline+Rcpp as the easiest way to do so. That could get it
| | > down to a single pass through the vector in pure C (or nice C++) which
| | > seems to be a lower bound for speed.
| | > 
| | > Michael
| | 
| | 
| | Python has become astonishingly fast during the last years. On an iMAc with
| | 3.06 GHz I can see the following timings (though I do feel a bit suspicious 
| | about the timings Python reports):
| | 
| |     Python      0.040 s     Version 2.6.1, 1e7 integer elements
| |     Matlab      0.095 s     Matlab's diff function (Version R2011b)
| |     Matlab      0.315 s     Matlab using x(2:N)-x(1:(N-1))
| |     R 2.14.1    0.375 s     R's diff() function
| |     R           0.365 s     R using x[-1]-x[-N]
| |     R           0.270 s     R using c(x[-1],NA)-x)
| |     R+Fortran   0.180 s     R function calling .Fortran
| |     R+C         0.180 s     R function calling .C
| | 
| | where---as an example---the C code looks like:
| | 
| |     void diff(int *n, int *x, int *d)
| |     { for (long i=0; i<*n-2; i++) d[i] = x[i+1] - x[i]; }
| 
| We looked at a number of these operations in the context of the Rcpp
| benchmark for vector convolution.  If you really really care about the last
| bit of speed you do a little better using pointer arithmetic rather than [] indexing.
| 
| Also, R always checks for NA which is costly.

Ok, couldn't resist.  A _really simple_ Rcpp use case already does _much
better_ relative to R than your R+Fortran and R+C cases:

R> library(inline)
R> library(rbenchmark)
R> 
R> X <- seq(1, 1e5)
R> 
R> f1 <- function(x) { diff(x) }
R> 
R> f2 <- function(x) { x[-1]-x[-length(X)] }
R> stopifnot(all.equal(f1(X), f2(X)))
R> 
R> f3 <- function(x) { (x - c(NA, x[-length(X)]))[-1] }
R> stopifnot(all.equal(f1(X), f3(X)))
R> 
R> f4 <- cxxfunction(signature(xs="integer"), plugin="Rcpp", body='
+    Rcpp::IntegerVector x(xs);
+    Rcpp::IntegerVector y = diff(x);
+    return y;
+ ')
R> stopifnot(all.equal(f1(X), f4(X)))
R> 
R> 
R> res <- benchmark("useDiff"     = f1(X),
+                  "firstlast"   = f2(X),
+                  "R vector"    = f3(X),
+                  "Rcpp naive"  = f4(X),
+                  columns=c("test", "elapsed", "relative", "user.self", "sys.self"),
+                  order="relative",
+                  replications=100)
R> print(res)
        test elapsed relative user.self sys.self
4 Rcpp naive   0.013   1.0000     0.012        0
2  firstlast   0.323  24.8462     0.368        0
1    useDiff   0.329  25.3077     0.376        0
3   R vector   0.339  26.0769     0.384        0
R> 

That wasn't really trying hard as we didn't do the loops by hand do eliminate
the NA checks etc -- we simply used Rcpp sugar to get vectorised operations
at the C++ level.  I think there is lots of room to make this faster.  (And
yes when I try your Python example on my box, it comes up faster still.)

But this is just to show that 

    a) the "compiled" code using Rcpp can be much easier to write than the 
       naive (ie Kernighan and Ritchie-inspired) C you put up there

    b) it may beat naive Fortran too as you had a factor of two improvement

    c) a 'factor of 25' improvement for three lines of code is pretty good
       use of programmer's time

Not that f3() needed a fix as the comment "R using c(x[-1],NA)-x)" above does
not lead to the same results as diff(x).

Dirk

| | There appears to be a factor of 4 between R+compiled code and Python code.
| | It is also interesting to see that in Matlab 'diff' is considerably faster
| | than differencing vectors, while in R it is slower.
| | 
| | P. S.:  To make the comparison fair I have used the following Python call:
| | 
| |     python -m timeit -n 1 -r 1
| |         -s 'import numpy' 
| |         -s 'arr = numpy.random.randint(0, 1000, (10000000,1)).astype("int32")'
| |         'diff = arr[1:] - arr[:-1]'
| | 
| | i.e., used 32-bit integers and included the indexing in the loop.
| 
| We should be able to close the gap from 40ms for Python to 180ms for R + C. I
| suspect there is some room left.  Can you post your codes ?
| 
| Dirk
|  
|  
| | > On Fri, Jan 27, 2012 at 7:15 PM, Kevin Ummel <kevinummel <at> gmail.com> wrote:
| | > > Hi everyone,
| | > >
| | > > Speed is the key here.
| | > >
| | > > I need to find the difference between a vector and its one-period lag
| | > > (i.e. the difference between each value and the subsequent one in the 
| | > > vector). Let's say the vector contains 10 million random integers
| | > > between 0 and 1,000. The solution vector will have 9,999,999 values,
| | > > since their is no lag for the 1st observation.
| | > >
| | > > In R we have:
| | > >
| | > > #Set up input vector
| | > > x = runif(n=10e6, min=0, max=1000)
| | > > x = round(x)
| | > >
| | > > #Find one-period difference
| | > > y = diff(x)
| | > >
| | > > Question is: How can I get the 'diff(x)' part as fast as absolutely
| | > > possible? I queried some colleagues who work with other languages, and
| | > > they provided equivalent solutions in Python and Clojure that, on their
| | > > machines, appear to be potentially much faster
| | > > (I've put the code below in case anyone is interested).
| | > > However, they mentioned that the overhead in passing the data between 
| | > > languages could kill any improvements. I don't have much experience 
| | > > integrating other languages, so I'm hoping the community has some ideas
| | > > about how to approach this particular problem...
| | > >
| | > > Many thanks,
| | > > Kevin
| | > >
| | > > In iPython:
| | > >
| | > > In [3]: import numpy as np
| | > > In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
| | > > In [5]: arr1 = arr[1:].view()
| | > > In [6]: timeit arr2 = arr1 - arr[:-1]
| | > > 10 loops, best of 3: 20.1 ms per loop
| | > >
| | > > In Clojure:
| | > >
| | > > (defn subtract-lag
| | > >  [n]
| | > >  (let [v (take n (repeatedly rand))]
| | > >    (time (dorun (map - v (cons 0 v))))))
| | >
| | 
| | ______________________________________________
| | R-help at r-project.org mailing list
| | https://stat.ethz.ch/mailman/listinfo/r-help
| | PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
| | and provide commented, minimal, self-contained, reproducible code.
| 
| -- 
| "Outside of a dog, a book is a man's best friend. Inside of a dog, it is too
| dark to read." -- Groucho Marx
| 
| ______________________________________________
| R-help at r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-help
| PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
| and provide commented, minimal, self-contained, reproducible code.

-- 
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too
dark to read." -- Groucho Marx