[R] Alternate to for-loop
stefan.evert at uos.de
Tue Feb 17 11:25:24 CET 2009
>> I ran into a similar issue with a simple benchmark the other day,
>> where a plain loop in Lua was faster than vectorised code in R ...
> hmm, would you be saying that r's vectorised performance is overhyped?
> or is it just that non-vectorised code in r is slow?
What I meant, I guess, was (apart from a little bit of trolling) that
I'd had misconceptions about the speed differences between loops and
vectorised code. In particular, I had entertained the naive belief
that vectorised solutions are always highly efficient (I wonder if I'm
the only one who was naive enough to think this ..), so I was very
much surprised to find a loop in an interpreted language like Lua to
be faster than vectorised R code.
My silly little benchmark translated the Lua code
    sum = 0
    for i=1,N do sum = sum + i end
into vectorised R
The performance results were as follows:
    for loop in R:    0.75 Mops/s  (2000000 ops in 2.66 s)
    vectorised R:    29.75 Mops/s  (50000000 ops in 1.68 s)
    Lua:             51.54 Mops/s  (100000000 ops in 1.94 s)
    Perl:             8.26 Mops/s  (10000000 ops in 1.21 s)
Note that Lua is an interpreted language (compiled to byte code); with
the just-in-time compiler I get more than 230 Mops/s.
I suspect that this has to do with cache thrashing, since the
vectorised code in R operates on large vectors that have to be read
from / written to RAM, while the Lua loop presumably runs entirely
from the L1 cache. (Before you ask, I split the vectorised R code
into a loop that processes 1 million numbers at a time; I tried
different ways of coding the benchmark and picked the fastest solution.)
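For concreteness, the three R variants discussed above might look roughly like the sketch below. This is an assumption, not the benchmark code that was actually timed: the function names (sum_loop, sum_vectorised, sum_chunked) and the chunk argument are made up for illustration.

```r
# Plain for-loop, a direct translation of the Lua code.
sum_loop <- function(N) {
  total <- 0
  for (i in seq_len(N)) total <- total + i
  total
}

# Fully vectorised: materialises a vector of length N in memory.
# as.numeric() avoids integer overflow in the sum for large N.
sum_vectorised <- function(N) {
  sum(as.numeric(seq_len(N)))
}

# Vectorised in chunks: an outer loop over blocks of at most `chunk`
# numbers, with the summation inside each block left to sum().
sum_chunked <- function(N, chunk = 1e6) {
  if (N < 1) return(0)
  total <- 0
  for (start in seq(1, N, by = chunk)) {
    end <- min(start + chunk - 1, N)
    total <- total + sum(as.numeric(start:end))
  }
  total
}
```

Each variant can be timed with something like system.time(sum_chunked(5e7)); the absolute numbers will of course depend on the machine and the R version.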
>> Perhaps loops in R aren't always as slow (compared to matrix
>> operations) as one seemed to think.
> depends how and where you use them. in the problem discussed here,
> they do slow down the code for some class of inputs and do not speed up for
> the others, compared to the array version of pat.
My mistake was to think that vectorisation will always give a
substantial performance boost and that for-loops should be avoided
whenever possible. But it's really just the inner loops that need to
be vectorised: iterating over the outer margins of a matrix doesn't
add much overhead, especially if the vectorised solution would have to
operate on huge matrices.
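A minimal sketch of that pattern, assuming a hypothetical task (the sum of squares of each row of a numeric matrix): the outer loop runs over the rows, and only the inner work on each row is vectorised. The names rowwise_sumsq and rowwise_sumsq_vec are made up for illustration.

```r
# Outer loop over rows, vectorised inner operation on each row.
rowwise_sumsq <- function(m) {
  out <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) {
    # The work over the columns of row i is a single vectorised call.
    out[i] <- sum(m[i, ]^2)
  }
  out
}

# Fully vectorised alternative: m^2 builds a temporary matrix as large as m.
rowwise_sumsq_vec <- function(m) rowSums(m^2)
```

Both functions return the same values; which one is faster for a given matrix size is something to measure rather than assume.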
Guess that's a bad habit from my old Matlab days (back in the early