[R] Alternate to for-loop

Stefan Evert stefan.evert at uos.de
Tue Feb 17 11:25:24 CET 2009


>>  I ran into a similar issue with a simple benchmark the other day,
>> where a plain loop in Lua was faster than vectorised code in R ...
> hmm, would you be saying that r's vectorised performance is overhyped?
> or is it just that non-vectorised code in r is slow?

What I meant, I guess, was (apart from a little bit of trolling) that
I'd had misconceptions about the speed differences between loops and
vectorised code.  In particular, I had entertained the naive belief
that vectorised solutions are always highly efficient (I wonder if I'm
the only one who was naive enough to think this...), so I was very
much surprised to find that a loop in an interpreted language like Lua
was faster than vectorised R code.

My silly little benchmark translated the Lua code

	N = 100000000   -- number of additions (10^8 in the Lua run below)
	sum = 0
	for i=1,N do sum = sum + i end

into vectorised R


The performance results were as follows (Mops/s = million additions
per second):

for loop in R:   0.75 Mops/s  (2000000 ops in 2.66 s)
vectorised R:   29.75 Mops/s  (50000000 ops in 1.68 s)
Lua:            51.54 Mops/s  (100000000 ops in 1.94 s)
Perl:            8.26 Mops/s  (10000000 ops in 1.21 s)

Note that Lua is an interpreted language (compiled to byte code); with
the just-in-time compiler I get more than 230 Mops/s.
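A rough Python analogue of the benchmark, for readers without Lua or R at hand. This is a sketch only, not the code behind the numbers above. It assumes CPython, where the built-in sum() plays the role of the vectorised primitive by running the loop in C. The function names and the value of N are illustrative choices.

```python
import time


def loop_sum(n):
    """Add 1..n one at a time in an interpreted loop (like the Lua code above)."""
    total = 0
    for i in range(1, n + 1):
        total = total + i
    return total


def builtin_sum(n):
    """Hand the whole loop to the built-in sum(), which iterates in C."""
    return sum(range(1, n + 1))


def mops(func, n):
    """Time one call of func(n); return (result, million additions per second)."""
    start = time.perf_counter()
    result = func(n)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return result, n / elapsed / 1e6


if __name__ == "__main__":
    N = 2_000_000
    for name, func in [("for loop", loop_sum), ("built-in sum", builtin_sum)]:
        result, rate = mops(func, N)
        print(f"{name:14s} {rate:8.2f} Mops/s  ({N} ops, result {result})")
```

Expect the built-in version to come out clearly ahead. The exact ratio depends on the machine and the Python version.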

I suspect that this has to do with cache thrashing, since the
vectorised code in R operates on large vectors that have to be read
from and written to RAM, while the Lua loop presumably runs entirely
from the L1 cache.  (Before you ask, I split the vectorised R code
into a loop that processes 1 million numbers at a time; I tried
different ways of coding the benchmark and picked the fastest solution.)
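The chunking described in the parenthesis can be sketched in the same Python analogue. The sketch materialises at most `chunk` numbers at a time as a list, which loosely stands in for an R vector, and adds each block with the built-in sum(). The default block size of one million mirrors the figure mentioned above. The function name is hypothetical, and this sketch does not demonstrate whether smaller blocks actually help the cache in CPython.

```python
def chunked_sum(n, chunk=1_000_000):
    """Sum 1..n block by block, building each block as an in-memory list."""
    total = 0
    for start in range(1, n + 1, chunk):
        stop = min(start + chunk, n + 1)      # exclusive end of this block
        block = list(range(start, stop))      # materialise the block, like an R vector
        total += sum(block)                   # the "vectorised" step
    return total


if __name__ == "__main__":
    n = 5_000_000
    # The closed form n(n+1)/2 checks the chunked result.
    assert chunked_sum(n) == n * (n + 1) // 2
    print(chunked_sum(n))
```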

>> Perhaps loops in R aren't always as slow (compared to matrix
>> operations) as one seemed to think.
> depends how and where you use them.  in the problem discussed here, they
> do slow down the code for some class of inputs and do not speedup for
> the others, compared to the array version of pat.

My mistake was to think that vectorisation would always give a
substantial performance boost and that for-loops should be avoided
whenever possible.  But it's really just the inner loops that need to
be vectorised: iterating over the outer margins of a matrix doesn't
add much overhead (the loop body runs once per row or column rather
than once per element), especially if the vectorised solution would
have to operate on huge matrices.
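The same point can be shown in the Python analogue. To keep the example free of dependencies, the sketch assumes a matrix stored as a list of rows. The first function runs both loops in the interpreter. The second loops only over the outer margin and leaves each row to the built-in sum(). In R the corresponding contrast would be an element-wise double loop versus a loop over rows calling sum(M[i, ]), with rowSums(M) as the fully vectorised version.

```python
import time


def row_sums_elementwise(matrix):
    """Both loops run in the interpreter: one Python iteration per element."""
    sums = []
    for row in matrix:
        s = 0
        for x in row:
            s += x
        sums.append(s)
    return sums


def row_sums_outer_loop(matrix):
    """Only the outer margin is looped over; each row is summed by the built-in sum()."""
    return [sum(row) for row in matrix]


if __name__ == "__main__":
    rows, cols = 1000, 1000
    matrix = [[r * cols + c for c in range(cols)] for r in range(rows)]
    for func in (row_sums_elementwise, row_sums_outer_loop):
        start = time.perf_counter()
        func(matrix)
        print(f"{func.__name__:22s} {time.perf_counter() - start:.3f} s")
    assert row_sums_elementwise(matrix) == row_sums_outer_loop(matrix)
```

With 1000 rows, the outer loop in the second version costs only 1000 interpreted iterations. The element-wise version needs a million.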

Guess that's a bad habit from my old Matlab days (back in the early  
90s) ...


More information about the R-help mailing list