[R] R badly lags matlab on performance?

luke at stat.uiowa.edu
Sun Jan 4 01:02:33 CET 2009


On Sat, 3 Jan 2009, Duncan Murdoch wrote:

> On 03/01/2009 1:37 PM, Ajay Shah wrote:
>>> As for jit and Ra, that was my immediate reaction too, but I found
>>> that jit does not help on your example.  But I concur fully with
>>> what Ben said --- use the tool that is appropriate for the task at
>>> hand.  If your task is running for loops, Matlab does it faster,
>>> and you have Matlab, well then you should by all means use Matlab.
>> 
>> A good chunk of statistical computation involves loops. We are all
>> happy R users. I was surprised to see that we are so far from matlab
>> in the crucial dimension of performance.
>> 
>
> I don't know Matlab, but I think the thing that is slowing R down here is its 
> generality.  When you write
>
> a[i] <- a[i] + 1
>
> in R, it could potentially change the meaning of a, [, <-, and + on each step 
> through the loop, so R looks them up again each time.  I would guess that's 
> not possible in Matlab, or perhaps Matlab has an optimizer that can recognize 
> that in the context where the loop is being evaluated, those changes are 
> known not to happen.

R's interpreter is fairly slow, due in large part to the allocation of
argument lists and to the cost of variable lookups, including ones
like [<- that are assembled and looked up as strings on every call.
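Duncan's point about generality is easy to demonstrate: nothing
prevents a loop body (or a function it calls) from rebinding + partway
through, so the interpreter cannot cache the lookup.  A small
illustration:

```r
a <- c(1, 2, 3)
for (i in seq_along(a)) {
  a[i] <- a[i] + 1
  # Rebind `+` after the first iteration: the remaining iterations
  # now subtract, so the interpreter really must look `+` up each time.
  if (i == 1) `+` <- function(e1, e2) e1 - e2
}
a            # 2 1 2 -- only the first iteration actually added 1
rm(`+`)      # remove the rebinding so the builtin is found again
```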

> It *would* be possible to write such an optimizer for 
> R, and Luke Tierney's byte code compiler-in-progress might incorporate such a 
> thing.

The current byte code compiler available from my web site speeds up
this (highly artificial) example by about a factor of 4.  The
experimental byte code engine I am currently working on (which can't
yet do much more than an example like this) speeds it up by a factor
of 80.  Whether that level of improvement will hold for toy examples
like this once the engine is more complete, and whether a reasonable
compiler can optimize down to the assembly code I used, remain to be
seen.
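For those who want to try the compiler, usage is along these lines (a
sketch assuming the cmpfun entry point of the compiler package; actual
timings will vary by machine and build):

```r
library(compiler)  # the byte code compiler package

f <- function(n) {
  a <- numeric(n)
  for (i in seq_len(n)) a[i] <- a[i] + 1
  a
}

fc <- cmpfun(f)    # byte-compile the function

# Both versions compute the same result; the compiled one avoids much
# of the per-iteration interpretation overhead.
stopifnot(identical(f(1000), fc(1000)))
system.time(f(1e6))
system.time(fc(1e6))
```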

> For the difference in timing on the vectorized versions, I'd guess that 
> Matlab uses a better compiler than gcc.  It's also likely that R incorporates 
> some unnecessary testing even in a case like this, because it's easier to 
> maintain code that is obviously sane than it is to maintain code that may not 
> be.  R has a budget which is likely several orders of magnitude smaller than 
> Mathworks has, so it makes sense to target our resources at more important 
> issues than making fast things run a bit faster.

Another possibility is that Matlab's builds use optimization settings
that are higher and/or more processor-specific than those used for R.

We do handle the case where both arguments to + are scalar (i.e. of
length 1) separately, but I don't recall whether we do so for the
vector/scalar case as well -- I suspect not, as that would make the
code less maintainable for not a very substantial gain.
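The practical upshot, independent of any compiler work, is that
vectorized calls spend their time in compiled code rather than in the
interpreter.  A rough comparison of the two forms discussed in this
thread (the size is arbitrary; timings vary by machine and build):

```r
n <- 1e6
a <- numeric(n)
b <- numeric(n)

# Interpreted loop: every iteration re-resolves `[`, `[<-`, and `+`.
system.time(for (i in seq_len(n)) a[i] <- a[i] + 1)

# Vectorized form: a single call into compiled code.
system.time(b <- b + 1)

stopifnot(identical(a, b))  # same result either way
```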

luke

> Duncan Murdoch

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



