[R] R badly lags matlab on performance?

luke at stat.uiowa.edu
Mon Jan 5 23:08:55 CET 2009


On Sun, 4 Jan 2009, Stavros Macrakis wrote:

> On Sun, Jan 4, 2009 at 4:50 PM,  <luke at stat.uiowa.edu> wrote:
>> On Sun, 4 Jan 2009, Stavros Macrakis wrote:
>>> On Sat, Jan 3, 2009 at 7:02 PM,  <luke at stat.uiowa.edu> wrote:
>>>> R's interpreter is fairly slow due in large part to the allocation of
>>>> argument lists and the cost of lookups of variables,
>
> I'd think another problem is call-by-need.  I suppose inlining or
> batch analyzing groups of functions helps there.

Yes.  The overhead can probably be reduced, at least in compiled code,
but it will always be significant. Many primitives are strict and do
not depend on call stack position, so inlining them is safe, and that
is done in the current compiler.  Figuring out whether inlining is
safe for user functions is more problematic and may need declarations.
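
To make the call-by-need cost concrete: every argument is packaged as
a promise at call time and evaluated only when the callee first uses
it, which is easy to observe at the R level:

    f <- function(x) {
        cat("entered f\n")
        x                        # forces the promise here
        cat("after forcing x\n")
    }
    f({ cat("evaluating the argument\n"); 1 })
    ## entered f
    ## evaluating the argument
    ## after forcing x

Creating and later forcing that promise is work the interpreter must
do on every call, whether or not the laziness is ever exploited.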

>
>>>> including ones like [<- that are assembled and looked up as strings on every call.
>>> Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `<-[`(...,...).
>> 'Awful' seems a bit strong.
>
> Well, I haven't looked at the code, but if I'm interpreting "assembled
> and looked up as strings on every call" correctly, this means taking
> names, expanding them to strings, concatenating them, re-interning
> them, then looking up the value.

That's about it as I recall.
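
For concreteness, the documented semantics of a replacement call (as
described in the R Language Definition) look like this:

    x <- c(10, 20, 30)
    x[2] <- 99
    ## is evaluated roughly as if it had been written
    ##   `*tmp*` <- x
    ##   x <- `[<-`(`*tmp*`, 2, value = 99)
    ##   rm(`*tmp*`)
    identical(x, `[<-`(c(10, 20, 30), 2, value = 99))  ## TRUE

so the name "[<-" has to be constructed and resolved on each such
assignment.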

>  That sounds pretty awful to me both
> in the sense of being inefficient and of being ugly.

Ugly: a matter of taste and opinion. Inefficient: yes, but in the
context of the way the rest of the computation is done it is simple
and efficient enough (there is no point in optimizing this given the
other issues at this point).  It doesn't make the interpreter awful,
which is what you said.

>>> I'd think that one of the challenges will be the dynamic types --...
>> I am for now trying to get away without declarations and pre-testing
>> for the best cases before passing others off to the current internal
>> code.
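
A minimal sketch of that pre-testing idea (the function name and the
specific checks here are illustrative, not actual R internals):

    add_with_fast_path <- function(x, y) {
        if (is.double(x) && is.double(y) &&
            is.null(attributes(x)) && is.null(attributes(y))) {
            x + y   # best case: plain double vectors, no dispatch needed
        } else {
            x + y   # everything else: fall back to the general code
        }
    }

In compiled code the first branch would become a specialized
instruction; written in R both branches are the same call, so this
shows only the shape of the test, not an actual speedup.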
>
> Have you considered using Java bytecodes and taking advantage of
> dynamic compilers like Hotspot?  They often do a good job in cases
> like this by assuming that types are fairly predictable from one run
> to the next of a piece of code.  Or is the Java semantic model too
> different?

My sense at this point is that this isn't a particularly good match,
in particular as one of my objectives is to try to take advantage of
opportunities for computing some compound numerical operations on
vectors in parallel.  But the possibility of translating the R byte
code to C, the JVM, .NET, etc. is something I'm trying to keep in mind.
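
As a minimal sketch of what the byte-code compiler offers (using the
compiler package, which later shipped with R; cmpfun is its public
entry point):

    library(compiler)
    f <- function(x) {
        s <- 0
        for (xi in x) s <- s + xi * xi
        s
    }
    fc <- cmpfun(f)        # byte-compile the closure
    x <- runif(1e6)
    system.time(f(x))      # interpreted
    system.time(fc(x))     # byte-compiled, typically faster on loops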

luke

>> ...There is always a trade-off in complicating the code and the consequences for maintainability that implies.
>
> Agreed entirely!
>
>> A 1.5 factor difference here I find difficult to get excited about, but it might be worth a look.
>
> I agree. The 1.5 isn't a big deal at all.
>
>           -s
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



