[R] R badly lags matlab on performance?

Stavros Macrakis macrakis at alum.mit.edu
Mon Jan 5 00:38:40 CET 2009


On Sun, Jan 4, 2009 at 4:50 PM,  <luke at stat.uiowa.edu> wrote:
> On Sun, 4 Jan 2009, Stavros Macrakis wrote:
>> On Sat, Jan 3, 2009 at 7:02 PM,  <luke at stat.uiowa.edu> wrote:
>>> R's interpreter is fairly slow due in large part to the allocation of
>>> argument lists and the cost of lookups of variables,

I'd think another problem is call-by-need.  I suppose inlining, or
analyzing groups of functions as a batch, would help there.
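To make the cost I have in mind concrete (a toy example, not a claim
about the internals): every argument arrives wrapped in a promise,
which is allocated at call time and forced only if the argument is
actually used.

    f <- function(x, y) {
        # y's promise is allocated at call time but never forced
        x + 1
    }
    f(2, stop("never evaluated"))   # returns 3; the stop() never runs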

>>> including ones like [<- that are assembled and looked up as
>>> strings on every call.
>> Wow, I had no idea the interpreter was so awful. Just some simple
>> tree-to-tree transformations would speed things up, I'd think,
>> e.g. `<-`(`[`(...), ...) ==> `<-[`(...,...).
> 'Awful' seems a bit strong.

Well, I haven't looked at the code, but if I'm interpreting "assembled
and looked up as strings on every call" correctly, it means taking
names, converting them to strings, concatenating them, re-interning
the result, and then looking up the value.  That sounds pretty awful
to me, both in the sense of being inefficient and in the sense of
being ugly.
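For anyone following along, my understanding (a sketch of the shape,
not the actual C code) is that x[i] <- v is rewritten as a call to
the replacement function `[<-`, whose name is assembled by pasting
"<-" onto the name of `[`:

    x <- c(10, 20, 30)
    x[2] <- 99                            # the sugar

    y <- c(10, 20, 30)
    fname <- paste("[", "<-", sep = "")   # assemble the string "[<-"
    y <- get(fname)(y, 2, 99)             # look it up by name and call it
    identical(x, y)                       # TRUE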

>> I'd think that one of the challenges will be the dynamic types --...
> I am for now trying to get away without declarations and pre-testing
> for the best cases before passing others off to the current internal
> code.
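I see -- so the generated code would have roughly this shape?  Here
is a toy R-level analogue of what I take the strategy to be (the
guard and the .Internal call are stand-ins for whatever the compiler
would actually emit):

    fastmean <- function(x) {
        if (is.double(x) && is.null(attributes(x))) {
            # best case: a plain double vector, so go straight to the
            # internal C code, skipping S3 dispatch and argument checks
            .Internal(mean(x))
        } else {
            # everything else is passed off to the general route
            mean(x)
        }
    }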

Have you considered using Java bytecode and taking advantage of
dynamic compilers like HotSpot?  They often do a good job in cases
like this by assuming that types are fairly predictable from one run
of a piece of code to the next.  Or is the Java semantic model too
different?

> ...There is always a trade-off in complicating the code and the
> consequences for maintainability that implies.

Agreed entirely!

> A 1.5 factor difference here I find difficult to get excited about,
> but it might be worth a look.

I agree; a factor of 1.5 isn't a big deal at all.

           -s



