[R] R badly lags matlab on performance?

luke at stat.uiowa.edu
Sun Jan 4 22:50:40 CET 2009


On Sun, 4 Jan 2009, Stavros Macrakis wrote:

> On Sat, Jan 3, 2009 at 7:02 PM,  <luke at stat.uiowa.edu> wrote:
>> R's interpreter is fairly slow due in large part to the allocation of
>> argument lists and the cost of lookups of variables, including ones
>> like [<- that are assembled and looked up as strings on every call.
>
> Wow, I had no idea the interpreter was so awful. Just some simple
> tree-to-tree transformations would speed things up, I'd think, e.g.
> `<-`(`[`(...), ...) ==> `<-[`(...,...).

'Awful' seems a bit strong.  It's also a bit more complicated in that
one needs both [ and [<- in complex assignment expressions, but the
point that one could rewrite assignments into something that can be
more efficiently executed is certainly true. There are a number of
other opportunities for rewrites of this kind.  They do have
repercussions though -- in this case one would either need to modify
code that needs to look at the original code to undo the operation, or
add a new data structure that contains both the original code object
and the rewritten one, and deal with the implications for
serialization, and so on. Doable of course, and worth doing if the
payoff is high enough, but I'm not convinced it is at this point.
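
For concreteness, the R Language Definition documents why both the
getter and the setter are involved: a complex assignment like
names(x)[1] <- "a" is evaluated roughly as the following (this is the
documented rewriting; the interpreter need not literally build these
calls as written):

    ## documented expansion of:  names(x)[1] <- "a"
    `*tmp*` <- x
    x <- `names<-`(`*tmp*`,
                   value = `[<-`(names(`*tmp*`), 1, value = "a"))
    rm(`*tmp*`)

Both names/[ and names<-/[<- appear, so any rewriting scheme has to
carry both forms around.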

>
>> The current byte code compiler available from my web site speeds this
>> (highly artificial) example by about a factor of 4.  The experimental
>> byte code engine I am currently working on (and that can't yet do much
>> more than an example like this) speeds this up by a factor of
>> 80. Whether that level of improvement (for toy examples like this)
>> will remain once the engine is more complete and whether a reasonable
>> compiler can optimize down to the assembly code I used remain to be
>> seen.
>
> Not sure I follow here.  It sounds as though you have 4 levels of execution:
>
> 1) interpreter
> 2) current byte-code engine
> 3) future byte-code engine
> 4) compilation of byte codes into machine code
>
> Is that right?  I'm not sure what the difference between 2 and 3 is,

3 is hopefully a much more efficient engine than 2.

I'm not looking at 4 for now but keeping an eye on the possibility, at
least via C code generation.

> and what the 80x figure refers to.

Relative to the current interpreter -- I got 80 seconds with the
interpreter and 1 second with the new byte code engine.
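
The exact loop isn't reproduced in this thread, but the measurement
was of roughly this shape (a sketch only -- the loop body and the 1e7
iteration count here are placeholders, and cmpfun is the compiler
package's interface for byte-compiling a closure):

    library(compiler)       # byte code compiler package

    f <- function(n) {      # toy scalar loop of the kind being timed
      s <- 0
      for (i in 1:n) s <- s + i %% 2
      s
    }
    fc <- cmpfun(f)         # byte-compiled version of f

    system.time(f(1e7))     # interpreted timing
    system.time(fc(1e7))    # byte-compiled timing

The 80x figure compares the interpreter to the experimental engine;
the current compiler, as noted above, gives about 4x on this kind of
example.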

> I'd think that one of the challenges will be the dynamic types --
> where you don't know statically if an argument is a logical, an
> integer, a real, or a string.  Will you be adding declarations,
> assuming the best case and interpreting all others or ...?

For now I am trying to get away without declarations by pre-testing
for the best cases and passing the others off to the current internal
code.  By taking advantage of the mechanisms we use now to avoid
unnecessary copies it _looks_ like I can avoid boxing up intermediate
values in many cases, and that seems to help a lot. Given the overhead
of the engine I'm not sure specific type information would help that
much (quick experiments suggest it doesn't, but that needs more
testing) -- it would of course pay off with machine code generation.
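
The idea, sketched in R purely for illustration (the real test lives
in C inside the engine, and in R both branches below of course end up
in the same internal code -- only the shape of the control flow is the
point):

    ## hypothetical shape of the fast-path pre-test
    fast_add <- function(x, y) {
      if (is.double(x) && length(x) == 1L &&
          is.double(y) && length(y) == 1L) {
        x + y               # common scalar case: handled directly
      } else {
        base::`+`(x, y)     # anything else: general internal code
      }
    }

The win comes from the scalar branch being able to keep intermediate
results unboxed rather than allocating a one-element vector for each.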

> Does Matlab have the same type problem?  Or does it make everything
> into a double?  That still wouldn't explain the vectorized case, since
> the type dispatch only has to happen once.

I suspect the main reason for the difference in the vectorized case is
that our current code does not special-case the vector/scalar case. R
has more general recycling rules than Matlab, and the current code in
the interpreter is written for the general case only (I thought we had
special-cased scalar/scalar but unless I missed something in a quick
look it appears not).
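
For reference, recycling in R extends the shorter operand cyclically,
not just scalars, and the current code is written for that general
case:

    c(1, 2, 3, 4) + 10           # scalar recycling: 11 12 13 14
    c(1, 2, 3, 4) + c(10, 20)    # general recycling: 11 22 13 24
    c(1, 2, 3) + c(10, 20)       # 11 22 13, with a length warning

A vector/scalar special case would cover the first line without going
through the general recycling machinery.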

> Sometimes some very simple changes in the implementation can make huge
> differences in overall runtime.  I still remember a 10-word change I
> made in Maclisp in 1975 or so where I special-cased the two-argument
> case of (+ integer integer) => integer -- what it normally did was
> convert it to the general n-argument arbitrary-type case.  This
> speeded up (+ integer integer) by 10x (which doesn't matter much), but
> also sped up the overall Macsyma symbolic algebra system by something
> like 20%.

We've had a few of those, and I suspect there are plenty more. There
is always a trade-off, though: complicating the code has consequences
for maintainability. A factor-of-1.5 difference here is hard to get
excited about, but it might be worth a look.

luke

>
>           -s
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



