[R] R badly lags matlab on performance?

Stavros Macrakis macrakis at alum.mit.edu
Sun Jan 4 22:05:48 CET 2009


On Sat, Jan 3, 2009 at 7:02 PM,  <luke at stat.uiowa.edu> wrote:
> R's interpreter is fairly slow due in large part to the allocation of
> argument lists and the cost of lookups of variables, including ones
> like [<- that are assembled and looked up as strings on every call.

Wow, I had no idea the interpreter was so awful. Just some simple
tree-to-tree transformations would speed things up, I'd think, e.g.
`<-`(`[`(...), ...) ==> `[<-`(..., ...).
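
To make the rewrite above concrete, here is a minimal sketch in Python rather than R. It assumes call trees are modelled as nested tuples of the form (function_name, args...); that representation and the name fuse_subassign are hypothetical, chosen for illustration, and are not how R stores or evaluates calls. It only shows the shape of the tree-to-tree pass: find an assignment whose target is an index call and fuse the two into a single call.

```python
# Call trees are modelled as tuples: (function_name, arg1, arg2, ...).
# This representation is an assumption for illustration, not R's internal one.

def fuse_subassign(node):
    """Rewrite ('<-', ('[', target, *idx), value) into ('[<-', target, *idx, value)."""
    if not isinstance(node, tuple) or not node:
        return node  # a symbol, a constant, or an empty tuple: nothing to rewrite
    head, *args = node
    # Rewrite the children first so that nested assignments are fused too.
    args = [fuse_subassign(a) for a in args]
    if head == "<-" and len(args) == 2:
        lhs, value = args
        if isinstance(lhs, tuple) and lhs and lhs[0] == "[":
            # Fuse: the index call's arguments come first, then the assigned value.
            return ("[<-", *lhs[1:], value)
    return (head, *args)


tree = ("<-", ("[", "x", "i"), ("+", "y", 1))   # models x[i] <- y + 1
fused = fuse_subassign(tree)                    # one fused call node
```

The point of doing this once, ahead of evaluation, is that the fused name no longer has to be assembled and looked up as a string on every call.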

> The current byte code compiler available from my web site speeds this
> (highly artificial) example by about a factor of 4.  The experimental
> byte code engine I am currently working on (and that can't yet do much
> more than an example like this) speeds this up by a factor of
> 80. Whether that level of improvement (for toy examples like this)
> will remain once the engine is more complete and whether a reasonable
> compiler can optimize down to the assembly code I used remain to be
> seen.

Not sure I follow here.  It sounds as though you have 4 levels of execution:

1) interpreter
2) current byte-code engine
3) future byte-code engine
4) compilation of byte codes into machine code

Is that right?  I'm not sure what the difference between 2 and 3 is,
and what the 80x figure refers to.

I'd think that one of the challenges will be the dynamic types --
where you don't know statically if an argument is a logical, an
integer, a real, or a string.  Will you be adding declarations,
assuming the best case and interpreting all others or ...?

Does Matlab have the same type problem?  Or does it make everything
into a double?  That still wouldn't explain the vectorized case, since
the type dispatch only has to happen once.

Sometimes some very simple changes in the implementation can make huge
differences in overall runtime.  I still remember a 10-word change I
made in Maclisp in 1975 or so where I special-cased the two-argument
case of (+ integer integer) => integer -- what it normally did was
convert it to the general n-argument arbitrary-type case.  This
sped up (+ integer integer) by 10x (which doesn't matter much), but
also sped up the overall Macsyma symbolic algebra system by something
like 20%.
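
The kind of special-casing I mean can be sketched as follows. This is a Python stand-in, not the Maclisp change itself; the names add and add_general are hypothetical, and the speedup figures above do not carry over to this sketch. It only shows the structure: test for the common two-integer case first, and fall back to the general n-argument, any-type path otherwise.

```python
def add_general(*args):
    """General path: any number of arguments, any mix of numeric types."""
    total = 0
    for a in args:
        total = total + a
    return total


def add(*args):
    # Fast path: exactly two plain integers, assumed here to be the common case.
    if len(args) == 2 and type(args[0]) is int and type(args[1]) is int:
        return args[0] + args[1]
    # Everything else takes the slower general route.
    return add_general(*args)
```

Both paths have to give the same answer for the inputs they share, so the fast path is purely an optimization: add(2, 3) takes the short route, while add(1, 2.5, 3) falls through to add_general.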

           -s
