[Rd] Fourteen patches to speed up R

Radford Neal radford at cs.toronto.edu
Fri Sep 24 23:06:13 CEST 2010


Luke - Thanks for your comments on the speed patches I submitted.  
I'm glad you like patch-transpose, patch-for, patch-evalList, and
patch-vec-arith.  I'll be interested to hear what you or other people
think about patch-dollar and patch-vec-subset after you've had more
time to consider them.  (Recall I later posted a split of the
original patch-vec-subset into patch-vec-subset and a new
patch-subscript, fixing in patch-subscript a bug in my original
combined patch.)

Regarding patch-square and patch-sum-prod, you make different changes
that address the largest inefficiencies, but I'd like to do some runs
comparing your version with mine to see how big the remaining
differences are.  This is complicated by the fact that your changes to
sum and prod seem to encounter a bug in the C compiler I'm using (gcc
4.2.4 for i486-linux-gnu) at optimization level -O2 (the default for
the R configuration), the effect of which is for sum to deliver the
right answer, but just as slowly as before.  This doesn't happen with
-O3.  I'll investigate this further and report the conclusion.

Similarly, I'll do some more timing tests regarding patch-protect,
patch-fast-base, patch-fast-spec, and patch-save-alloc, and then
comment further on the gains that they produce.

Regarding patch-matprod, the issue of what BLAS routines do with NaN
and NA seems like it's one that needs to be resolved, preferably in a
way that doesn't slow down vector dot products by a factor of six.
However, I don't know what actual problem reports motivated the
current costly check for NAs.  This all interacts with the extreme
slowness on some machines of arithmetic on LDOUBLEs, which also seems
like it needs some resolution.  It's not clear to me what the
expectations regarding accuracy of functions like sum should be.  
One could certainly argue that users would expect the same accuracy
as adding the elements up with "+", and no huge slowdown from trying
to get better accuracy.  But maybe there's some history here, or
packages that depend on the increased accuracy (though of course
there's no guarantee that a C "long double" will actually be wider
than a "double").

Regarding patch-parens, I don't understand your reluctance to
incorporate this two-line code change.  According to my timing tests
(medians of five runs), it speeds up the test-parens.r script by 4.5%.
(Recall this is "for (i in 1:n) d <- (a+1)/(a*(b+c))".)  This is not 
a huge amount, but of course the speedup for the actual parenthesis
operator is greater, since there is other overhead in this test.  It
would be even better to make all BUILTIN operators faster (which my
patch-evalList does), but there are limits to how much is possible.

The fact that "(" is conceptually a BUILTIN rather than a SPECIAL
doesn't seem relevant to me.  "{" is also conceptually a BUILTIN.
Both are "special" from the point of view of the users, few of whom
will even imagine that one could call "(" and "{" as ordinary
functions.  Even if they can imagine this, it makes no difference,
since the patch has no user-visible effects.  If one is worried that
someone reading the code in src/main/eval.c might become confused
about the semantics of parentheses, a two-line comment saying "Parens
are conceptually a BUILTIN but are implemented as a SPECIAL to bypass
the overhead of creating an evaluated argument list" would avoid any
problem.

As an analogy, consider a C compiler generating code for "2*x".  One
could argue that multiplication is conceptually repeated addition, so
converting this to "x+x" is OK, but converting it to "x<<1" would be
wrong.  But I think no one would argue this.  Rather, everyone would
expect the compiler to choose between "x+x" and "x<<1" purely on the
basis of which one is faster.

    Radford


