[Rd] RE: [R] Benchmarking R, why sort() is so slow?

Ross Ihaka ihaka@stat.auckland.ac.nz
Sat, 28 Apr 2001 10:19:58 +1200


You may be overlooking a major reason for the slowness of the
R sorting --- the fact that the element comparisons are not inlined.
Comparisons dominate the cost of the sort, so any speedup of them
will make sorting faster.

It might be worth looking at replacing the present comparison
function definitions with macros to see what sort of speedup
you get.  It might also be useful to try to take advantage of
IEEE arithmetic behaviour, or perhaps do a pre-sweep of the
data to first place the NAs and Infs in their correct positions
at the start or the end, and so simplify the comparison code.
This might prove a bigger win than a switch of algorithm.

If you do plan on switching to Quicksort, you may want to read up
on tuning the algorithm.  One of John Bentley's "Programming Pearls"
chapters is on tuning Quicksort.  He does some simple things which
produce a sort 4 times faster than the "system sort".  I think
there is also some work by Bentley and Rick Becker on problems in
the Unix qsort implementation.

The choice to use Shellsort for R was based on advice in Segedin's
"Algorithms" book.  He says:

  "If one is not willing to invest the effort to be sure
  that a Quicksort implementation is not flawed, Shellsort
  is a much safer choice and will perform adequately for
  significantly less implementation effort."

Being the lazy sod that I am, I took this advice to heart.

-- 
Ross Ihaka                         Email:  ihaka@stat.auckland.ac.nz
Department of Statistics           Phone:  (64-9) 373-7599 x 5054
University of Auckland             Fax:    (64-9) 373-7018
Private Bag 92019, Auckland
New Zealand
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._