[R] Benchmarking R, why sort() is so slow?

Philippe Grosjean phgrosje at ulb.ac.be
Fri Apr 27 15:29:04 CEST 2001

I suspect it should be a question of algorithm choice. May be sort() is
important enough, including for large datasets, to place some improvement in
the "to do" list for a further version...?

-----Message d'origine-----
De : pd at pubhealth.ku.dk [mailto:pd at pubhealth.ku.dk]De la part de Peter
Dalgaard BSA
Envoye : vendredi 27 avril 2001 15:30
A : Philippe Grosjean
Cc : r-help at stat.math.ethz.ch
Objet : Re: [R] Benchmarking R, why sort() is so slow?

"Philippe Grosjean" <phgrosje at ulb.ac.be> writes:

> Hello everybody,
> I am making a modified version of "Stephan Steinhaus' benchmark test for
> number crunching, v. 2, (see
> http://www.scinetificweb.com/ncrunch/ncrunch.pdf for the original
> comparing several functions of some math/stat software. R is not
> bad at all... except for the sorting of a 1,100,000 random vector (test
> which is the worst of all (see cell F3 in the following table). I simply
> used the sort() function. Does anybody has an explanation for that?

The internal algorithm is a shellsort, which is supposedly of
complexity O(n^1.25) and has decent worst-case behaviour. Other
algorithms like quicksort have typical performance of O(n log n) but
extreme cases of O(n^2).

For large vectors the O(n^.25/log(n)) relative complexity is going to
make a difference, although the observed differences seem larger.

There are very likely better choices of sort algorithm...

   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list