R-alpha: speed of sort(.) and order(.)

Martin Maechler Martin Maechler <maechler@stat.math.ethz.ch>
Wed, 3 Sep 1997 18:13:16 +0200


sort() and order() are not quite the same, as "one knows":

 o  order allows breaking ties by more than one argument;
 o  sort  allows a 'partial' and 'na.last' argument

Still, the following timing (on a `simple' UltraSparc I)
suggest that actually two different algorithms are used

> N <- 10000
> typeof(x0 <- 1:N) # --- x0 : already sorted ---
[1] "integer"
> typeof(x  <- sample(N))
[1] "integer"
> CPU <- numeric(10)

> for(j in 1:10) CPU[j] <- unix.time(for(i in 1:20) sort(x0))[1]
> summary(CPU)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.070   1.080   1.080   1.137   1.217   1.240 

> for(j in 1:10) CPU[j] <- unix.time(for(i in 1:20) order(x0))[1]
> summary(CPU)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.900   1.000   1.035   1.012   1.047   1.070 

'order' is slightly faster for an already sorted vector
	(this seems consistent for further examples, 
	  not statistically significant for these 10..)


> for(j in 1:10) CPU[j] <- unix.time(for(i in 1:20) sort(x))[1]
> summary(CPU)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.700   1.720   1.735   1.765   1.827   1.870 

> for(j in 1:10) CPU[j] <- unix.time(for(i in 1:20) order(x))[1]
> summary(CPU)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   3.100   3.140   3.181   3.227   3.540 
>

which makes 'sort' almost a factor of 2  FASTER for an unsorted 'x'.

Any comments from the coders  (R & R) ?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-