[R] Fast way of finding top-n values of a long vector

Allan Engelhardt allane at cybaea.com
Thu Jun 4 20:51:31 CEST 2009


Greg Snow wrote:
> Try adding a version that uses sort with the partial argument, that should be faster than regular sort (for long enough test vectors) and possibly faster than the max solutions when finding more than just the largest 2.

I find the documentation for the partial argument in sort very difficult 
to understand, but it seems to be something like this:


library("rbenchmark")
set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
benchmark(
 replications=20,
 columns=c("test","elapsed"),
 order="elapsed"
 , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
 , qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2]; 
a[1]/a[2];}
 , part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
 , max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; 
m/max(x[-w],na.rm=TRUE);}
 , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
)
#   test elapsed
# 5 max2   0.846
# 4 max1   1.957
# 3 part   2.752
# 2 qsrt   4.561
# 1 sort   5.577


Completely agree on your point about partial sort being faster for 
larger values of n and certainly giving more scalable code which was 
also what I was looking for so thanks for that tip!

> Also, for your max solutions, what happens when the 2 largest values are identical?

It returns those two values, just like the sort solution:

x <- c(999,NA,1:10,NA,999)
m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; c(m, max(x[-w],na.rm=TRUE))
# [1] 999 999
sort(x, decreasing=TRUE, na.last=NA)[1:2]
# [1] 999 999


Allan.




More information about the R-help mailing list