[R] Fast way of finding top-n values of a long vector

Allan Engelhardt allane at cybaea.com
Thu Jun 4 10:18:19 CEST 2009


If x is a (long) vector and n << length(x), what is a fast way of 
finding the top-n values of x?

Some suggestions (calculating the ratio of the two top values):


library("rbenchmark")
set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
benchmark(
 replications=20,
 columns=c("test","elapsed"),
 order="elapsed"
 , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
 , max  = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w], na.rm=TRUE);}
 , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
)
#   test elapsed
# 3 max2   0.772
# 2  max   1.732
# 1 sort   4.958


I want to apply this code to a few tens of thousands of vectors so speed 
does matter.  In C or similar I would of course calculate the result 
with a single pass through x, and not with three passes as in 'max2'.
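
Staying in R, one further option to try is partial sorting, which asks sort() to place only the requested positions correctly instead of ordering the whole vector. The sketch below is an illustration, not a benchmarked answer: top_n is a hypothetical helper name, NAs are dropped explicitly before sorting, and whether it beats 'max2' on your data would need timing.

```r
# Hypothetical helper (not part of the original post): return the n largest
# non-NA values of x, in decreasing order, using a partial sort.
top_n <- function(x, n = 2L) {
  x <- x[!is.na(x)]                 # drop NAs explicitly
  len <- length(x)
  stopifnot(n >= 1L, n <= len)
  idx <- (len - n + 1L):len         # positions the n largest values occupy
  # After a partial sort, each position in 'idx' holds the value it would
  # have in a fully sorted vector, so x[idx] is already in ascending order.
  rev(sort(x, partial = idx)[idx])
}

set.seed(1); x <- runif(1e6, max = 1e7); x[1] <- NA
a <- top_n(x, 2L)
a[1] / a[2]                         # ratio of the two top values
```

For n = 2 this still copies x once to remove the NAs, so it is not a true single pass, but it avoids fully ordering a million elements.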


Allan.

PS: I know na.last=NA is the default for sort, but there is no harm in 
being explicit about how you want NAs handled.



