[R] Fast way of finding top-n values of a long vector

Greg Snow Greg.Snow at imail.org
Thu Jun 4 19:14:20 CEST 2009


Try adding a version that uses sort with the partial argument, that should be faster than regular sort (for long enough test vectors) and possibly faster than the max solutions when finding more than just the largest 2.

Also, for your max solutions, what happens when the 2 largest values are identical?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Allan Engelhardt
> Sent: Thursday, June 04, 2009 2:18 AM
> To: r-help at r-project.org
> Subject: [R] Fast way of finding top-n values of a long vector
> 
> If x is a (long) vector and n << length(x), what is a fast way of
> finding the top-n values of x?
> 
> Some suggestions (calculating the ratio of the two top values):
> 
> 
> library("rbenchmark")
> set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
> benchmark(
>  replications=20,
>  columns=c("test","elapsed"),
>  order="elapsed"
>  , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
>  , max  = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w],
> na.rm=TRUE);}
>  , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
> )
> #   test elapsed
> # 3 max2   0.772
> # 2  max   1.732
> # 1 sort   4.958
> 
> 
> I want to apply this code to a few tens of thousands of vectors so
> speed
> does matter.  In C or similar I would of course calculate the result
> with a single pass through x, and not with three passes as in 'max2'.
> 
> 
> Allan.
> 
> PS: I know na.last=NA is the default for sort, but there is no harm in
> being explicit in how you want NA's handled.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list