[R] Fast way of finding top-n values of a long vector

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Thu Jun 4 10:36:02 CEST 2009


On Thu, Jun 4, 2009 at 9:18 AM, Allan Engelhardt <allane at cybaea.com> wrote:

> I want to apply this code to a few tens of thousands of vectors so speed
> does matter.  In C or similar I would of course calculate the result with a
> single pass through x, and not with three passes as in 'max2'.
>

 So why not code it in C? See the R guide on writing extensions for
how to link C code with R it's pretty easy especially if you know the
size of the objects being passed around, then there's no messy memory
management to deal with.

Wikipedia has some algorithms, with an inevitable dose of Knuth:

http://en.wikipedia.org/wiki/Selection_algorithm

and apparently something like that is in the standard C++ library:

"C++ also provides the partial_sort algorithm, which solves the
problem of selecting the smallest k elements (sorted), with a time
complexity of O(nlog k). No algorithm is provided for selecting the
greatest k elements, but this can easily be achieved by inverting the
ordering predicate."


Barry




More information about the R-help mailing list