[R] sorting without order

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Nov 23 13:58:30 CET 2004


On Tue, 23 Nov 2004, Peter Dalgaard wrote:

> "Marc Mamin" <M.Mamin at intershop.de> writes:
>
>> Hello,
>>
>>
>> In order to increase the performance of a script I'd like to sort very large vectors containing repeated integer values.
>> I'm not interesting in having the values sorted, but only grouped.
>> I also need the equivalent of index.return from the standard "sort" function:
>>
>>   f(c(10,1,10,100,1,10))
>>
>>   =>
>>
>>   grouped: c(10,10,10,1,1,100)
>>   ix:	  c(1,3,6,2,5,4)
>>
>>
>> is there a way to achieve this which would be faster than the standard sort function?
>>
>> Thanks for any hints,
>
> Here's one way:
>
> v <- c(10,1,10,100,1,10)
> ix <- do.call("c",split(seq(along=v),v))
> grouped <- v[ix]
>
> Not sure about the speed though. Should be O(N) if the number of
> groups is small, but the multiplier could be large because of various
> formalities (such as adding names to ix).

Radix sorting as implemented in sort.list will be hard to beat:

ix <- sort.list(unclass(factor(v)), method="radix")

although in your case as.integer() will do.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list