[R] What "method" does sort() use?

peter dalgaard pdalgd at gmail.com
Fri Mar 18 10:13:59 CET 2016


On 18 Mar 2016, at 10:02 , Patrick Connolly <p_connolly at slingshot.co.nz> wrote:

> I don't follow why this happens:
> 
>> sort(c(LETTERS[1:5], letters[1:5]))
> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E"
> 
> The help for sort() says:
> 
>  method: character string specifying the algorithm used.  Not
>          available for partial sorting.  Can be abbreviated.
> 
> But what are the methods available?  The help mentions xtfrm but that
> doesn't illuminate, I'd have thought that at least by default it would
> have something to do with ASCII codes.  But that's not the case since
> all the uppercase ones would be before the lowercase ones.
> 
> I know something different is happening but I don't know what it is
> (do you, Mr Jones?).  Apologies to Bob Dylan.
> 


Um, read _all_ of the help file?

sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE,
         method = c("shell", "quick"), index.return = FALSE)

[snip]

Method "shell" uses Shellsort (an O(n^{4/3}) variant from Sedgewick (1986)). If x has names a stable modification is used, so ties are not reordered. (This only matters if names are present.)

Method "quick" uses Singleton (1969)'s implementation of Hoare's Quicksort method and is only available when x is numeric (double or integer) and partial is NULL. (For other types of x Shellsort is used, silently.) It is normally somewhat faster than Shellsort (perhaps 50% faster on vectors of length a million and twice as fast at a billion) but has poor performance in the rare worst case. (Peto's modification using a pseudo-random midpoint is used to make the worst case rarer.) This is not a stable sort, and ties may be reordered.

Factors with less than 100,000 levels are sorted by radix sorting when method is not supplied: see sort.list.

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list