[R] Sorting of character vectors

Pascal A. Niklaus Pascal.Niklaus at ieu.uzh.ch
Tue Nov 8 13:18:39 CET 2016


I just got caught by the way in which character vectors are sorted.

It seems that on my machine "sort" (and related functions like "order") 
considers punctuation characters (at least "+" and "-" here) only when 
the remaining characters do not differ:

 > x1 <- c("-A","+A")
 > x2 <- c("+A","-A")
 > sort(x1)    # sorting is according to "-" and "+"
[1] "-A" "+A"
 > sort(x2)
[1] "-A" "+A"

 > x3 <- c("-Aa","-Ab")
 > x4 <- c("-Aa","+Ab")
 > x5 <- c("+Aa","-Ab")
 > sort(x3)
[1] "-Aa" "-Ab"
 > sort(x4)    # here the "+" and "-" are ignored
[1] "-Aa" "+Ab"
 > sort(x5)
[1] "+Aa" "-Ab"

I understand from the help that this depends on how characters are 
collated in the current locale, and that this scheme follows the 
multi-level comparison of the Unicode Collation Algorithm 
(http://www.unicode.org/reports/tr10/).
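
To see which collation rules are in effect on a given machine, something 
like the following might help. This is only a sketch; the results depend 
on the platform and locale, so no output is shown.

```r
# Locale currently used for collating strings (affects sort() and order())
Sys.getlocale("LC_COLLATE")

# TRUE if this build of R was compiled with ICU support for collation
capabilities("ICU")
```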

However, what I need is a strict left-to-right comparison of the sort 
provided by strcmp or wcscmp in glibc. The particular ordering of 
special characters is not so important, but there should be no 
"multi-level" aspect to the sorting.

Is there a way to achieve this in R?
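
A minimal sketch of two approaches that might give this behaviour. The 
first assumes R >= 3.3.0, where "sort" and "order" accept 
method = "radix", which the help page describes as sorting character 
vectors as in the C locale. The second temporarily switches LC_COLLATE to 
"C". Both are assumptions on my part about what fits the use case, not a 
confirmed recommendation.

```r
x4 <- c("-Aa", "+Ab")
x5 <- c("+Aa", "-Ab")

# Option 1: radix sort, which compares strings as in the C locale,
# i.e. strictly left to right ("+" is code 43, "-" is code 45)
sort(x4, method = "radix")   # "+Ab" "-Aa"
sort(x5, method = "radix")   # "+Aa" "-Ab"
x4[order(x4, method = "radix")]

# Option 2: switch the collation locale to "C" for the duration of the sort
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
res <- sort(x4)              # "+Ab" "-Aa"
invisible(Sys.setlocale("LC_COLLATE", old))   # restore the previous setting
res
```

Option 1 leaves the session locale untouched, which may be preferable 
when other code relies on locale-aware ordering.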

Thanks for your help

Pascal


