[R] sort() depends on locale (and platform and build)

Marius Hofert marius.hofert at math.ethz.ch
Sun Jun 15 18:34:28 CEST 2014


Hi,

Thanks for your help. I use R-devel under Ubuntu 14.04; here is the output of
sessionInfo():

> sessionInfo()
R Under development (unstable) (2014-06-02 r65832)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.2.0 tools_3.2.0


I assume ICU was not found/installed when R was installed, since executing the
first couple of lines of the examples section of ?icuSetCollate leads to:

Warning message:
In icuSetCollate(case_first = "upper") : ICU is not supported on this build
[1] "aarhus" "Aarhus" "safe"   "test"   "Zoo"
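
Whether a build supports ICU can also be checked directly instead of being
inferred from the warning. A minimal sketch, assuming an R version in which
capabilities() reports an "ICU" entry (where it does not, the check below
simply comes out FALSE):

```r
# Check whether this R build can use ICU for collation.
# Assumption: capabilities() has an "ICU" entry; if it does not,
# the lookup is empty or NA and is treated as "no ICU" here.
has_icu <- isTRUE(unname(capabilities("ICU")))

x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
if (has_icu) {
  # Changes the collation rules for the rest of the session; it only has
  # an effect when ICU is actually used (never in the C locale).
  icuSetCollate(case_first = "upper")
}
print(sort(x))
```

On a build without ICU this skips the icuSetCollate() call, so sort() falls
back to the collation given by LC_COLLATE and no warning is issued.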


Since only the (default) locale "C" gives the order I expected, I am considering
changing my ~/.Rprofile. But I certainly had a reason for changing it to
"en_US.UTF-8" at some point... I hope switching back does not break anything
else. Is there a recommendation for what to use in ~/.Rprofile (the default?)?
And is the recommended approach to have ICU installed and change the sorting
order via icuSetCollate if necessary?
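
One option that leaves ~/.Rprofile alone would be to switch LC_COLLATE to "C"
only around the call to sort() and restore it afterwards. The following is a
sketch of that idea, not a recommendation; sort_c is a hypothetical helper
name:

```r
# Sort a character vector with C (byte-wise) collation, then restore the
# previous LC_COLLATE setting. sort_c is a hypothetical helper name.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
sort_c(x)
# [1] "Aarhus" "Zoo"    "aarhus" "safe"   "test"
```

In the C locale all upper-case ASCII letters sort before all lower-case ones,
which is why "Zoo" comes before "aarhus" here.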

I would not have expected the locale to influence the sorting order, so
that's quite good to know. In fact, the example came up after I tried to sort
students' grades in a class with several students having the same last name
(which I made unique by adding the first names with a '.' separator)... quite a
'delicate' issue...
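
For this kind of task, one way to avoid depending on how the locale treats the
'.' separator is to keep last and first names in separate columns and order on
both. In the C locale '.' sorts before every letter, whereas other locales may
weight punctuation differently, so pasted keys such as "Smith.Zoe" and
"Smithe.Al" can come out in either order. A minimal sketch with made-up data,
assuming R >= 3.3.0 (where order() accepts method = "radix"):

```r
# Hypothetical class list; names and grades are made up for illustration.
grades <- data.frame(
  last  = c("Smithe", "Smith", "Smith"),
  first = c("Al", "Zoe", "Bob"),
  grade = c(5.0, 4.5, 5.5),
  stringsAsFactors = FALSE
)

# Order by last name, then first name. method = "radix" compares strings
# as in the C locale, so the result does not depend on LC_COLLATE.
# Assumption: R >= 3.3.0, where order() accepts method = "radix".
idx <- order(grades$last, grades$first, method = "radix")
grades[idx, ]
```

With these rows, both "Smith" entries come first (Bob, then Zoe), followed by
"Smithe".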

Cheers,

Marius


