[R] sort()ing strings

Bob O'Hara rni.boh at gmail.com
Thu Oct 13 16:26:00 CEST 2016


Thanks. Strangely, capabilities("ICU") is FALSE (I'm using Ubuntu
16.04, and icu-devtools is installed). So I'll conclude that
something odd is going on, but I don't want to delve into these issues
(I'll have a new locale and a new computer in a couple of months).
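
For anyone who hits the same thing, here is a minimal sketch of a
workaround, assuming a byte-wise (C locale) order is acceptable. The
helper name sort_c is hypothetical; it only wraps the
Sys.setlocale("LC_COLLATE", "C") call that Martin uses in the message
quoted below, and restores the previous setting afterwards:

```r
# Hypothetical helper (not from the thread): sort in the C locale,
# then restore whatever LC_COLLATE was set before the call.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

thing <- c("M1", "M2", "M.1", "M.2")
sort_c(thing)
# [1] "M.1" "M.2" "M1"  "M2"
```

In the C locale "." sorts before the digits, so the dotted names come
first, independent of the session's LC_COLLATE.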

Bob

On 13 October 2016 at 13:00, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>>>>> Bob O'Hara <rni.boh at gmail.com>
>>>>>>     on Thu, 13 Oct 2016 11:55:04 +0200 writes:
>
>     > Yes, thanks. That seems to be it:
>     > thing <- c("M1", "M2", "M.1", "M.2")
>     >> sort(thing)
>     > [1] "M1" "M.1" "M2" "M.2"
>
> which I do find strange, indeed, given your sessionInfo which
> contains
>         LC_COLLATE=en_US.UTF-8
>
>     > The only documentation I can find is from ?Comparison:
>     > "Collation of non-letters (spaces, punctuation signs,
>     > hyphens, fractions and so on) is even more problematic."
>
> Well.  That help page contains more information further down,
> notably about ICU.  If
>
>      capabilities("ICU")
>
> gives TRUE for you (I assume it will as you use a modern ubuntu version),
> you can tweak the behavior to be more "reasonable" than above
>
> via R functions icu(Set|Get)Collate()
>
> BTW, I say "strange" above, because for me - also on modern
> Linux (Fedora 24), I "always" see
>
>> sort( c("M1", "M2", "M.1", "M.2") )
> [1] "M.1" "M.2" "M1"  "M2"
>
>
>> Sys.setlocale("LC_COLLATE", "de_CH.UTF-8")
> [1] "de_CH.UTF-8"
>> sort( c("M1", "M2", "M.1", "M.2") )
> [1] "M.1" "M.2" "M1"  "M2"
>> Sys.setlocale("LC_COLLATE", "en_US.UTF-8")
> [1] "en_US.UTF-8"
>> sort( c("M1", "M2", "M.1", "M.2") )
> [1] "M.1" "M.2" "M1"  "M2"
>
>> Sys.setlocale("LC_COLLATE", "C") # <--> ASCII (the "ole' time" default)
> [1] "C"
>> sort( c("M1", "M2", "M.1", "M.2") )
> [1] "M.1" "M.2" "M1"  "M2"
>>
>
> I do use a newer R version (3.3.1 patched)
> but would not have expected that to matter here.
>
> Martin
>
>
>     > Indeed.
>
>     > Bob
>
>     > On 13 October 2016 at 11:26, PIKAL Petr
>     > <petr.pikal at precheza.cz> wrote:
>     >> Hi
>     >>
>     >> Just a wild guess: the dot is ignored and the output is
>     >> sorted alphabetically.
>     >>
>     >> You could try sorting it yourself with
>     >>
>     >> sort(ls())
>     >>
>     >> Cheers Petr
>     >>
>     >>> -----Original Message-----
>     >>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bob O'Hara
>     >>> Sent: Thursday, October 13, 2016 10:29 AM
>     >>> To: r-help <r-help at r-project.org>
>     >>> Subject: [R] (no subject)
>     >>>
>     >>> I've just come across an odd problem with sorting in
>     >>> ls(): it doesn't seem to order the object names
>     >>> correctly. If I do the following, the order isn't what I
>     >>> expect:
>     >>>
>     >>> > ls(sorted=TRUE)
>     >>>  [1] "AridData" "AridDataToBUGS" "Arid.df" "Arid.hpd" "AridPrecip.sd" "Break.df"
>     >>>  [7] "Break.hpd" "Cols" "Data" "DataFrames" "DataToBUGS" "DataToBUGS.nonlog"
>     >>> [13] "FitBRugs" "Fixed.df" "Fixed.hpd" "FormatData" "GetCol" "GetHPD"
>     >>> [19] "GetMCMC" "GetRow" "HPDIs" "Int.alpha12" "Int.alpha21" "ModisData"
>     >>> [25] "ModisDataToBUGS" "Modis.df" "ModisFixed.df" "ModisFixed.hpd" "Modis.hpd" "ModisPrecip.sd"
>     >>> [31] "ModisShrink.df" "ModisShrink.hpd" "ModisYears" "OrigData" "OrigDataToBUGS" "Orig.df"
>     >>> [37] "Orig.hpd" "OrigPrecip.sd" "OrigYears" "PlotChecks" "PlotEff" "plothpd"
>     >>> [43] "ProvinceNames" "ResNames" "ResNamesOrder" "Shrink.df" "Shrink.hpd" "SimInits"
>     >>>
>     >>> Specifically, the Modis* objects are sorted like this:
>     >>>
>     >>> > ls(sorted=TRUE)[26:30]
>     >>> [1] "Modis.df" "ModisFixed.df" "ModisFixed.hpd" "Modis.hpd" "ModisPrecip.sd"
>     >>>
>     >>> With Modis.* coming both before and after ModisF*. I
>     >>> can't see why there would be any odd problems with
>     >>> character sets changing (this was all done on a single
>     >>> computer with no weird locale switching), and the
>     >>> objects are all created within a single R session:
>     >>>
>     >>> > sessionInfo()
>     >>> R version 3.2.5 (2016-04-14)
>     >>> Platform: x86_64-pc-linux-gnu (64-bit)
>     >>> Running under: Ubuntu 16.04.1 LTS
>     >>>
>     >>> locale:
>     >>>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
>     >>>  [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
>     >>>  [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>     >>>
>     >>> attached base packages:
>     >>> [1] stats graphics grDevices utils datasets methods base
>     >>>
>     >>> other attached packages:
>     >>> [1] MCMCglmm_2.22.1 ape_3.5 Matrix_1.2-7.1 RColorBrewer_1.1-2 plyr_1.8.4 coda_0.18-1
>     >>>
>     >>> loaded via a namespace (and not attached):
>     >>> [1] cubature_1.1-2 corpcor_1.6.8 tools_3.2.5 Rcpp_0.12.7 nlme_3.1-128 grid_3.2.5 knitr_1.14
>     >>> [8] tensorA_0.36 lattice_0.20-34
>     >>>
>     >>> Can anyone explain what's going on?
>     >>>
>     >>> Bob



-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://occamstypewriter.org/boboh/
Journal of Negative Results - EEB: www.jnr-eeb.org


