[R] difference in sort order linux/Windows (R.2.11.0)

(Ted Harding) Ted.Harding at manchester.ac.uk
Sat May 29 00:20:23 CEST 2010


Linux problem solved! (For me at any rate). Thanks to some hints
from my Linux contacts it transpires that the problem with

sort << EOT
"ABCD"
"A CD"
EOT
# "ABCD"
# "A CD"

sort << EOT
"ADCD"
"A CD"
EOT
# "A CD"
# "ADCD"

arises because, under the default locale's collation rules, the " "
is ignored in sorting. Therefore in the first case it effectively
sorted "ABCD" and "ACD", returning "ABCD", "A CD", while in the
second case it effectively sorted "ADCD" and "ACD", and returned
"A CD", "ADCD".

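As a rough illustration of that explanation (a sketch only, not how
the locale machinery actually works): deleting the spaces and then
sorting by plain byte value reproduces the two orderings seen above.
The helper name first_pass is made up for this example.

```shell
# Hypothetical helper: emulate "space is ignored" by deleting spaces,
# then sorting by plain byte value. Real locale collation is more involved.
first_pass() {
  printf '%s\n' "$@" | tr -d ' ' | LC_ALL=C sort
}

first_pass '"ABCD"' '"A CD"'   # "ABCD" sorts before "ACD"
first_pass '"ADCD"' '"A CD"'   # "ACD" sorts before "ADCD"
```
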
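A solution is to export LC_COLLATE=C, which makes sort compare raw
byte values, so that " " sorts before any letter. (This assumes
LC_ALL is not set, since LC_ALL overrides LC_COLLATE.) Continuing
directly from the above: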

export LC_COLLATE=C

sort << EOT
"ABCD"
"A CD"
EOT
# "A CD"
# "ABCD"

sort << EOT
"ADCD"
"A CD"
EOT
# "A CD"
# "ADCD"
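
If changing the setting for the whole session is not wanted, it can
also be given for a single command. A minimal sketch: the function
name sort_bytes is only an assumption for illustration, and LC_ALL
is used here because it takes precedence over LC_COLLATE.

```shell
# Hypothetical wrapper: run sort with byte-value collation for this
# one command only, leaving the rest of the session's locale alone.
sort_bytes() {
  LC_ALL=C sort
}

printf '%s\n' '"ABCD"' '"A CD"' | sort_bytes
printf '%s\n' '"ADCD"' '"A CD"' | sort_bytes
```

In both cases "A CD" should come out first, as in the exported case.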

Because "export" makes LC_COLLATE available to processes spawned
by the shell, this also works within R provided that
  export LC_COLLATE=C
is run in the shell before starting R:

$ export LC_COLLATE=C
$ R --no-save
[R Banner stuff]

> sort(c("ABCD","A CD"))
[1] "A CD" "ABCD"
> sort(c("ADCD","A CD"))
[1] "A CD" "ADCD"
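
The role of "export" can be checked without R. The following is a
small sketch that uses a child "sh" as a stand-in for any spawned
process (R included); the variable names not_exported and exported
are just for illustration.

```shell
# Start from a clean state so the variable is not already exported.
unset LC_COLLATE

LC_COLLATE=C    # plain shell variable, not yet exported
not_exported=$(sh -c 'printf "%s" "${LC_COLLATE:-unset}"')

export LC_COLLATE    # now passed on to child processes
exported=$(sh -c 'printf "%s" "${LC_COLLATE:-unset}"')

echo "child sees before export: $not_exported"
echo "child sees after export:  $exported"
```

Only after the export does the child process see the value, which is
why the setting has to be exported before R is started.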

So that is a work-round for Linux. As before, I can't comment
on Windows.

Thanks to all who contributed.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 23:20:18
------------------------------ XFMail ------------------------------


