[R] difference in sort order linux/Windows (R.2.11.0)

(Ted Harding) Ted.Harding at manchester.ac.uk
Fri May 28 13:49:22 CEST 2010


It would seem that there is indeed a locale effect. I revisited the
examples I ran on Linux in a previous post, when I was using the
default "LC_COLLATE=en_GB.UTF-8", and changed the collation locale
to "C". The results for both "C" and "en_GB.UTF-8" are shown below
(the latter copied from my previous post):

  Sys.setlocale("LC_COLLATE", "C")
  # [1] "C"
  sort(c("AB CD","ABCD"))
  # [1] "AB CD" "ABCD"       ## (C)
  # [1] "ABCD"  "AB CD"      ## (en_GB.UTF-8)
  sort(c("AB CD","ABCD "))
  # [1] "AB CD" "ABCD "      ## (C)
  # [1] "AB CD" "ABCD "      ## (en_GB.UTF-8)

So the "C" ordering comes out as one would expect in either case,
while the "en_GB.UTF-8" ordering does not in the first case (where
the two strings are of different lengths).
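
If the aim is simply to get the same order on both systems, one option
is to switch LC_COLLATE to "C" only for the duration of the sort and
restore it afterwards. A minimal sketch follows; the helper name
sort_c is hypothetical (it is not part of R), and it assumes the "C"
ordering is acceptable for the data in question:

```r
# Hypothetical helper (not part of R): sort a character vector using
# the "C" collation order, whatever LC_COLLATE the session is using.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the caller's collation locale when the function exits
  on.exit(Sys.setlocale("LC_COLLATE", old))
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

sort_c(c("ABCD", "AB CD"))
# [1] "AB CD" "ABCD"
```

Since "C" is available on both Linux and Windows, this should give the
same result on either platform for strings like the ones above.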

Is there any way to extract the numerical encoding of a character
string (according to the collating locale encoding) to which the
comparison in the sort() algorithm is applied?
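
For the "C" locale, at least, the numbers are easy to get at: there the
comparison is on byte values, which for plain ASCII strings are the
same as the code points returned by utf8ToInt(). The sketch below shows
only that case. It does not show whatever keys "en_GB.UTF-8" uses, and
the variable names are purely illustrative:

```r
# Code points of the two strings; for ASCII text these equal the byte
# values that the "C" locale compares.
a <- utf8ToInt("AB CD")   # 65 66 32 67 68
b <- utf8ToInt("ABCD")    # 65 66 67 68

# First position at which the two strings differ
n <- min(length(a), length(b))
first_diff <- which(a[1:n] != b[1:n])[1]
first_diff                      # 3
c(a[first_diff], b[first_diff]) # 32 67: space sorts before "C"
```

So in "C" the space (32) is compared with "C" (67) at position 3, which
puts "AB CD" first. The reversed order under "en_GB.UTF-8" would be
consistent with the space carrying little or no weight on a first pass
there, but that is a guess and not something checked against the locale
definition.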

Ted.

On 28-May-10 11:07:57, Joris Meys wrote:
> Pretty obvious: You use different locales (collate). What happens if
> you use the same on both machines?
> 
> Cheers
> Joris
> 
> On Fri, May 28, 2010 at 10:17 AM, carslaw <david.carslaw at kcl.ac.uk>
> wrote:
>> Dear R users,
>>
>> I'm a bit perplexed with the effect sort has here, as it is different on
>> ...
>> the linux order is perhaps more intuitive.  However, the problem is the
>> order is inconsistent between the two systems.  Any suggestions?
>>
>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-pc-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_GB.utf8          LC_NUMERIC=C
>>  [3] LC_TIME=en_GB.utf8           LC_COLLATE=en_GB.utf8
>>  [5] LC_MONETARY=en_GB.utf8       LC_MESSAGES=en_GB.utf8
>>  [7] LC_PAPER=en_GB.utf8          LC_NAME=en_GB.utf8
>>  [9] LC_ADDRESS=en_GB.utf8        LC_TELEPHONE=en_GB.utf8
>> [11] LC_MEASUREMENT=en_GB.utf8    LC_IDENTIFICATION=en_GB.utf8
>> ...
>> > sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252
>> [2] LC_CTYPE=English_United Kingdom.1252
>> [3] LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United Kingdom.1252
>> ...
>> Dr David Carslaw
> -- 
> Joris Meys

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 12:49:19
------------------------------ XFMail ------------------------------


