[R] difference in sort order linux/Windows (R.2.11.0)

Duncan Murdoch murdoch.duncan at gmail.com
Fri May 28 12:47:44 CEST 2010


carslaw wrote:
> Dear R users,
>
> I'm a bit perplexed by the effect sort has here, as it is different on
> Windows vs. Linux.  It makes my factor levels and subsequent plots
> different on the two systems.
>   

You are using different collation orders.  On Linux, your sessionInfo shows

en_GB.utf8   

while Windows shows

English_United Kingdom.1252


so you should be prepared for differences.  That said, it certainly 
looks as though the string comparison is wrong on Linux.  Using Ted 
Harding's examples, I get these results:

 > "AB CD" > "ABCD"
[1] FALSE
 > "AB CD" > "ABCD "
[1] FALSE

on Windows in the English_Canada.1252 locale and on Linux in the C
locale.  However, when I use the default locale on our system,
en_US.UTF-8, I get

 > "AB CD" > "ABCD"
[1] TRUE
 > "AB CD" > "ABCD "
[1] FALSE

as Ted did, and that certainly looks wrong.
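
If the two systems need to agree, one option is to take the locale out of
the picture while sorting.  What follows is only a minimal sketch, assuming
that byte-wise "C" collation is acceptable for these labels (in that order a
space sorts before any letter, and upper case sorts before lower case).  The
three strings are taken from the types vector in the original post; the
variable names are made up for illustration.

```r
# Remember the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# The "C" locale compares strings byte by byte, so the result does not
# depend on the platform's language-specific collation rules.
invisible(Sys.setlocale("LC_COLLATE", "C"))

x <- c("HGV-D-Euro-VI", "HGV-D-Euro-V SCR", "HGV-D-Euro-V EGR")
sorted_c <- sort(x)   # space (0x20) sorts before "I" (0x49)
print(sorted_c)

# FALSE under "C" collation, matching the first set of results above.
cmp_c <- "AB CD" > "ABCD"
print(cmp_c)

# Restore the original collation order.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Under "C" collation the two "V ..." entries come before "VI", which is the
order reported for Windows in the original post.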

Duncan Murdoch
> Given:
>
> types <- c("PC-D-Euro-0", "PC-D-Euro-1", "PC-D-Euro-2", "PC-D-Euro-3", 
> "PC-D-Euro-4", "PC-D-Euro-5", "PC-D-Euro-6", "LCV-D-Euro-0", 
> "LCV-D-Euro-1", "LCV-D-Euro-2", "LCV-D-Euro-3", "LCV-D-Euro-4", 
> "LCV-D-Euro-5", "LCV-D-Euro-6", "HGV-D-Euro-0", "HGV-D-Euro-I", 
> "HGV-D-Euro-II", "HGV-D-Euro-III", "HGV-D-Euro-IV EGR", "HGV-D-Euro-IV SCR", 
> "HGV-D-Euro-IV SCRb", "HGV-D-Euro-V EGR", "HGV-D-Euro-V SCR", 
> "HGV-D-Euro-V SCRb", "HGV-D-Euro-VI", "HGV-D-Euro-VIb")
>
> On Linux, sort gives:
>
> sort(types)
>  [1] "HGV-D-Euro-0"       "HGV-D-Euro-I"       "HGV-D-Euro-II"     
>  [4] "HGV-D-Euro-III"     "HGV-D-Euro-IV EGR"  "HGV-D-Euro-IV SCR" 
>  [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR"   "HGV-D-Euro-VI"     
> [10] "HGV-D-Euro-VIb"     "HGV-D-Euro-V SCR"   "HGV-D-Euro-V SCRb" 
> [13] "LCV-D-Euro-0"       "LCV-D-Euro-1"       "LCV-D-Euro-2"      
> [16] "LCV-D-Euro-3"       "LCV-D-Euro-4"       "LCV-D-Euro-5"      
> [19] "LCV-D-Euro-6"       "PC-D-Euro-0"        "PC-D-Euro-1"       
> [22] "PC-D-Euro-2"        "PC-D-Euro-3"        "PC-D-Euro-4"       
> [25] "PC-D-Euro-5"        "PC-D-Euro-6"
>
>
> And on Windows:
>
> sort(types)
>
>  [1] "HGV-D-Euro-0"       "HGV-D-Euro-I"       "HGV-D-Euro-II"    
>  [4] "HGV-D-Euro-III"     "HGV-D-Euro-IV EGR"  "HGV-D-Euro-IV SCR"
>  [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR"   "HGV-D-Euro-V SCR" 
> [10] "HGV-D-Euro-V SCRb"  "HGV-D-Euro-VI"      "HGV-D-Euro-VIb"   
> [13] "LCV-D-Euro-0"       "LCV-D-Euro-1"       "LCV-D-Euro-2"     
> [16] "LCV-D-Euro-3"       "LCV-D-Euro-4"       "LCV-D-Euro-5"     
> [19] "LCV-D-Euro-6"       "PC-D-Euro-0"        "PC-D-Euro-1"      
> [22] "PC-D-Euro-2"        "PC-D-Euro-3"        "PC-D-Euro-4"      
> [25] "PC-D-Euro-5"        "PC-D-Euro-6"      
>
> Session info for both systems is below.  The order I actually want is the
> Windows one, but looking at it, the Linux order is perhaps more intuitive.
> However, the problem is that the order is inconsistent between the two
> systems.  Any suggestions?
>
> sessionInfo()
> R version 2.11.0 (2010-04-22) 
> x86_64-pc-linux-gnu 
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8          LC_NUMERIC=C                
>  [3] LC_TIME=en_GB.utf8           LC_COLLATE=en_GB.utf8       
>  [5] LC_MONETARY=en_GB.utf8       LC_MESSAGES=en_GB.utf8      
>  [7] LC_PAPER=en_GB.utf8          LC_NAME=en_GB.utf8          
>  [9] LC_ADDRESS=en_GB.utf8        LC_TELEPHONE=en_GB.utf8     
> [11] LC_MEASUREMENT=en_GB.utf8    LC_IDENTIFICATION=en_GB.utf8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
> [1] rkward_0.5.3
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
>   
> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252  
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                          
> [5] LC_TIME=English_United Kingdom.1252   
>
>  
> attached base packages:
>
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> Dr David Carslaw
> King's College London
> Environmental Research Group
> Franklin Wilkins Building
> 150 Stamford Street
> London
> SE1 9NH 
>
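
For the factor levels themselves, a sketch that avoids sort() altogether is
to write the desired order out once and pass it to factor() explicitly.
This assumes the set of labels is known in advance; the names wanted and obs
and the values in obs are made up for illustration, and only a few of the
labels from the post are used.

```r
# Labels written out by hand in the order the levels should have
# (a shortened subset of the labels from the original post).
wanted <- c("HGV-D-Euro-V EGR", "HGV-D-Euro-V SCR", "HGV-D-Euro-VI",
            "PC-D-Euro-0", "PC-D-Euro-1")

# Hypothetical data column using those labels in arbitrary order.
obs <- c("HGV-D-Euro-VI", "PC-D-Euro-1", "HGV-D-Euro-V SCR", "PC-D-Euro-0")

# Without 'levels', factor() sorts the unique values, which depends on the
# collation order of the locale.  Supplying 'levels' fixes the order.
f <- factor(obs, levels = wanted)
print(levels(f))
```

Because the levels are given explicitly, anything that orders categories by
factor level should come out the same on both systems.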


