[R] Inconsistent alphabetisation issue

Stefano Conti s.conti at gmx.co.uk
Fri May 23 13:00:05 CEST 2014


   Dear R users community,
   For some time now I have occasionally observed some inconsistent behaviour
   across identical (i.e. same 3.1.0 version and set-up / configuration) R
   installations on separate Linux machines (all manufactured in the UK).

   Specifically, after reading some data-frames (via 'read.table' or its
   flavours) and then tabulating their factors, I notice that the levels of
   some factors are by default alphabetised differently on different machines.

   As an example, on 2 separate work machines I obtain from a given data-frame
   (say 'tbl'), before applying any processing, the same output

   > tbl <- read.csv(path.expand("~/tmp/tbl.csv"), header=TRUE)
   > levels(tbl$Ethnicity)
    [1] "Black-African"               "Black-Caribbean"
    [3] "Black other"                   "Indian/Pakistani/Bangladeshi"
    [5] "Not Known"                    "Other Asian/Oriental"
    [7] "Other/Mixed"                 "White"
    [9] "Black Other"                  "Not known"

   whereas reproducing the same code and instructions on my personal laptop
   yields the following:

   > tbl <- read.csv(path.expand("~/tmp/tbl.csv"), header=TRUE)
   > levels(tbl$Ethnicity)
    [1] "Black other"                      "Black-African"
    [3] "Black-Caribbean"               "Indian/Pakistani/Bangladeshi"
    [5] "Not known"                       "Other Asian/Oriental"
    [7] "Other/Mixed"                     "White"
    [9] "Black Other"                      "Not known"

   I've tried looking up on the R mailing list, as well as in the R
   documentation and on Stack Overflow, what could be the source of, and in
   particular a solution to, this discrepant behaviour; unfortunately, apart
   from some hints at localisation issues -- which I can't see applying in my
   case -- I couldn't find anything pertinent.
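
   One way to test the localisation hint on each machine would be to print
   the collation locale and sort a few of the levels under a fixed one. The
   following is only a minimal sketch: it assumes the discrepancy comes from
   the LC_COLLATE setting, which is not confirmed here, and the vector 'x' is
   a hypothetical stand-in built from three of the levels shown above.

```r
# Hypothetical stand-in: three of the levels from the output above
x <- c("Black-African", "Black-Caribbean", "Black other")

# Report the collation locale in use on this machine;
# differing values across machines would point to a locale effect
old_collate <- Sys.getlocale("LC_COLLATE")
print(old_collate)

# Sort under the "C" locale, which compares byte values:
# the space (0x20) sorts before the hyphen (0x2D)
Sys.setlocale("LC_COLLATE", "C")
c_order <- sort(x)
print(c_order)
# [1] "Black other"     "Black-African"   "Black-Caribbean"

# Restore the original collation setting
Sys.setlocale("LC_COLLATE", old_collate)
```

   If the two machines report different LC_COLLATE values, setting the same
   value on both before reading the file would be one thing to try.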

   Many thanks in advance for any help / insight you may be able to provide on
   this!
   --
   Dr Stefano Conti

