[Rd] best way to write tests when sort() evaluates differently in R CMD check due to LC_COLLATE locale setting?

Skye Bender-deMoll skyebend at skyeome.net
Mon Apr 14 23:36:51 CEST 2014


Dear R devel,

What is the correct way to write package tests that may fail because of 
locale-specific collation?  Is it safe and proper for me to call 
Sys.setlocale("LC_COLLATE", "en_US.UTF-8") in each test file? Or should 
I explicitly force collation to C in my own session before writing 
tests?  Or do I need to always call sort() on my comparison objects to 
ensure they are sorted in the same locale-specific way as the results?

I ran into a strange situation where a package test I'm writing fails 
under R CMD check, but runs fine in the R terminal.  I eventually got to 
the point where I could see that, under R CMD check, the vector I 
compare against to evaluate the test result was not sorted the way I 
expected. Further digging revealed that the locale's LC_COLLATE value 
is set to "C" in R CMD check, while it is "en_US.UTF-8" in my R terminal.
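
For anyone trying to confirm the same kind of mismatch, one option is to 
print the collation setting from inside the test file so that the value 
appears in the test output. This is only a minimal sketch; the variable 
name is illustrative.

```r
# Report the collation locale the current R session is using.
# Under R CMD check this is expected to be "C"; in an interactive
# session it is whatever the user's environment sets.
collate <- Sys.getlocale("LC_COLLATE")
cat("LC_COLLATE is:", collate, "\n")
```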

Now that I know what to look for in the documentation, I realize that 
this is a feature. Page 36 of "Writing R Extensions" states:

"All these tests are run with collation set to the C
locale, and for the examples and tests with environment variable
LANGUAGE=en: this is to minimize differences between platforms."

It appears that this affects where capital letters sort relative to 
lowercase ones:

 > Sys.setlocale("LC_COLLATE", "C")
[1] "C"
 > sort(c("a",'A','b','c'))
[1] "A" "a" "b" "c"
 > Sys.setlocale("LC_COLLATE", "en_US.UTF-8")
[1] "en_US.UTF-8"
 > sort(c("a",'A','b','c'))
[1] "a" "A" "b" "c"
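
One possible approach, sketched here on the assumption that C collation 
is acceptable for the comparison, is to wrap sort() in a helper that 
switches LC_COLLATE to "C" only for the duration of the call, restores 
the previous value afterwards, and is applied to both the result and the 
expected vector. The helper name sort_c and the example vectors are 
hypothetical, not taken from any package.

```r
# Hypothetical helper: sort with C collation regardless of the
# session locale, then put the previous setting back.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the caller's collation setting even if sort() fails.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

# Illustrative data only.
result   <- c("b", "A", "c", "a")
expected <- c("a", "A", "b", "c")

# Both sides pass through the same helper, so the comparison does not
# depend on the locale the tests happen to run under.
stopifnot(identical(sort_c(result), sort_c(expected)))
```

Because the helper restores the old setting on exit, it should not 
change collation for the rest of the test file, unlike a bare 
Sys.setlocale() call at the top of each file.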

best,
  -skye


