[Rd] sorting bug in R-devel?

Thierry Onkelinx thierry.onkelinx at inbo.be
Tue Jan 19 10:10:20 CET 2021


Dear all,

My git2rdata package relies on stable sorting. I've noticed that
some characters sort to a different position under R-devel on Windows
10. As a result, the unit tests of my package fail only in this
combination (https://cran.r-project.org/web/checks/check_results_git2rdata.html).

Below is a minimal example to illustrate the problem.

Best regards,

Thierry

# Read the first 15 lines of the test file, which define `ds`
data <- readLines(
  "https://raw.githubusercontent.com/ropensci/git2rdata/master/tests/testthat/test_b_special.R",
  encoding = "UTF-8", n = 15
)
# Drop the first 3 lines and evaluate the rest to create `ds`
eval(parse(text = paste(tail(data, -3), collapse = "")))
ds$a <- enc2utf8(ds$a)
print(ds$a) # input
Sys.setlocale(locale = "C")
print(sort(ds$a)) # sorted
print(order(ds$a)) # order
print(sessionInfo())

# input
## Win 10 R 4.0.2
 [1] "a"       "a b"     "a\tb"    "a\tb\tc" "\ta"     "a\t"     "a\nb"
 [8] "a\nb\nc" "\na"     "a\n"     "a\"b"    "a\"b\"c" "\"b"     "a\""
[15] "\"b\""   "a'b"     "a'b'c"   "'b"      "a'"      "'b'"     "a b c"
[22] "\"NA\""  "'NA'"    NA        "é"       "&"       "à"       "µ"
[29] "ç"       "\200"    "|"       "#"       "@"       "$"
## Win 10 R devel
 [1] "a"       "a b"     "a\tb"    "a\tb\tc" "\ta"     "a\t"     "a\nb"
 [8] "a\nb\nc" "\na"     "a\n"     "a\"b"    "a\"b\"c" "\"b"     "a\""
[15] "\"b\""   "a'b"     "a'b'c"   "'b"      "a'"      "'b'"     "a b c"
[22] "\"NA\""  "'NA'"    NA        "é"       "&"       "à"       "µ"
[29] "ç"       "\200"    "|"       "#"       "@"       "$"
## Ubuntu 18.04 R 4.0.3
 [1] "a"       "a b"     "a\tb"    "a\tb\tc" "\ta"     "a\t"     "a\nb"
 [8] "a\nb\nc" "\na"     "a\n"     "a\"b"    "a\"b\"c" "\"b"     "a\""
[15] "\"b\""   "a'b"     "a'b'c"   "'b"      "a'"      "'b'"     "a b c"
[22] "\"NA\""  "'NA'"    NA        "é"       "&"       "à"       "µ"
[29] "ç"       "€"       "|"       "#"       "@"       "$"

# sorted
## Win 10 R 4.0.2
 [1] "\ta"     "\na"     "\"NA\""  "\"b"     "\"b\""   "#"       "$"
 [8] "&"       "'NA'"    "'b"      "'b'"     "<U+00B5>" "<U+00E0>" "<U+00E7>"
[15] "<U+00E9>" "<U+20AC>" "@"       "a"       "a\t"     "a\tb"    "a\tb\tc"
[22] "a\n"     "a\nb"    "a\nb\nc" "a b"     "a b c"   "a\""     "a\"b"
[29] "a\"b\"c" "a'"      "a'b"     "a'b'c"   "|"
## Win 10 R devel
 [1] "\ta"     "\na"     "\"NA\""  "\"b"     "\"b\""   "#"       "$"
 [8] "&"       "'NA'"    "'b"      "'b'"     "@"       "a"       "a\t"
[15] "a\tb"    "a\tb\tc" "a\n"     "a\nb"    "a\nb\nc" "a b"     "a b c"
[22] "a\""     "a\"b"    "a\"b\"c" "a'"      "a'b"     "a'b'c"   "|"
[29] "\200"       "\265"       "\340"       "\347"       "\351"
## Ubuntu 18.04 R 4.0.3
 [1] "\ta"     "\na"     "\"NA\""  "\"b"     "\"b\""   "#"       "$"
 [8] "&"       "'NA'"    "'b"      "'b'"     "<U+00B5>" "<U+00E0>" "<U+00E7>"
[15] "<U+00E9>" "<U+20AC>" "@"       "a"       "a\t"     "a\tb"    "a\tb\tc"
[22] "a\n"     "a\nb"    "a\nb\nc" "a b"     "a b c"   "a\""     "a\"b"
[29] "a\"b\"c" "a'"      "a'b"     "a'b'c"   "|"
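
The sorted output above shows the non-ASCII entries landing in a different place under R-devel on Windows. One possible way to make the result independent of the session locale, offered only as a sketch and not as what git2rdata does: order() and sort() accept method = "radix", which according to ?sort always collates character vectors in the C locale, provided all elements share one encoding (UTF-8 or Latin-1). The vector x below is a hypothetical stand-in for a few entries of ds$a, and I have not checked this on the Windows R-devel build from the report.

```r
# Hypothetical stand-in for some entries of ds$a; \u escapes yield UTF-8 strings
x <- c("\u00e9", "&", "\u00e0", "\u00b5", "\u00e7", "\u20ac",
       "|", "#", "@", "$", "a")
x <- enc2utf8(x)  # make sure every element shares one encoding

# Radix ordering collates in the C locale regardless of Sys.getlocale()
idx <- order(x, method = "radix")
sorted <- x[idx]
print(idx)
```

With UTF-8 input this should place the ASCII entries first and the non-ASCII entries after them, since their UTF-8 bytes are all above 0x7F.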

# order
## Win 10 R 4.0.2
 [1]  5  9 22 13 15 32 34 26 23 18 20 28 27 29 25 30 33  1  6  3  4 10  7  8  2
[26] 21 14 11 12 19 16 17 31 24
## Win 10 R devel
 [1]  5  9 22 13 15 32 34 26 23 18 20 33  1  6  3  4 10  7  8  2 21 14 11 12 19
[26] 16 17 31 30 28 27 29 25 24
## Ubuntu 18.04 R 4.0.3
 [1]  5  9 22 13 15 32 34 26 23 18 20 28 27 29 25 30 33  1  6  3  4 10  7  8  2
[26] 21 14 11 12 19 16 17 31 24
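
To make the difference between the order results easier to see, the sketch below copies the Win 10 R 4.0.2 and Win 10 R devel order vectors from the output above and checks where the five non-ASCII entries (input positions 25 and 27 to 30) end up. The variable names are only for illustration.

```r
# Order vectors copied from the output above
ord_release <- c(5, 9, 22, 13, 15, 32, 34, 26, 23, 18, 20, 28, 27, 29, 25,
                 30, 33, 1, 6, 3, 4, 10, 7, 8, 2, 21, 14, 11, 12, 19, 16,
                 17, 31, 24)
ord_devel <- c(5, 9, 22, 13, 15, 32, 34, 26, 23, 18, 20, 33, 1, 6, 3, 4,
               10, 7, 8, 2, 21, 14, 11, 12, 19, 16, 17, 31, 30, 28, 27, 29,
               25, 24)

# Input positions of the non-ASCII entries (é, à, µ, ç, €)
non_ascii <- c(25, 27, 28, 29, 30)

# Position of each non-ASCII entry in the two orderings
pos_release <- match(non_ascii, ord_release)
pos_devel <- match(non_ascii, ord_devel)
print(rbind(input = non_ascii, release = pos_release, devel = pos_devel))

# Do the remaining entries keep the same relative order?
same_rest <- identical(ord_release[!ord_release %in% non_ascii],
                       ord_devel[!ord_devel %in% non_ascii])
print(same_rest)
```

With these vectors, the non-ASCII entries occupy positions 12 to 16 in R 4.0.2 and positions 29 to 33 in R-devel, while the relative order of all other entries is unchanged.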

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] C
system code page: 1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 fortunes_1.5-4

R Under development (unstable) (2021-01-13 r79826)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C
 [3] LC_TIME=C                  LC_COLLATE=C
 [5] LC_MONETARY=C              LC_MESSAGES=nl_BE.UTF-8
 [7] LC_PAPER=nl_BE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3 fortunes_1.5-4


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey


