[Rd] base::order breaking change in R-devel

Tomas Kalibera
Tue Jun 23 11:23:50 CEST 2020


This can be narrowed down to

Sys.setlocale("LC_CTYPE","C")
x2 <- "\u00e7"
x1 <- iconv(x2, from="UTF-8", to="latin1")
x1 < x2 # FALSE or NA

In R 4.0 it returns NA; in R-devel it returns FALSE (when running in a 
CP1252 locale on Windows).
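
To make it visible that the two strings differ only in their encoding, 
one can look at the declared encodings and the underlying bytes. This 
is only an illustrative sketch using the same x1 and x2 as above; the 
calls to Encoding() and charToRaw() are my addition for inspection and 
are not part of the original report.

```r
x2 <- "\u00e7"                                   # c-cedilla, marked as UTF-8
x1 <- iconv(x2, from = "UTF-8", to = "latin1")   # same character, marked as latin1

Encoding(c(x1, x2))   # declared encodings: "latin1" "UTF-8"
charToRaw(x1)         # one byte:  e7
charToRaw(x2)         # two bytes: c3 a7
```

The bytes differ, but both strings are marked, so R has enough 
information to tell that they denote the same character.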

It is the same character; only the encoding differs. The R-devel 
return value is therefore correct, and the previous behavior was a bug. 
The current native encoding should not matter for the comparison. Also, 
when the encodings are known, the collation order should apply only 
after the strings have been converted to a common encoding, so in this 
case the collation order of the locale should have no impact, and it 
seems it doesn't. I don't think R should preserve bug-compatibility in 
this case; code depending on the buggy behavior should be fixed.
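
For code that has to give the same result on both R 4.0 and R-devel, 
one possible approach (my suggestion, under the assumption that the 
strings carry correct encoding marks) is to convert everything to UTF-8 
explicitly before ordering. The sketch below uses enc2utf8() from base 
R; the variable name v is only for illustration. It does not help once 
a mark has been reset to "unknown", because the bytes are then 
interpreted in the native encoding.

```r
x2 <- "fa\u00e7ile"                              # marked as UTF-8
x1 <- iconv(x2, from = "UTF-8", to = "latin1")   # same word, marked as latin1

# Convert all elements to one common encoding before comparing.
v <- enc2utf8(c(x2, x1, x1, x2))

unique(Encoding(v))   # "UTF-8"
length(unique(v))     # 1: all four elements are the same string
order(v)              # 1 2 3 4: ties keep their original order
```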

I don't immediately see which NEWS entry this corresponds to. Please 
keep in mind that NEWS doesn't cover all changes. For a complete record 
you need to look at the svn commits, and even then it may be hard to 
trace a concrete change in behavior to a commit; to do that you need to 
debug the code or bisect.

Changes to _documented_ behavior should be more visible and, of course, 
reflected by changes in the documentation. If they are not, that is a 
bug worth reporting, and the report should come with a reference to the 
concrete parts of the documentation that are violated.

Best
Tomas

On 5/23/20 12:03 PM, Jan Gorecki wrote:
> Hi R developers,
> There seems to be a breaking change in base::order on Windows in
> R-devel. Code below yields different results on R 4.0.0 and R-devel
> (2020-05-22 r78545). I haven't found any info about that change in
> NEWS. Was the change intentional?
>
> Sys.setlocale("LC_CTYPE","C")
> Sys.setlocale("LC_COLLATE","C")
> x1 = "fa\xE7ile"
> Encoding(x1) = "latin1"
> x2 = iconv(x1, "latin1", "UTF-8")
> base::order(c(x2,x1,x1,x2))
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
>
> # R 4.0.0
> base::order(c(x2,x1,x1,x2))
> #[1] 1 4 2 3
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
> #[1] 2 3 1 4
>
> # R-devel
> base::order(c(x2,x1,x1,x2))
> #[1] 1 2 3 4
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
> #[1] 1 4 2 3
>
> Best Regards,
> Jan Gorecki