[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

Kirill Müller kirill.mueller at ivt.baug.ethz.ch
Mon May 9 16:07:21 CEST 2016


Hi


I think the following behavior is a regression from R 3.2.5 (the second call returns NA, where I would expect 1):

 > match(iconv(c("\u00f8", "A"), from = "UTF8", to = "latin1"), "\u00f8")
[1]  1 NA
 > match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8")
[1] NA
 > match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8", incomparables = NA)
[1] 1

I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.
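
For anyone reproducing this from a script, here is a minimal sketch that makes the two encodings explicit and then converts both sides to UTF-8 before matching. The variable names are illustrative only, and the enc2utf8() step is an assumed workaround that has not been confirmed against R 3.3.0 in this report:

```r
# Build the same character in two different marked encodings.
x_latin1   <- iconv("\u00f8", from = "UTF-8", to = "latin1")
table_utf8 <- "\u00f8"

# Show the declared encoding of each string.
print(Encoding(x_latin1))
print(Encoding(table_utf8))

# Length-one lookup as in the report; NA here indicates the regression.
print(match(x_latin1, table_utf8))

# Assumed workaround: convert both sides to UTF-8 first, so that
# no cross-encoding comparison is needed inside match().
normalised <- match(enc2utf8(x_latin1), enc2utf8(table_utf8))
print(normalised)
```

If the direct lookup prints NA while the normalised lookup prints 1, the installed build most likely shows the behavior described here.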

Only the call with a length-one x and the default incomparables is affected, which makes me think this is related to the following NEWS entry:

match(x, table) is faster (sometimes by an order of magnitude) when x is 
of length one and incomparables is unchanged (PR#16491).
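
Since that entry concerns only length-one x, one way to probe it is to compare element-by-element lookups against a single call on a longer vector. The helper below is a hypothetical sketch (match_paths_agree is not part of R), and it assumes that padding x with one extra element is enough to avoid the length-one code path:

```r
# Hypothetical helper: look up each element of x on its own (length-one
# calls) and compare with one call on a padded, longer vector.
match_paths_agree <- function(x, table) {
  one_by_one <- vapply(x, function(el) match(el, table), integer(1),
                       USE.NAMES = FALSE)
  # The extra NA element guarantees this call has length > 1;
  # its result is dropped again by the subsetting.
  padded <- match(c(x, NA_character_), table)[seq_along(x)]
  identical(one_by_one, padded)
}

# Candidate input from the report: latin1 x against a UTF-8 table.
agree <- match_paths_agree(iconv("\u00f8", from = "UTF-8", to = "latin1"),
                           "\u00f8")
print(agree)
```

A FALSE result would suggest that the two paths disagree for mixed encodings on the running R version.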


Best regards

Kirill


