[Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

Winston Chang winstonchang1 at gmail.com
Mon Mar 2 20:14:18 CET 2015


On Windows, grep(fixed=TRUE) throws an error on some UTF-8 strings.
Here's an example, run in a Chinese locale (it must be run on Windows
to reproduce the error):

Sys.setlocale("LC_CTYPE", "chinese")
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"
grep("\n", y, fixed = TRUE)
# Error in grep("\n", y, fixed = TRUE) : invalid multibyte string at '<97>'
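A possible workaround, sketched here under the assumption that the error comes from multibyte validation of the string in the native locale: ask grep() to match byte-wise with useBytes = TRUE. This should be safe for a pattern like "\n", because it is the single byte 0x0a, and that byte never occurs inside a multibyte UTF-8 sequence. Whether this avoids the error on Windows has not been verified here.

```r
# Build the same UTF-8 string from its raw bytes and mark its encoding.
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"

# The '<97>' in the error message is the last of the three bytes.
bytes <- charToRaw(y)

# Byte-wise fixed matching: no conversion to the native encoding is needed
# to look for the single byte 0x0a.
hits <- grep("\n", y, fixed = TRUE, useBytes = TRUE)

# Same call on a string that does contain a newline.
hits_nl <- grepl("\n", paste0(y, "\n"), fixed = TRUE, useBytes = TRUE)
```

Byte-wise matching is only a sketch of a workaround for ASCII patterns; for non-ASCII patterns it compares raw bytes and may behave differently from character-wise matching.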


In my particular case, I'm using parse() on a string that contains
characters like this, and it triggers the same error, because parse()
calls srcfilecopy(), which calls grepl():

parse(text=y)
# Error in grepl("\n", lines, fixed = TRUE) :
#   invalid multibyte string at '<97>'
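To check that the failure really comes from the srcfilecopy() step and not from the parser itself, the call can be isolated. This is a minimal sketch; "<text>" is the file name parse() is assumed to pass for text input.

```r
# Same UTF-8 string as in the example above.
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"

# srcfilecopy() checks its 'lines' argument for embedded newlines with
# grepl("\n", lines, fixed = TRUE), which is where the error is reported.
sf <- srcfilecopy("<text>", y)
class(sf)
```

If this call alone fails on Windows in the Chinese locale, then parse(text = y, keep.source = FALSE) might sidestep the problem, on the assumption that parse() only builds the srcfilecopy when sources are kept. That is a guess from reading parse(), not something tested here.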


Am I right in assuming that this isn't the expected behavior?

-Winston