[Rd] gsub() hex character range problems in R-devel?

Martin Morgan mtmorg@n@b|oc @end|ng |rom gm@||@com
Tue Jan 4 20:35:30 CET 2022

I'm not very good with character encodings, so this might be user error. The following code is meant to replace extended ASCII characters, in particular a non-breaking space (the byte \xa0), with "", and it works in R-4-1-branch

> R.version.string
[1] "R version 4.1.2 Patched (2022-01-04 r81445)"
> gsub("[\x7f-\xff]", "", "fo\xa0o")
[1] "foo"

but fails in R-devel

> R.version.string
[1] "R Under development (unstable) (2022-01-04 r81445)"
> gsub("[\x7f-\xff]", "", "fo\xa0o")
Error in gsub("[\177-\xff]", "", "fo\xa0o") : invalid regular expression '[-�]', reason 'Invalid character range'
In addition: Warning message:
In gsub("[\177-\xff]", "", "fo\xa0o") :
  TRE pattern compilation error 'Invalid character range'
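
For reference, a minimal sketch of one way to sidestep the byte range entirely. It rests on an assumption the example above does not state, namely that the \xa0 byte is meant as Latin-1 (where it is the non-breaking space, U+00A0). The variable names are only illustrative.

```r
# The input from the example: a single 0xA0 byte between ASCII letters.
x <- "fo\xa0o"

# A lone 0xA0 byte is not a valid UTF-8 sequence.
charToRaw(x)   # shows the underlying bytes
validUTF8(x)   # FALSE

# ASSUMPTION: the byte is Latin-1, i.e. a non-breaking space (U+00A0).
# Declare that encoding, then convert to a valid UTF-8 string.
Encoding(x) <- "latin1"
x_utf8 <- enc2utf8(x)

# Remove the non-breaking space as a character rather than as a byte range.
cleaned <- gsub("\u00a0", "", x_utf8, fixed = TRUE)
cleaned
```

This only handles the one character; stripping everything outside ASCII would need a character-level pattern on the converted string instead.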

There are other oddities, too, like

> gsub("[[:alnum:]]", "", "fo\xa0o")  # R-4-1-branch
[1] "\xfc\xbe\x8c\x86\x84\xbc"

> gsub("[[:alnum:]]", "", "fo\xa0o")  # R-devel
[1] "<>"
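
As a point of comparison, a small sketch of the same call on a string that has first been converted to valid UTF-8. This again assumes the byte is Latin-1, which is a guess on my part; with that assumption I would expect only the non-breaking space to be left over.

```r
# ASSUMPTION: "\xa0" is Latin-1; convert so the input is valid UTF-8.
x <- iconv("fo\xa0o", from = "latin1", to = "UTF-8")
validUTF8(x)

# Drop the alphanumeric characters and inspect what remains by code point.
rest <- gsub("[[:alnum:]]", "", x)
utf8ToInt(rest)   # 160 would correspond to U+00A0
```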

The R-devel sessionInfo is

> sessionInfo()
R Under development (unstable) (2022-01-04 r81445)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.0

(I have built my own R on macOS; similar behavior is observed on a Linux machine)

Any hints welcome,

Martin Morgan
