[R] platform dependent regex

Ista Zahn istazahn at gmail.com
Tue Feb 9 17:55:33 CET 2016


I just spent a day and a half debugging someone's code, only to
discover that the problem was platform-dependent regular expressions.
For example:

## Windows:
grepl("\\W", "س")  # TRUE

## OS X:
grepl("\\W", "س")  # TRUE

## Linux:
grepl("\\W", "س")  # FALSE

Ouch. The documentation does say "Certain named classes of characters
are predefined.  Their interpretation depends on the _locale_", but
that doesn't seem to cover it given that the locale on OS X and Linux
was the same (en_US.UTF-8).
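In case it helps anyone hitting the same thing, here is a minimal sketch of a workaround, assuming R was built against a PCRE with Unicode property support. Rather than relying on \W, it spells out "not a letter, number, or underscore" using Unicode properties and perl = TRUE. The helper name is_nonword is made up for illustration and is not part of base R, and I have not confirmed that this behaves identically on all three platforms.

```r
# Sketch only: state the character class explicitly instead of using \W.
# is_nonword is a hypothetical helper name, not a base R function.
is_nonword <- function(x) {
  # \p{L} = any Unicode letter, \p{N} = any Unicode number (PCRE syntax),
  # so the negated class matches anything that is not a letter, number or "_".
  grepl("[^\\p{L}\\p{N}_]", x, perl = TRUE)
}

# ARABIC LETTER SEEN (U+0633), written as an escape so the source stays ASCII
seen <- "\u0633"

is_nonword(seen)   # is the Arabic letter treated as a non-word character?
is_nonword("!")    # punctuation, for comparison
```

The point of the explicit class is that the definition of a "word character" then comes from the Unicode property tables in PCRE and not from whatever the platform's character classification happens to be.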

Question: Is this considered a bug, and if so what can I do to help
fix it? I've checked and the issue is present in both r-patched and
r-devel.

Best,
Ista


