[R] Parsing regular expressions differently - feature request

Duncan Murdoch murdoch at stats.uwo.ca
Sat Nov 8 22:28:34 CET 2008


On 08/11/2008 3:16 PM, Wacek Kusnierczyk wrote:
> Duncan Murdoch wrote:
>> On 08/11/2008 11:03 AM, Gabor Grothendieck wrote:
>>> On Sat, Nov 8, 2008 at 9:41 AM, Duncan Murdoch <murdoch at stats.uwo.ca>
>>> wrote:
>>>> On 08/11/2008 7:20 AM, John Wiedenhoeft wrote:
>>>>> Hi there,
>>>>>
>>>>> I rejoiced when I realized that you can use Perl regex from within R.
>>>>> However, as the FAQ states "Some functions, particularly those
>>>>> involving regular expression matching, themselves use
>>>>> metacharacters, which may need to be escaped by the backslash
>>>>> mechanism. In those cases you may need a quadruple backslash to
>>>>> represent a single literal one."
>>>>>
>>>>> I was wondering if that is really necessary for perl=TRUE? wouldn't
>>>>> it be possible to parse a string differently in a regex context,
>>>>> e.g. automatically insert \\ for each \ , such that you can use the
>>>>> perl syntax directly? For example, if you want to input a newline
>>>>> as a character, you would use \n anyway. At the moment one says \\n
>>>>> to make it clear to R that you mean \n to make clear that you mean
>>>>> newline... this is pretty annoying.
>>>>> How likely is it that you want to pass a real newline character to
>>>>> PCRE directly?
>>>> No, that's not possible.  At the level where the parsing takes
>>>> place R has no idea of its eventual use, so it can't tell that some
>>>> strings are going to be interpreted as Perl, and others not.
> Here's a quick hack to achieve the impossible:

That might solve John's problem, but I doubt it.  As far as I can see it 
won't handle \L, for example.
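
A minimal sketch of why a wrapper cannot recover such an escape: the R
parser resolves backslash escapes in a string literal before grep() or any
function wrapped around it is called. The names src and res are
illustrative only, and the tryCatch() is there on the assumption that the
handling of an unknown escape (error or warning) differs between R versions.

```r
# Escapes are resolved when the literal is parsed, not when it is used.
nchar("\n")    # 1: a single newline character
nchar("\\n")   # 2: a backslash followed by the letter n

# src holds the source text of a string literal containing \L
# (quote, backslash, L, quote). Parsing it is wrapped in tryCatch()
# because the outcome is assumed to depend on the R version.
src <- "\"\\L\""
res <- tryCatch(eval(parse(text = src)),
                error   = function(e) "rejected by the parser",
                warning = function(w) "backslash dropped with a warning")
res
```

Whichever branch is taken, no backslash followed by L is ever handed to a
function that receives the parsed string, so there is nothing left for it
to double.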

Duncan Murdoch

> 
> mygrep = function(pattern, text, perl=FALSE, ...) {
>    if (perl) pattern = gsub("\\\\", "\\\\\\\\", pattern)
>    grep(pattern, text, perl=perl, ...)
> }
> 
> (text = "lemme \\ it")
> # [1] "lemme \\ it"
> 
> nchar(text)
> # [1] 10
> 
> (pattern = "\\")
> # [1] "\\"
> nchar(pattern)
> # [1] 1
> 
> grep(pattern, text, perl=TRUE)
> # can't go, impossible!
> 
> mygrep(pattern, text, perl=TRUE, value=TRUE)
> # [1] "lemme \\ it"
> 
> vQ
> 