[R] Parsing regular expressions differently - feature request

David James daj025 at gmail.com
Wed Nov 19 04:21:26 CET 2008


Perhaps Python's raw strings are what Duncan TL was referring to?
These are written as r'Hello World', and their main
advantage is that backslashes are simply passed through.  From the
Python Language Reference:

"When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string. For example, the string literal
r"\n" consists of two characters: a backslash and a lowercase 'n'.
.... "

See http://docs.python.org/reference/lexical_analysis.html#id7 for more details.
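
As a small illustration of the passage quoted above (a sketch only, not
taken from the Language Reference; the pattern and the sample text are
made up for the example), the following Python compares a raw string
with an ordinary one and then uses a raw string as a regular expression:

```python
import re

# A raw string keeps the backslash: two characters, '\' and 'n'.
raw = r"\n"
# An ordinary string processes the escape: one newline character.
cooked = "\n"
print(len(raw), len(cooked))

# The same regular expression written both ways. Without the r prefix,
# every backslash meant for the regex engine has to be doubled.
pattern_raw = r"\d+\.\d+"
pattern_escaped = "\\d+\\.\\d+"
print(pattern_raw == pattern_escaped)

# Hypothetical sample text, used only to show the pattern matching.
match = re.search(pattern_raw, "version 2.7 released")
print(match.group(0))
```

The pattern in the raw string reads exactly as the regex engine sees it,
which is the convenience being asked for in R.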



On Tue, Nov 18, 2008 at 1:36 PM, William Dunlap <wdunlap at tibco.com> wrote:
> Duncan Murdoch murdoch at stats.uwo.ca Sat Nov 8 15:41:34 CET 2008
> wrote:
>> On 08/11/2008 7:20 AM, John Wiedenhoeft wrote:
>> > Hi there,
>> >
>> > I rejoiced when I realized that you can use Perl regex from within R.
>> > However, as the FAQ states "Some functions, particularly those involving
>> > regular expression matching, themselves use metacharacters, which may
>> > need to be escaped by the backslash mechanism. In those cases you may
>> > need a quadruple backslash to represent a single literal one. "
>> >
>> > I was wondering if that is really necessary for perl=TRUE? wouldn't it
>> > be possible to parse a string differently in a regex context, e.g.
>> > automatically insert \\ for each \ , such that you can use the perl
>> > syntax directly? For example, if you want to input a newline as a
>> > character, you would use \n anyway. At the moment one says \\n to make
>> > it clear to R that you mean \n to make clear that you mean newline...
>> > this is pretty annoying. How likely is it that you want to pass a real
>> > newline character to PCRE directly?
>> No, that's not possible.  At the level where the parsing takes place R
>> has no idea of its eventual use, so it can't tell that some strings are
>> going to be interpreted as Perl, and others not.
>> As Gabor mentioned, there have been various discussions of adding a new
>> syntax for strings that are parsed literally, without processing any
>> escapes, but no consensus on the right syntax to use.
>> ... [scan() example elided] ...
>> So I agree, it would be nice to have new syntax to allow this.  Last
>> time this came up, I argued for something like \verb in LaTeX where the
>> delimiter could be specified differently in each use.  Duncan TL
>> suggested triple quotes, as in Python.  I think now that triple quotes
>> would be better than the particular form I suggested.
>> Duncan Murdoch
> Would a string with this alternate quoting be tagged (e.g., with a class
> that inherits from character) so that the deparser could display it in
> the style in which it was input?  Functions which generate file names
> using the native Windows notation would like to have them displayed
> without the extra backslashes.
> However, adding a new class for this could mess up other things.
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com