[R] Regex for ^ (the caret symbol)?

Rui Barradas ruipbarradas at sapo.pt
Mon Jan 21 23:25:02 CET 2013


Hello,

On 21-01-2013 20:52, Duncan Murdoch wrote:
> On 13-01-21 3:20 PM, Jeff Newmiller wrote:
>> Apparently Extended RegExp syntax eliminated the
>> "^-is-an-ordinary-character-except-for-two-uses" meaning that I am
>> familiar with from the Basic RegExp usage, since GNU grep with the -E
>> option also refuses to match the caret unless it is escaped. The TRE
>> library treats BRE as obsolete, so we only get ERE and Perl regexes in
>> R. So I guess it isn't a bug, but is rather a "feature".
>
> I re-read the ?regex help page, and I think it does actually say this,
> so we don't even have a documentation error as I thought before.  When
> it is saying that ^ is a plain character except when it comes first, it
> is talking about first within a character class, e.g. [a^] meaning "a"
> or "^" as opposed to [^a] meaning "not a".

So in the pattern [a^] it doesn't need to be escaped.

grep("[a^]", c("a^", "and", "b", "^")) # 1 2 4
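The contrast with the negated form can be checked directly. This is a small sketch using base R's grep(); the vector `x` reuses the values above plus one extra element, "aa", added only for illustration, and the names `with_caret` and `negated` are arbitrary.

```r
# Values from the example above, plus "aa" (added for illustration only).
x <- c("a^", "and", "b", "^", "aa")

# "^" NOT in first position inside [...] is an ordinary member of the class:
# match elements containing "a" or "^".
with_caret <- grep("[a^]", x)   # 1 2 4 5

# "^" in first position inside [...] negates the class:
# match elements containing at least one character other than "a".
negated <- grep("[^a]", x)      # 1 2 3 4

print(with_caret)
print(negated)
```

"aa" is found by [a^] but not by [^a], because it contains nothing except "a".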


Rui Barradas
>
> Duncan Murdoch
>
>
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                        Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
>>
>> Sent from my phone. Please excuse my brevity.
>>
>> Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>
>>> On 13-01-21 1:05 PM, Jeff Newmiller wrote:
>>>> So what is the special behavior of the ^ symbol when not at the
>>>> beginning of the string that occurs when it is not escaped?
>>>
>>> I think it retains its meaning as an assertion that it occurs at the
>>> beginning of the line, and so a pattern like "a^b" could never match
>>> anything.  It's not very useful in this context, but I expect it's
>>> easier to implement in the case of complicated patterns, where some
>>> paths through the pattern put it at the beginning and others don't,
>>> e.g.
>>>
>>> (a|)^b
>>>
>>> has two possible patterns:  a^b and ^b.
>>>
>>> Duncan Murdoch
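Duncan's two claims can be tried out with a short sketch. The pattern "a^b" is run with the default (TRE) engine; the alternation "(a|)^b" is run with perl = TRUE, on the assumption that PCRE's handling of "^" after an optional group is the behaviour of interest (the default engine is expected to agree, but that is not checked here). The test strings and variable names are illustrative only.

```r
# Illustrative test strings.
x <- c("a^b", "ab", "b")

# Outside a bracket expression "^" is always an anchor, so "a^b" would need
# an "a" before the start of the string: it can never match.
never <- grepl("a^b", x)                       # FALSE FALSE FALSE

# In "(a|)^b" the empty alternative lets "^" sit at the start of the string,
# so on that path the pattern behaves like "^b".
via_empty <- grepl("(a|)^b", x, perl = TRUE)   # FALSE FALSE TRUE

print(never)
print(via_empty)
```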
>>>
>>>>
>>>> Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>>
>>>>> On 13-01-21 11:48 AM, Jeff Newmiller wrote:
>>>>>> I am not sure I understand what worked perfectly, since it is my
>>>>>> understanding that ^ is only special at the beginning of the regex (to
>>>>>> anchor the pattern at the beginning of the target string) or as the
>>>>>> first character of a character set (to indicate exclusion of the listed
>>>>>> characters). In any other position the caret should behave like an
>>>>>> ordinary character. That is, your original pattern should have worked
>>>>>> as-is. This is supported by the help page documentation for regex in
>>>>>> the paragraph below the definition of [:xdigit:]. I think this is a bug
>>>>>> in R.
>>>>>
>>>>> It's a documentation error rather than a bug.  The ^ character is
>>>>> special anywhere in the extended RE syntax defined by the TRE library
>>>>> or the Perl-compatible library that we use.  This is inconsistent with
>>>>> the POSIX standard, which might be what you were thinking of.
>>>>>
>>>>> Duncan Murdoch
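One way to see that both regex engines available to grepl() treat an unescaped caret the same way is to run the pattern from the original question under each. A sketch, reusing `temp` from the original question; `tre` and `pcre` are just illustrative names.

```r
# Data from the original question.
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# An unescaped "^" in the middle of the pattern is an anchor in both
# engines, so "latitude^2" matches nothing.
tre  <- grepl("latitude^2", temp)               # default (TRE) engine
pcre <- grepl("latitude^2", temp, perl = TRUE)  # PCRE engine

print(tre)    # all FALSE
print(pcre)   # all FALSE
```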
>>>>>
>>>>>>
>>>>>> mtb954 at gmail.com wrote:
>>>>>>
>>>>>>> Hi Tsjerk, many thanks...that worked perfectly!
>>>>>>>
>>>>>>> Mark Na
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 21, 2013 at 9:36 AM, Tsjerk Wassenaar <tsjerkw at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Oh, I'm jetlagged. ^ is a control character for 'start of string'.
>>>>>>>> In the context of a character set it means negation: [^a-z].
>>>>>>>>
>>>>>>>> Ciao,
>>>>>>>>
>>>>>>>> Tsjerk
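Both roles of the caret fit in a short sketch. `temp` is the vector from the original question; the three strings passed to the negated class, and the variable names, are made up for illustration.

```r
# Data from the original question.
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# At the start of a pattern "^" anchors the match to the start of the string.
starts <- grepl("^latitude", temp)                      # TRUE TRUE FALSE FALSE

# As the first character inside [...] it negates the class: TRUE when the
# string contains at least one character that is not a lowercase a-z letter.
non_lower <- grepl("[^a-z]", c("abc", "ab2", "a b"))   # FALSE TRUE TRUE

print(starts)
print(non_lower)
```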
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 21, 2013 at 4:33 PM, Tsjerk Wassenaar <tsjerkw at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mark Na,
>>>>>>>>>
>>>>>>>>> Try:
>>>>>>>>>
>>>>>>>>> grepl("latitude\\^2",temp)
>>>>>>>>>
>>>>>>>>> ^ is a control character for negation, so you have to escape it.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Tsjerk
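The escaped pattern can be checked against the data from the question under both engines. In the R string literal "\\^" is the two characters \^, which the regex engine reads as a literal caret. A sketch; the variable names are illustrative.

```r
# Data from the original question.
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# "\\^" in R source is \^ in the regex: a literal caret, not an anchor.
escaped_tre  <- grepl("latitude\\^2", temp)               # TRUE TRUE FALSE FALSE
escaped_pcre <- grepl("latitude\\^2", temp, perl = TRUE)  # same result with PCRE

print(escaped_tre)
print(escaped_pcre)
```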
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 21, 2013 at 4:26 PM, <mtb954 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello R-helpers,
>>>>>>>>>>
>>>>>>>>>> I am trying to search for a string that includes the caret symbol,
>>>>>>>>>> using the following code:
>>>>>>>>>>
>>>>>>>>>> grepl("latitude^2",temp)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And R doesn't like that. It gives me:
>>>>>>>>>>
>>>>>>>>>>> temp<-c("latitude^2","latitude and latitude^2","longitude^2","longitude and longitude^2")
>>>>>>>>>>> temp
>>>>>>>>>> [1] "latitude^2"                "latitude and latitude^2"   "longitude^2"               "longitude and longitude^2"
>>>>>>>>>>> grepl("latitude^2",temp)
>>>>>>>>>> [1] FALSE FALSE FALSE FALSE
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think this must be a regex problem, but I can't find out how to
>>>>>>>>>> specify the caret using regex.
>>>>>>>>>>
>>>>>>>>>> I would appreciate any help you could provide.
>>>>>>>>>>
>>>>>>>>>> Many thanks,
>>>>>>>>>>
>>>>>>>>>> Mark Na
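Another option, not raised in the thread, is to skip regular expressions altogether: grepl() accepts fixed = TRUE, which matches the pattern as a literal string, so the caret needs no escaping. A minimal sketch using the data from the question; `literal` is an illustrative name.

```r
# Data from the question above.
temp <- c("latitude^2", "latitude and latitude^2",
          "longitude^2", "longitude and longitude^2")

# fixed = TRUE: the pattern is a plain string, so "^" is just a character.
literal <- grepl("latitude^2", temp, fixed = TRUE)   # TRUE TRUE FALSE FALSE

print(literal)
```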
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Tsjerk A. Wassenaar, Ph.D.
>>>>>>>>>
>>>>>>>>> post-doctoral researcher
>>>>>>>>> Biocomputing Group
>>>>>>>>> Department of Biological Sciences
>>>>>>>>> 2500 University Drive NW
>>>>>>>>> Calgary, AB T2N 1N4
>>>>>>>>> Canada
>>>>>>>>>
>>>>>>>>
>>>>>>>