[R] element wise pattern recognition and string substitution

Bert Gunter bgunter.4567 at gmail.com
Mon Sep 5 18:01:12 CEST 2016


Jeff:

It is not obvious to me that the ability to *match* an arbitrary
pattern (including one of several alternatives joined by "|", per the
link you included) implies that sub() and friends can *extract* part
of it, e.g. via the "\\N" backreference construct or otherwise.  I would
appreciate it if you or someone else could show me how this can be done.
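
To make the question concrete, here is a sketch of the kind of thing I have in mind, not a claim that it generalizes. Each alternative gets its own capture group, and the replacement concatenates all of them, on the assumption that a group in the branch that did not take part in the match contributes an empty string. The anchoring with ^ and $, the use of perl = TRUE (for the non-capturing "(?:" group), and the names `combined` and `extracted` are my own choices for illustration.

```r
# Strings taken from the example quoted further down in this thread.
z <- c("TX.WT.CUT.mean", "mg.tx.cv")

# Two alternatives joined by "|", each with its own capture group:
#   branch 1 (four fields):  the middle two fields go into group 1
#   branch 2 (three fields): the middle field goes into group 2
# Anchoring with ^ and $ is an assumption; it stops a branch from
# matching only part of a string.
combined <- paste0(
  "^(?:",
  "[^.]*\\.([^.]*\\.[^.]*)\\.[^.]*",
  "|",
  "[^.]*\\.([^.]*)\\.[^.]*",
  ")$"
)

# The group belonging to the branch that did not match is expected to be
# empty, so "\\1\\2" should reduce to whichever group actually matched.
extracted <- sub(combined, "\\1\\2", z, perl = TRUE)
print(extracted)
```

Whether every set of separate patterns can be folded into one alternation this conveniently is the part I remain unsure about.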

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Sep 5, 2016 at 8:37 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> Yes, sorry I did not look closer... regex can match any finite language, so there are no data sets you can feed to R that cannot be matched. [1] You may find it hard to see the pattern, or you may want to build the pattern programmatically to alleviate tedium for yourself, but regexes are not the constraint.
>
> [1] http://www.cs.nuim.ie/~jpower/Courses/Previous/parsing/node18.html
> --
> Sent from my phone. Please excuse my brevity.
>
> On September 4, 2016 10:41:45 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>Well, he did provide an example, and...
>>
>>
>>> z <- c('TX.WT.CUT.mean','mg.tx.cv')
>>
>>> sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>>[1] "WT.CUT" "tx"
>>
>>
>>## seems to do what was requested.
>>
>>Jeff would have to amplify on his initial statement however: do you
>>mean that separate patterns can always be combined via "|" ?  Or
>>something deeper?
>>
>>Cheers,
>>Bert
>>Bert Gunter
>>
>>
>>On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller
>><jdnewmil at dcn.davis.ca.us> wrote:
>>> Your opening assertion is false.
>>>
>>> Provide a reproducible example and someone will demonstrate.
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
>>>>Dear list,
>>>>
>>>>I have a vector of strings that cannot be described by one pattern.
>>>>So let's say I construct a vector of patterns of the same length as
>>>>the vector of strings. Can I then do element-wise pattern recognition
>>>>and string substitution?
>>>>
>>>>For example,
>>>>
>>>>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>>>>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>>>
>>>>patterns <- c(pattern1,pattern2)
>>>>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>>
>>>>Say I want to extract "WT.CUT" from the first string and "tx" from
>>>>the second string. If I do
>>>>
>>>>sub(patterns, '\\2', strings), only the first pattern will be used.
>>>>
>>>>Looping over the patterns doesn't work the way I want. I would
>>>>appreciate any comments.
>>>>Thanks.
>>>>
>>>>Jun
>>>>
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide
>>>>http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>
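
For the element-wise question in the original post quoted above, one possibility is to pair each pattern with its string through mapply(), since sub() itself uses only the first element of a pattern vector. This is a minimal sketch using the poster's own `patterns` and `strings`; the name `result` and the length check are additions for illustration.

```r
# Patterns and strings as given in the original post.
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# mapply() would silently recycle a shorter argument, so check the
# one-pattern-per-string assumption explicitly.
stopifnot(length(patterns) == length(strings))

# Apply the i-th pattern to the i-th string and keep the second group.
result <- mapply(function(p, s) sub(p, "\\2", s),
                 patterns, strings, USE.NAMES = FALSE)
print(result)
```

USE.NAMES = FALSE keeps the patterns from being attached as names on the returned character vector.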


