[R] element wise pattern recognition and string substitution

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Sep 5 17:37:08 CEST 2016


Yes, sorry, I did not look closer. A regular expression can match any finite language [1], so there is no finite set of strings you can feed to R that a single pattern cannot match. You may find it hard to see the pattern, or you may want to build it programmatically to spare yourself the tedium, but regexes are not the constraint.

[1] http://www.cs.nuim.ie/~jpower/Courses/Previous/parsing/node18.html
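
For the element-wise case asked about in the original question, a minimal sketch (assuming base R only) is to pair each pattern with its own string via mapply(), since sub() uses only the first element of a pattern vector. The names `elementwise` and `single` are illustrative, and the single pattern is my own variant rather than one posted in this thread.

```r
# Patterns and strings copied from the original question.
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# sub() is not vectorised over `pattern`, so pair pattern i with string i
# explicitly. USE.NAMES = FALSE stops mapply() from naming the result
# after the patterns.
elementwise <- mapply(sub, pattern = patterns, x = strings,
                      MoreArgs = list(replacement = "\\2"),
                      USE.NAMES = FALSE)

# One pattern covering both strings (an assumed variant, not from the
# thread): drop the first and the last dot-separated field and keep
# whatever lies between them.
single <- sub("^[^.]+\\.(.+)\\.[^.]+$", "\\1", strings)

print(elementwise)
print(single)
```

Both calls should return "WT.CUT" and "tx" for these two strings. The mapply() form keeps one pattern per element, which may be easier to maintain when the strings really do follow unrelated layouts.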
-- 
Sent from my phone. Please excuse my brevity.

On September 4, 2016 10:41:45 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>Well, he did provide an example, and...
>
>
>> z <- c('TX.WT.CUT.mean','mg.tx.cv')
>
>> sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>[1] "WT.CUT" "tx"
>
>
>## seems to do what was requested.
>
>Jeff would have to amplify on his initial statement however: do you
>mean that separate patterns can always be combined via "|" ?  Or
>something deeper?
>
>Cheers,
>Bert
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller
><jdnewmil at dcn.davis.ca.us> wrote:
>> Your opening assertion is false.
>>
>> Provide a reproducible example and someone will demonstrate.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
>>>Dear list,
>>>
>>>I have a vector of strings that cannot be described by one pattern.
>>>So, say I construct a vector of patterns of the same length as the
>>>vector of strings: can I do element-wise pattern recognition and
>>>string substitution?
>>>
>>>For example,
>>>
>>>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>>>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>>
>>>patterns <- c(pattern1,pattern2)
>>>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>
>>>Say I want to extract "WT.CUT" from the first string and "tx" from
>>>the second string. If I do
>>>
>>>sub(patterns, '\\2', strings), only the first pattern will be used.
>>>
>>>Looping over the patterns doesn't work the way I want. I would
>>>appreciate any comments.
>>>Thanks.
>>>
>>>Jun
>>>
>>>______________________________________________
>>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide
>>>http://www.R-project.org/posting-guide.html
>>>and provide commented, minimal, self-contained, reproducible code.


