[R] element wise pattern recognition and string substitution

Ista Zahn istazahn at gmail.com
Wed Sep 7 15:30:22 CEST 2016


On Mon, Sep 5, 2016 at 12:56 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> Thanks for the reply, Bert.
>
> Your solution solves the example. I actually have a more general situation
> where I have dot-concatenated strings built from multiple variables. The
> problem is that the values of those variables may themselves contain dots.

If you concatenated the variables yourself, you could go back a step
and use another separator, i.e., one that doesn't appear in the
original variables. The separator does not need to be a single
character; e.g., "__.__" would be fine. This will make later parsing
with regular expressions much easier.
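A minimal sketch of the separator idea, assuming you control the paste() step. The variable names site, group and stat are hypothetical stand-ins for the original variables. With a separator that never occurs in the values, strsplit() with fixed = TRUE can split the keys back apart without any regular expression at all:

```r
# Hypothetical example variables; values in `group` contain dots themselves
site  <- c("TX", "mg")
group <- c("WT.CUT", "tx")
stat  <- c("mean", "cv")

# Join with a multi-character separator that never occurs in the values
sep  <- "__.__"
keys <- paste(site, group, stat, sep = sep)

# fixed = TRUE treats the separator literally, so its "." is not a regex wildcard
parts <- strsplit(keys, sep, fixed = TRUE)

# Take the second field of each split result
middle <- vapply(parts, `[`, character(1), 2)
print(middle)
```

Because splitting is on a literal string, the number of dots inside each value no longer matters.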

> The number of dots is not consistent across the values of a variable. So
> I am thinking of defining a vector of patterns, one per element of the
> string vector, and hopefully finding a way to apply each pattern to the
> corresponding value of the string vector. The only way I can think of is
> a "for" loop, which can be slow. Also, this is happening inside a function
> I am writing. I just wonder if there is a more efficient way. Thanks a lot.
>
> Jun
>
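For the element-wise question itself, one way to avoid writing an explicit for loop is mapply(), which pairs the i-th pattern with the i-th string. This sketch reuses the two patterns and strings from Jun's first message, quoted further down:

```r
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# mapply() calls sub(pattern, replacement, x) once per (pattern, string) pair;
# the single replacement "\\2" is recycled across all pairs
extracted <- mapply(sub, patterns, "\\2", strings, USE.NAMES = FALSE)
print(extracted)
```

Note that mapply() still calls sub() once per element internally, so it is mainly more concise than a hand-written for loop rather than dramatically faster.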
> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
>> Well, he did provide an example, and...
>>
>>
>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
>>
>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>> [1] "WT.CUT" "tx"
>>
>>
>> ## seems to do what was requested.
>>
>> Jeff would have to amplify on his initial statement, however: do you
>> mean that separate patterns can always be combined via "|"? Or
>> something deeper?
>>
>> Cheers,
>> Bert
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>> > Your opening assertion is false.
>> >
>> > Provide a reproducible example and someone will demonstrate.
>> > --
>> > Sent from my phone. Please excuse my brevity.
>> >
>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> >>Dear list,
>> >>
>> >>I have a vector of strings that cannot be described by one pattern. So
>> >>let's say I construct a vector of patterns of the same length as the
>> >>vector of strings. Can I do element-wise pattern matching and string
>> >>substitution?
>> >>
>> >>For example,
>> >>
>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>> >>
>> >>patterns <- c(pattern1,pattern2)
>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>> >>
>> >>Say I want to extract "WT.CUT" from the first string and "tx" from the
>> >>second string. If I do
>> >>
>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
>> >>
>> >>Looping over the patterns doesn't work the way I want. I'd appreciate
>> >>any comments.
>> >>Thanks.
>> >>
>> >>Jun
>> >>
>> >>
>> >>______________________________________________
>> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>> >>PLEASE do read the posting guide
>> >>http://www.R-project.org/posting-guide.html
>> >>and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>
>
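If many strings share a small number of distinct patterns, the number of sub() calls can be reduced by grouping: call sub() once per unique pattern on all the strings that use it. A minimal sketch, assuming patterns and strings have the same length; sub_by_pattern is a hypothetical helper name, and the third string "ab.cd.ef" is made-up test input:

```r
# Hypothetical helper: apply each distinct pattern to all strings that use it,
# so sub() runs once per unique pattern rather than once per string
sub_by_pattern <- function(patterns, replacement, strings) {
  stopifnot(length(patterns) == length(strings))
  out <- character(length(strings))
  for (p in unique(patterns)) {
    idx <- which(patterns == p)          # positions that use pattern p
    out[idx] <- sub(p, replacement, strings[idx])
  }
  out
}

pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv", "ab.cd.ef")

result <- sub_by_pattern(patterns, "\\2", strings)
print(result)
```

The loop here runs over distinct patterns, not over strings, so its cost grows with the number of patterns while each sub() call stays vectorized over its group.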


