[R] Regex: workaround for variable length negative lookbehind

Gabor Grothendieck ggrothendieck at gmail.com
Sun Nov 30 21:37:59 CET 2008


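Here is a very slight further simplification: we can drop the final {1,}, since .* can absorb any extra repeated characters, so the string only needs to end in two identical characters.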

> grep("^(?!(.)\\1{1,}$).*(.)\\2$", vec, perl = TRUE)
[1] 2 3 5
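A quick way to check that dropping the final {1,} changes nothing is to run both patterns over vec plus a few extra strings and compare the results. This is a sketch; the strings in extra are made up for illustration and are not from the thread:

```r
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
# Hypothetical extra strings probing edge cases (not from the thread)
extra <- c("a", "", "abbb", "xyzzz", "ab")
x <- c(vec, extra)

old <- "^(?!(.)\\1{1,}$).*(.)\\2{1,}$"  # pattern from the earlier message
new <- "^(?!(.)\\1{1,}$).*(.)\\2$"      # simplified pattern

# Both patterns should flag exactly the same strings
identical(grepl(old, x, perl = TRUE), grepl(new, x, perl = TRUE))
# [1] TRUE

# Strings kept by the simplified pattern: baaa, bbaa, baamm, abbb, xyzzz
x[grepl(new, x, perl = TRUE)]
```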


On Sun, Nov 30, 2008 at 3:26 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Try this:
>
>> vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
>
>> grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)
> [1] 2 3 5
>
> The (?!...) negative lookahead succeeds only if the string is not
> all the same character. Since a lookahead consumes no characters,
> matching then continues from the beginning of the string, where
> .* matches anything followed by repeated characters up to the end.
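The same pattern can be laid out piece by piece using PCRE's (?x) extended mode, in which whitespace in the pattern is ignored and # starts a comment. A sketch, assuming perl = TRUE so that R hands the pattern to PCRE; {1,} is written as the equivalent +:

```r
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

# (?x) turns on PCRE extended mode: whitespace is ignored and
# "#" starts a comment that runs to the end of the line
pattern <- "(?x)
  ^                  # start of string
  (?! (.) \\1+ $ )   # fail if the whole string is one character repeated
  .*                 # then anything, possibly nothing
  (.) \\2 $          # ending in two identical characters
"

grep(pattern, vec, perl = TRUE)
# [1] 2 3 5
```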
>
> On Sun, Nov 30, 2008 at 2:33 PM, Stefan Th. Gries <stgries at gmail.com> wrote:
>> Hi all
>>
>> I have the following regular expression problem: I want to find
>> complete elements of a vector that end in a repeated character but
>> where the repetition doesn't make up the whole word. That is, for the
>> vector vec:
>>
>> vec<-c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
>>
>> I would like to get
>> "baaa"
>> "bbaa"
>> "baamm"
>>
>> From tools where negative lookbehind can involve variable lengths, one
>> would think this would work:
>>
>> grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)
>>
>> But R rejects it, since PCRE requires lookbehind to be fixed-length. I also know I can get it like this:
>>
>> whole.word.rep <- grep("^(.)\\1{1,}$", vec, perl = TRUE) # 1 6
>> rep.at.end <- grep("(.)\\1{1,}$", vec, perl = TRUE) # 1 2 3 5 6
>> setdiff(rep.at.end, whole.word.rep) # 2 3 5
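The same two-step idea can also be expressed with logical vectors from grepl() instead of index sets, which avoids setdiff() and keeps the combination in one expression. A sketch reusing the two patterns from the question:

```r
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

# TRUE where the string ends in a repeated character
rep.at.end <- grepl("(.)\\1{1,}$", vec, perl = TRUE)
# TRUE where the whole string is a single repeated character
whole.word.rep <- grepl("^(.)\\1{1,}$", vec, perl = TRUE)

# Keep strings that end in a repeat without being one long repeat
which(rep.at.end & !whole.word.rep)
# [1] 2 3 5
```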
>>
>> But is there a one-line grep thingy to do this?
>>
>> Thx for any pointers,
>> STG
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
