[R] Regex: workaround for variable length negative lookbehind

Stefan Evert stefan.evert at uos.de
Sun Nov 30 20:59:54 CET 2008


Hi Stefan! :-)

> From tools where negative lookbehind can involve variable lengths, one
> would think this would work:
> grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)
> But then R doesn't like it that much ...

It's really the PCRE library that doesn't like your regexp, not R.
The problem is that negative lookbehind is only possible with a
fixed-length expression, and since \1 may hold an arbitrary string, the
PCRE library can't be sure it's just a single character.  I'm also
surprised that you're allowed to use \1 before defining it.

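
To make the fixed-length restriction concrete, here is a minimal sketch. The test vector `x` and the two patterns are made up for illustration and are not from the original question. The first call uses a one-character lookbehind, which PCRE accepts; the second uses an unbounded `a+` inside the lookbehind, which I would expect PCRE to reject when it compiles the pattern (the exact message depends on the PCRE version, so none is shown here).

```r
# Hypothetical test data, chosen only for this illustration.
x <- c("ab", "cb")

# Fixed-length negative lookbehind: a "b" that is not preceded by "a".
fixed <- grepl("(?<!a)b", x, perl = TRUE)
print(fixed)

# Unbounded lookbehind ("a+" has no fixed length). The call is wrapped in
# tryCatch() so that a compilation error is captured as a message string
# instead of stopping the script.
variable <- tryCatch(
  suppressWarnings(grepl("(?<!a+)b", x, perl = TRUE)),
  error = function(e) conditionMessage(e)
)
print(variable)
```

If the second call does fail on your installation, `variable` holds the error text rather than a logical vector.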
> But is there a one-line grep thingy to do this?

Can't think of a one-liner, but here is a three-line solution that you
can easily enough wrap in a small function:

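vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
idx.1 <- grep("(.)\\1$", vec)     # ends in a doubled character
idx.2 <- grep("^(.)\\1*$", vec)   # consists of one repeated character only
vec[setdiff(idx.1, idx.2)]        # keep the former, drop the latter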
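
Wrapped in a function, the three lines above might look like the sketch below. The function names are hypothetical. The second function is an assumption on my part rather than anything from the original thread: it replaces the variable-length lookbehind with a negative lookahead anchored at the start of the string, which PCRE does allow to be of variable length, and it appears to give the same result on the example vector.

```r
# Hypothetical wrapper around the two-grep approach: return the strings
# that end in a doubled character but do not consist of a single
# repeated character throughout.
ends_in_run_not_uniform <- function(x) {
  idx.1 <- grep("(.)\\1$", x)     # ends in a doubled character
  idx.2 <- grep("^(.)\\1*$", x)   # one character repeated over the whole string
  x[setdiff(idx.1, idx.2)]
}

# Alternative sketch (an assumption, not the original method): the
# lookahead (?!(.)\1*$) rejects strings made of one repeated character,
# then .*(.)\2$ requires a doubled character at the end.
ends_in_run_lookahead <- function(x) {
  grep("^(?!(.)\\1*$).*(.)\\2$", x, perl = TRUE, value = TRUE)
}

vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
print(ends_in_run_not_uniform(vec))
print(ends_in_run_lookahead(vec))
```

Both calls should print "baaa", "bbaa" and "baamm" for this vector. The lookahead version needs perl = TRUE; the two-grep version also runs with R's default regex engine.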


The wonders of Googleology (episode 1)

"from collectibles to cars"
	84,700,000 -- Google
	9,443,672 -- Google N-grams (Web 1T5)
	1 -- ukWaC

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]
