[R] Regular expressions: offsets of groups

Gabor Grothendieck ggrothendieck at gmail.com
Mon Sep 27 19:29:18 CEST 2010


On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg
<malsburg at gmail.com> wrote:
> Dear list!
>
>> gregexpr("a+(b+)", "abcdaabbc")
> [[1]]
> [1] 1 5
> attr(,"match.length")
> [1] 2 4
>
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches.  Is there a way in R
> to get that?
>
> I know about gsubfn and strapply, but they only give me the strings
> matched by groups, not their offsets.
>
> I could write something myself that first takes the above matches
> ("ab" and "aabb") and then searches again using only the group (b+).
> For this to work, I'd have to parse the regular expression and search
> several times (> 2, for nested groups) instead of just once.  But I'm
> sure there is a better way to do this.
>

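Try a zero-width positive lookbehind assertion.  PCRE generally requires
lookbehind patterns to be of fixed length, so this uses (?<=a) rather
than (?<=a+):

> gregexpr("(?<=a)(b+)", "abcdaabbc", perl = TRUE)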
[[1]]
[1] 2 7
attr(,"match.length")
[1] 1 2

See ?regexp for more info.
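
As a further sketch: in more recent versions of R (2.14.0 and later, to
my knowledge), gregexpr() with perl = TRUE also returns "capture.start"
and "capture.length" attributes, so the group offsets can be read off
the original pattern without rewriting it.  The variable names below are
only illustrative, and this assumes a single capture group.

```r
x <- "abcdaabbc"

# perl = TRUE is needed for the capture.* attributes to be returned.
m <- gregexpr("a+(b+)", x, perl = TRUE)[[1]]

# capture.start and capture.length are matrices with one row per match
# and one column per capture group; as.vector() flattens the single
# column for the one group (b+) used here.
group_start  <- as.vector(attr(m, "capture.start"))
group_length <- as.vector(attr(m, "capture.length"))

# Recover the matched group text from the offsets as a sanity check.
group_text <- substring(x, group_start, group_start + group_length - 1)

print(group_start)
print(group_length)
print(group_text)
```

With nested or multiple groups, each group gets its own column in these
matrices, so the pattern does not have to be parsed or searched again.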

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


