[R] Regular expressions: offsets of groups

Michael Bedward michael.bedward at gmail.com
Tue Sep 28 09:46:15 CEST 2010


What Titus wants to do is akin to retrieving capturing groups from a
Matcher object in Java. Some time ago I also thought there must be an
existing, elegant solution to this and searched for one, including
looking at the sources (albeit without much expertise), but came up
blank.

I also looked at the stringr package (which is nice) but it doesn't
quite do it either.
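
For what it's worth, here is a minimal sketch of one way to get the
group offsets. It assumes an R version in which gregexpr() with
perl = TRUE attaches "capture.start" and "capture.length" attributes to
its result (I believe this arrived in 2.14.0, so it would not work on
older installations). The helper name group_offsets is made up for
illustration and is not part of any package.

```r
# Hypothetical helper: start offsets and lengths of every capture group,
# for each match of `pattern` in the single string `x`.
# Assumption: gregexpr(perl = TRUE) sets the "capture.start" and
# "capture.length" attributes (not the case in older R versions).
group_offsets <- function(pattern, x) {
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  if (m[1] == -1) return(NULL)  # no match at all
  # Both attributes are matrices: one row per match, one column per group.
  list(start  = attr(m, "capture.start"),
       length = attr(m, "capture.length"))
}

res <- group_offsets("a+(b+)", "abcdaabbc")
res$start[, 1]   # offsets of group 1, i.e. (b+), within the string
res$length[, 1]  # lengths of the text matched by group 1
```

On Titus's example the first call should give 2 and 7, the offsets he
asked for. Since there is one column per group, nested groups would show
up as further columns from the same single search.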

Michael

On 28 September 2010 01:48, Titus von der Malsburg <malsburg at gmail.com> wrote:
> Dear list!
>
>> gregexpr("a+(b+)", "abcdaabbc")
> [[1]]
> [1] 1 5
> attr(,"match.length")
> [1] 2 4
>
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches.  Is there a way in R
> to get that?
>
> I know about gsubfn and strapply, but they only give me the strings
> matched by groups, not their offsets.
>
> I could write something myself that first takes the above matches
> ("ab" and "aabb") and then searches again using only the group (b+).
> For this to work, I'd have to parse the regular expression and search
> several times (> 2, for nested groups) instead of just once.  But I'm
> sure there is a better way to do this.
>
> Thanks for any suggestion!
>
>   Titus


