[R] Regular expressions: offsets of groups

jim holtman jholtman at gmail.com
Mon Sep 27 18:43:55 CEST 2010


Try this:

> x <-  gregexpr("a+(b+)", "abcdaabbcaaacaaab")
> justA <-  gregexpr("a+", "abcdaabbcaaacaaab")
> # keep only the 'a+' runs that begin where a full match in 'x' begins
> indx <- which(justA[[1]] %in% x[[1]])
> # the group (b+) starts right after each of those 'a+' runs
> justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx]
[1]  2  7 17
>
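
The two-pass approach above works because the group (b+) begins exactly where the leading 'a+' run ends. As a hedged alternative, here is a minimal sketch that reads the group offsets directly from a single search. It assumes a version of R newer than the one current when this thread was written, one in which gregexpr(..., perl = TRUE) attaches "capture.start" and "capture.length" attributes to its result. The variable names (s, m, b_start, b_len) are illustrative only.

```r
s <- "abcdaabbcaaacaaab"

# One search; with perl = TRUE the result carries per-group offsets
# (assumption: an R version that provides the capture.* attributes).
m <- gregexpr("a+(b+)", s, perl = TRUE)[[1]]

# capture.start / capture.length are matrices: one row per match,
# one column per capture group. There is a single group here.
b_start <- as.vector(attr(m, "capture.start"))
b_len   <- as.vector(attr(m, "capture.length"))

b_start                                   # 2 7 17
substring(s, b_start, b_start + b_len - 1)  # "b" "bb" "b"
```

With several (or nested) groups, each group gets its own column in those matrices, so the pattern does not have to be split up and searched repeatedly.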


On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg
<malsburg at gmail.com> wrote:
> Dear list!
>
>> gregexpr("a+(b+)", "abcdaabbc")
> [[1]]
> [1] 1 5
> attr(,"match.length")
> [1] 2 4
>
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches.  Is there a way in R
> to get that?
>
> I know about gsubfn and strapply, but they only give me the strings
> matched by groups, not their offsets.
>
> I could write something myself that first takes the above matches
> ("ab" and "aabb") and then searches again using only the group (b+).
> For this to work, I'd have to parse the regular expression and search
> several times (> 2, for nested groups) instead of just once.  But I'm
> sure there is a better way to do this.
>
> Thanks for any suggestion!
>
>   Titus



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


