[R] Regular expressions: offsets of groups

Titus von der Malsburg malsburg at gmail.com
Mon Sep 27 19:08:16 CEST 2010


Thank you, Jim, but like the solution I discussed, your proposal
involves deconstructing the pattern and searching several times.
I'm looking for a general and efficient solution.  Internally, the
regular expression engine has all the necessary information after
one pass through the string.  What I need is an interface that
exposes this information.
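
To make concrete what such an interface could look like, here is a
minimal sketch.  It assumes an R version in which gregexpr() called
with perl = TRUE attaches "capture.start" and "capture.length"
attributes to its result (to my understanding this is the case from
R 2.14.0 on, so it may not be available in older installations).  The
helper name group_offsets is made up for this illustration.

```r
# Sketch: offsets of capture groups from a single gregexpr() pass.
# Assumption: gregexpr(perl = TRUE) returns "capture.start" and
# "capture.length" attributes (one row per match, one column per group).
# group_offsets() is a hypothetical helper, not part of base R.
group_offsets <- function(pattern, text) {
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1) return(NULL)  # pattern did not match at all
  list(start  = attr(m, "capture.start"),
       length = attr(m, "capture.length"))
}

g <- group_offsets("a+(b+)", "abcdaabbc")
as.integer(g$start[, 1])   # start offsets of group 1: 2 7
as.integer(g$length[, 1])  # lengths of group 1:       1 2
```

With this, the pattern does not have to be taken apart: the matrix
has one column per group, so nested groups are just further columns.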

  Titus

On Mon, Sep 27, 2010 at 6:43 PM, jim holtman <jholtman at gmail.com> wrote:
> try this:
>
>> x <-  gregexpr("a+(b+)", "abcdaabbcaaacaaab")
>> justA <-  gregexpr("a+", "abcdaabbcaaacaaab")
>> # find matches in 'x' for 'justA'
>> indx <- which(justA[[1]] %in% x[[1]])
>> # now determine where 'b' starts
>> justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx]
> [1]  2  7 17
>>
>
>
> On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg
> <malsburg at gmail.com> wrote:
>> Dear list!
>>
>>> gregexpr("a+(b+)", "abcdaabbc")
>> [[1]]
>> [1] 1 5
>> attr(,"match.length")
>> [1] 2 4
>>
>> What I want is the offsets of the matches for the group (b+), i.e. 2
>> and 7, not the offsets of the complete matches.  Is there a way in R
>> to get that?
>>
>> I know about gsubfn and strapply, but they only give me the strings
>> matched by groups, not their offsets.
>>
>> I could write something myself that first takes the above matches
>> ("ab" and "aabb") and then searches again using only the group (b+).
>> For this to work, I'd have to parse the regular expression and search
>> several times (> 2, for nested groups) instead of just once.  But I'm
>> sure there is a better way to do this.
>>
>> Thanks for any suggestion!
>>
>>   Titus
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>


