[R] how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Dec 20 11:33:09 CET 2009


Use a zero-width lookahead assertion.  It checks what follows without
consuming it, so the next match can start inside the text it looked at.
See ?regexp

> gregexpr("a(?=a)", "aaa", perl = TRUE)
[[1]]
[1] 1 2
attr(,"match.length")
[1] 1 1
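
To turn this into a count, one option is to wrap the whole pattern in the
lookahead and count the reported start positions. The sketch below is only
an illustration: count_overlapping is a hypothetical helper name, not
something from base R, and it assumes the pattern is a valid Perl-style
regular expression (metacharacters are not escaped). It also guards against
the no-match case, where gregexpr() returns -1 and length() would still
report 1.

```r
# Hypothetical helper (not part of base R): count matches of `pattern`
# in a single string `text`, including overlapping ones.
count_overlapping <- function(pattern, text) {
  # Put the whole pattern inside a zero-width lookahead so that no
  # characters are consumed and a later match may overlap an earlier one.
  m <- gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE)[[1]]
  # gregexpr() returns -1 when nothing matches, so count real positions only.
  sum(m > 0)
}

count_overlapping("aa", "aaa")           # 2
count_overlapping("cus", "hocus pocus")  # 2
count_overlapping("zz", "aaa")           # 0
```

With this form the match.length attribute is 0 for every match, since the
lookahead itself matches nothing; only the start positions carry
information.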


On Sun, Dec 20, 2009 at 1:43 AM, Jonathan <jonsleepy at gmail.com> wrote:
> Last one for you guys:
>
> The command:
>
> length(gregexpr('cus','hocus pocus')[[1]])
> [1] 2
>
> returns the number of times the substring 'cus' appears in 'hocus pocus'
> (which is two)
>
> It's returning the number of **disjoint** matches.  So:
>
> length(gregexpr('aa','aaa')[[1]])
>  [1] 1
>
> returns 1.
>
> **What I want to do:**
> I'm looking for a way to count all occurrences of the substring, including
> overlapping sets (so 'aa' would be found in 'aaa' two times, because the
> middle 'a' gets counted twice).
>
> Any ideas would be much appreciated!!
>
> Signing off and thanks for all the great assistance,
> Jonathan



