[R] searching several subsequences in a single string sequence

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Tue Sep 27 19:06:21 CEST 2011


On Tue, Sep 27, 2011 at 5:51 PM, Marcelo Araya <marceloa27 at gmail.com> wrote:
> Hi all
>
> I am analyzing bird song element sequences. I would like to know how I
> can count how many times a given subsequence is found in a single
> string sequence.
>
> For example, if I have this single sequence:
>
> ABCABAABABABCAB
>
> I am looking for the subsequence "ABC". What I need to get here is that
> the subsequence is found twice.
>
> Any idea how I can do this?
>

 gregexpr returns the start position and length of every match, and
you can feed it a vector of strings. So:


 > songs=c("ABCABAABABABCAB","ABACAB","ABABCABCBC")
 > m="ABC"
 > gregexpr(m,songs)
[[1]]
[1]  1 11
attr(,"match.length")
[1] 3 3

[[2]]
[1] -1
attr(,"match.length")
[1] -1

[[3]]
[1] 3 6
attr(,"match.length")
[1] 3 3

 - in the first item, it was found at positions 1 and 11
 - in the second, it wasn't found at all
 - in the third, it was found at positions 3 and 6

 So just do some apply-ing to the returned list and count the positive
positions in each element (a no-match element is a single -1, so its
length is 1, not 0). Job done!
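
A minimal sketch of that counting step, in case it helps. The names
hits and counts are just illustrative, and fixed=TRUE is an assumption
on my part that the subsequence should be matched as a literal string
rather than as a regular expression:

```r
# Count how often a literal subsequence occurs in each song.
songs <- c("ABCABAABABABCAB", "ABACAB", "ABABCABCBC")
m <- "ABC"

# fixed = TRUE treats the pattern as a plain string, not a regex.
hits <- gregexpr(m, songs, fixed = TRUE)

# A song with no match yields the single value -1, so length() would
# report 1 for it. Counting the positive start positions gives 0 instead.
counts <- sapply(hits, function(x) sum(x > 0))
names(counts) <- songs
counts
```

For the three songs above this gives 2, 0 and 2. Note that gregexpr
reports non-overlapping matches only, which makes no difference for
"ABC" but would for a pattern like "ABA" that can overlap itself.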

Barry

PS bonus points for spotting the hidden prog-rock song title.


