[BioC] How to find frequent sequences.

Thu Jul 12 23:17:13 CEST 2012

I have independent event sequences, for example:

Independent event sequence 1: A, B, C, D
Independent event sequence 2: A, C, B
Independent event sequence 3: D, A, B, X, Y, Z
Independent event sequence 4: C, A, A, B
Independent event sequence 5: B, A, D

I want to be able to find the most common sequence patterns, such as

{A, B} => 3
from lines 1, 3 and 4.

Please note that A, C, B (line 2) must not be counted because C comes in between,
and line 5 must not be counted either because the order of A and B is reversed.

In simple words, I am looking for the "most frequent independent event sequence" of any length.
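
To make the counting rule concrete, here is a minimal sketch of what I mean. It is written in Python only to illustrate the logic; it is not an R package and not SPADE. The function name count_contiguous_patterns and the min_len parameter are made up for this example. It assumes that a pattern must appear as an unbroken, ordered run of events, and that each sequence contributes at most one count per pattern.

```python
from collections import Counter


def count_contiguous_patterns(sequences, min_len=2):
    """Count how many sequences contain each contiguous, ordered run of events.

    Hypothetical helper for illustration: a pattern is counted at most once
    per sequence, and gaps or reordering do not match.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # patterns already found in this sequence
        for start in range(len(seq)):
            for end in range(start + min_len, len(seq) + 1):
                seen.add(tuple(seq[start:end]))
        support.update(seen)
    return support


sequences = [
    ["A", "B", "C", "D"],
    ["A", "C", "B"],
    ["D", "A", "B", "X", "Y", "Z"],
    ["C", "A", "A", "B"],
    ["B", "A", "D"],
]

support = count_contiguous_patterns(sequences)
pattern, count = support.most_common(1)[0]
print(pattern, count)  # prints: ('A', 'B') 3
```

This brute-force enumeration is quadratic in the length of each sequence, so it is only meant to pin down the expected result on small inputs.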

I tried SPADE, but it does not work for me because there the event sequences are not independent.

Please let me know which R algorithm or package I can use.

Rgds,
Vineet

 -- output of sessionInfo(): 

none.

--
Sent via the guest posting facility at bioconductor.org.