[BioC] motif searching with variable length gaps

Oliver Bembom oliver.bembom at gmail.com
Thu Nov 1 22:24:52 CET 2007


Hi Heather,

If your goal at this point is to insert motifs like the ones above
into random sequences to see if cosmo can find them, you should be
able to use any character you like for the gaps. Since a gap in a
motif just says that the specific nucleotide is not important, you
could in particular just fill in random A, C, G, T letters. If you
don't want to do that, something like "X" or "N" should also work. It
sounds like you already got it to work with the "N".
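
A minimal sketch of the random-fill idea, using base R only. The helper
names fill_gaps and embed_motif, the background length of 100, and the
seed are illustrative assumptions and are not part of cosmo. The sketch
only builds the test sequences; how they need to be handed to cosmo
(for example via a FASTA file) is not shown here and should be checked
against the cosmo documentation.

```r
# Base R only; nothing in this block calls cosmo itself.
set.seed(1)  # assumption: fixed seed so the random fill is reproducible

nucleotides <- c("A", "C", "G", "T")

# Gapped motifs quoted further down in this thread ("-" marks a gap).
motifs <- c("TACGTGCTGTCTCACACAG",
            "GACGTGACTCGGACCACAT",
            "TACGTGGGT--TTCCACAG",
            "TACGTGAC----CACACAC",
            "TACGTGC-------CACAG",
            "CACGTGC-------CACAC",
            "GGCGTGAGC-----CACCG",
            "GGCGTGGGAGCG--CACAG",
            "TACGTG------CACACAG")

# The total width (outer parts + gap) has to be the same for every motif.
stopifnot(length(unique(nchar(motifs))) == 1)

# Hypothetical helper: replace each gap character with a random nucleotide.
fill_gaps <- function(motif, gap = "-") {
  chars <- strsplit(motif, "")[[1]]
  is_gap <- chars == gap
  if (any(is_gap)) {
    chars[is_gap] <- sample(nucleotides, sum(is_gap), replace = TRUE)
  }
  paste(chars, collapse = "")
}

# Hypothetical helper: place one motif at a random position inside a
# random background sequence of length total_len.
embed_motif <- function(motif, total_len = 100) {
  width <- nchar(motif)
  background <- sample(nucleotides, total_len, replace = TRUE)
  start <- sample.int(total_len - width + 1, 1)
  background[start:(start + width - 1)] <- strsplit(motif, "")[[1]]
  paste(background, collapse = "")
}

filled <- vapply(motifs, fill_gaps, character(1), USE.NAMES = FALSE)
seqs   <- vapply(filled, embed_motif, character(1), USE.NAMES = FALSE)
```

Each element of seqs then contains exactly one planted motif, which
matches the "OOPS" (one occurrence per sequence) model used in the
cosmo call quoted below. To plant "N" instead of random letters, the
sample() call in fill_gaps could be replaced by a constant "N".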

I hope this helps.

Oliver

On Nov 1, 2007 11:38 AM, Houseman, Heather <Heather.Houseman at vai.org> wrote:
> Hello Oliver,
>
> What symbol should I use for gaps?  I tried a "-" for each gap, but cosmo returned an error.  I then tried a "0" for each gap and cosmo didn't return an error, but the motifs it returned didn't contain any 0's, so I assumed it didn't "like" them.  I tried an "N" for each gap and it didn't return an error and the motifs contained N's.  I realize that an "N" is not the same as a gap.  I was just seeing if it would work.
>
> What strategy do you suggest?  Should I use some other function to align the sequences, which would most likely put gaps in them, and then use those sequences with gaps with cosmo to return motifs?  But then, I'm back to the question of what symbol to use for the gaps.
>
> Thanks for your help!
>
> Heather
>
> -----Original Message-----
> From: Oliver Bembom [mailto:oliver.bembom at gmail.com]
> Sent: Thursday, November 01, 2007 2:24 PM
> To: Houseman, Heather
> Cc: Herve Pages; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] motif searching with variable length gaps
>
> Hi Heather,
>
> cosmo could find motifs like these as long as the total length of the
> motif (2 outer parts + gap) is the same in each motif. If that's true,
> your code should work.
>
> Oliver
>
> On Nov 1, 2007 11:10 AM, Houseman, Heather <Heather.Houseman at vai.org> wrote:
> > Herve,
> >
> > My ultimate goal is to find motifs in different sequences that are similar to the ones below.
> >
> > TACGTGCTGTCTCACACAG
> > GACGTGACTCGGACCACAT
> > TACGTGGGT--TTCCACAG
> > TACGTGAC----CACACAC
> > TACGTGC-------CACAG
> > CACGTGC-------CACAC
> > GGCGTGAGC-----CACCG
> > GGCGTGGGAGCG--CACAG
> > TACGTG------CACACAG
> >
> > To start off, I'm inserting the motifs above into random sequences to see if I can get cosmo to return those motifs.  Once I get that procedure to work, I'd like to apply it to "real" sequences and hopefully return motifs that look similar to the ones above.
> >
> > Here's the cosmo code I'm using:
> >
> > res = cosmo(seqs = seqs, minW = 12, maxW = 20, models = "OOPS")
> >
> > Is this more along the lines of multiple sequence alignment and not something that I can use cosmo for?
> >
> > Thanks!
> >
> > Heather
> >
> > -----Original Message-----
> > From: Herve Pages [mailto:hpages at fhcrc.org]
> > Sent: Thursday, November 01, 2007 1:33 PM
> > To: Houseman, Heather
> > Cc: bioconductor at stat.math.ethz.ch
> > Subject: Re: [BioC] motif searching with variable length gaps
> >
> > Hi Heather,
> >
> > Can you please give some examples of your motifs?
> >
> > Also showing us the code that you use with cosmo can be useful.
> >
> > Even if the matchPattern() function in Biostrings doesn't let you control the number
> > of gaps, there might be workarounds; it all depends on what your motifs really look
> > like. And we need use cases anyway so we know where to put our efforts. Thanks!
> >
> > H.
> >
> >
> > Houseman, Heather wrote:
> > > Dear Bioconductor mailing list:
> > >
> > > I've been using cosmo to look for motifs.  I'd like to search for motifs that have a variable length of gaps in the middle. If I specify a range of motif widths with the cosmo function, it uses the width with the lowest BIC value and searches for motifs of only that width.  My dilemma is that the motifs I'm looking for are of variable width.
> > >
> > > Thanks in advance for any help!
> > >
> > > Heather