[BioC] Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?

Vincent Schulz Vincent.Schulz at yale.edu
Fri Apr 27 16:22:56 CEST 2012


Regarding "Questions, suggestions, use cases and data sources are all welcome" for working with 
TF-PWM motifs, here are my 2c:

For de novo ChIP-seq motif finding beyond MotIV and MEME, there is a new paper that describes a 
fast method for ChIP-seq sets and cites recent references:
http://www.ncbi.nlm.nih.gov/pubmed/22228832

I like the HOMER suite (Perl, not R based), which is very fast, easy to use and gives reasonable 
results:
http://biowhat.ucsd.edu/homer/chipseq/index.html

If Bioconductor matrix annotation packages are developed, it would be good IMO to have:
- obviously, all of JASPAR
- some sort of phylogenetic grouping (e.g. vertebrate, plant) in addition to species-based 
grouping, since species-specific info is usually limited and not usually required
- the latest UniPROBE matrices, since JASPAR seems slow to update:
http://the_brain.bwh.harvard.edu/uniprobe/
- maybe the free version of TRANSFAC, even though it is terribly old
- unlikely, perhaps, but it would be great if someone systematically went through all public 
ChIP-seq datasets and extracted the top motifs. Alternatively, one could grab the HOMER motifs 
from the website above, where this has already been done to a limited extent.

Another useful area would be methods for identifying known motifs that are statistically 
significantly overrepresented in a set of DNA sequences, compared to a user-selectable control set 
(sequences from a control set, all promoter sequences, or the same genome randomized in some way). 
This has been implemented many times in user-friendly non-R tools, especially for promoter 
analysis of differentially expressed gene sets; see e.g.:
- the HOMER package above
http://dire.dcode.org/
http://159.149.109.9/pscan/
http://www.dbi.tju.edu/dbi/tools/paint/
clover at
http://biowulf.bu.edu/MotifViz/
http://www.bioinfo.tsinghua.edu.cn/~zhengjsh/OTFBS/
http://www.telis.ucla.edu/index.php?cmd=transfac
http://grenada.lumc.nl/HumaneGenetica/CORE_TF/
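
To make the overrepresentation idea concrete, here is a minimal sketch in Python (chosen only to 
keep the example dependency-free; it is not an existing Bioconductor API, and all function names 
are made up for illustration). It assumes a motif "hit" is an exact match to a consensus string 
on one strand, which is much cruder than a PWM scan, and it uses a one-sided hypergeometric test 
of the target set against a user-supplied control set.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n sequences are drawn without replacement from N,
    of which K contain the motif (one-sided enrichment p-value)."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def contains_motif(seq, consensus):
    # Simplifying assumption: exact consensus match, forward strand only.
    return consensus in seq.upper()


def motif_enrichment(target, control, consensus):
    """Return (hits in target, hits in control, p-value) for one motif."""
    t_hits = sum(contains_motif(s, consensus) for s in target)
    c_hits = sum(contains_motif(s, consensus) for s in control)
    total_seqs = len(target) + len(control)
    total_hits = t_hits + c_hits
    p = hypergeom_tail(t_hits, len(target), total_hits, total_seqs)
    return t_hits, c_hits, p


# Toy data, invented for illustration only.
target = ["ACGTGACC", "TTCACGTGAA", "GGGCACGTGT", "AAAATTTT"]
control = ["AAAATTTT", "CCCCGGGG", "TTTTAAAA", "GCACGTGC"]
print(motif_enrichment(target, control, "CACGTG"))
```

In a real analysis the hit test would be a PWM scan on both strands, and p-values would need 
multiple-testing correction when many motifs are tried against the same sequence sets.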

Finally, it would be good to be able to easily search for a given motif in a DNA sequence and 
attach a score to each match, like the possum program listed at clover/MotifViz above or like the 
TRANSFAC Match software.  This would use some kind of test against control sequences and could 
currently be done using existing Bioconductor tools, but it is a common enough use that a package 
would be good.  The idea is not just to give a score for how close the sequence is to the PWM, but 
also how likely such a match is to happen by chance, since many of the PWMs are very sloppy.
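
As a rough illustration of "score plus chance probability", the sketch below (Python, standard 
library only; the function names and the toy count matrix are assumptions, not any package's API) 
turns a count matrix into a log-odds PWM, scores every window of a sequence, and reports the 
fraction of all possible k-mers that score at least as well. That fraction is an exact p-value 
only under a uniform, independent-base background; a test against real control sequences would 
replace it with an empirical score distribution.

```python
from itertools import product
from math import log2

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: log2(((col[b] + pseudocount) / total) / background) for b in BASES})
    return pwm


def score(pwm, site):
    """Sum of per-position log-odds for a site of the same length as the PWM."""
    return sum(col[base] for col, base in zip(pwm, site))


def exact_pvalue(pwm, observed):
    """Fraction of all 4^k k-mers scoring >= observed (short motifs only)."""
    k = len(pwm)
    hits = sum(1 for kmer in product(BASES, repeat=k) if score(pwm, kmer) >= observed - 1e-9)
    return hits / 4 ** k


def scan(pwm, seq):
    """Return (position, site, score, p-value) for every unambiguous window."""
    k = len(pwm)
    seq = seq.upper()
    results = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        if set(site) <= set(BASES):  # skip windows containing N or other symbols
            s = score(pwm, site)
            results.append((i, site, s, exact_pvalue(pwm, s)))
    return results


# Toy 3-bp motif, invented for illustration: strongly prefers "ACG".
counts = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
]
pwm = log_odds_matrix(counts)
for hit in scan(pwm, "TTACGTT"):
    print(hit)
```

Enumerating all k-mers is only practical for short motifs and is repeated per window here for 
simplicity; longer PWMs would need a dynamic-programming score distribution or sampling. A 
"sloppy" matrix with near-uniform columns yields p-values close to 1 even for its best match, 
which is exactly the situation the score alone would hide.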

Vince

.........
On 4/24/12 11:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
 > Hi Julie,
 >
 > FlyFactorSurvey looks great.   Would that we had such a resource (curated,
 > current, and growing) for all organisms!
 >
 > A few questions, if I may:
 >
 >   1) What role with respect to FlyFactorSurvey do you picture us taking here
 > at BioC?  How can we help?
 >
 >   2) Your website (http://pgfe.umassmed.edu/TFDBS) recommends meme and TOMTOM
 > for motif comparison.  Do you use them yourself?  If so, can you tell us about
 > their strengths and weaknesses?  How do they compare to clover?
 > (http://zlab.bu.edu/clover/)
 >
 > In that same spirit -- trying to find out more about this topic -- here are
 > some more questions:
 >
 >   3) The JASPAR database seems to be mostly unchanged since 2009.
 >      (http://jaspar.genereg.net/html/DOWNLOAD). Does anyone know their update
 > policy?
 >
 >   4) Is TRANSFAC only for license holders?
 >
 >   5) Are there any other organism-specific gems like FlyFactorSurvey to be
 > discovered out on the web?
 >
 > Thanks!
 >
 >  - Paul
.............


