[BioC] Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?

Paul Shannon pshannon at fhcrc.org
Wed Apr 25 05:02:07 CEST 2012


Hi Julie,

FlyFactorSurvey looks great.   Would that we had such a resource (curated, current, and growing) for all organisms!

A few questions, if I may:

  1) What role with respect to FlyFactorSurvey do you picture us taking here at BioC?  How can we help?

  2) Your website (http://pgfe.umassmed.edu/TFDBS) recommends MEME and TOMTOM for motif comparison.  Do you use them yourself?  If so, can you tell us about their strengths and weaknesses?  How do they compare to Clover?  (http://zlab.bu.edu/clover/)

In that same spirit -- trying to find out more about this topic -- here are some more questions:

  3) The JASPAR database seems to be mostly unchanged since 2009.  
     (http://jaspar.genereg.net/html/DOWNLOAD). Does anyone know their update policy? 

  4) Is TRANSFAC only for license holders?  

  5) Are there any other organism-specific gems like FlyFactorSurvey to be discovered out on the web?

Thanks!

 - Paul  

On Apr 24, 2012, at 3:16 PM, Zhu, Lihua (Julie) wrote:

> Paul,
> 
> Thanks so much for the comprehensive summary of existing capability of Bioc
> and other resources for motif discovery and matching!
> 
> Here is my response to your great initiative to collect use cases and open
> data resources.
> 
> Here is an open data source for Drosophila which we developed:
> http://pgfe.umassmed.edu/TFDBS/
> http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq858.full
> 
> As you pointed out, there are several excellent Bioconductor packages
> available for the two common cases of motif problems, i.e., de novo motif
> discovery and motif matching to known motifs. It would be useful to have
> more motif databases available for motif comparison programs such as MotIV.
> In addition, we use Clover to search for known motifs in a given set of
> sequences.
> 
> Many thanks for sharing your insights!
> 
> Best regards,
> 
> Julie
> 
> 
> On 4/24/12 3:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
> 
>> The recent flurry of interest in sequence motifs here on the bioc list
>> suggests to us that maybe we at Bioconductor could strengthen our
>> infrastructure for this kind of work.  If this work interests you -- either as
>> a package creator, or as a package user -- please suggest ideas or use cases.
>> What do you need?  I will collect and collate the responses.   We hope to
>> identify places where Bioc can help out.
>> 
>> For background:  we already have a number of packages (rGADEM, MotIV, cosmo,
>> BCRANK, motifRG) which address, with different strengths, what I believe to be
>> the two aspects of the motif problem:
>> 
>>  1) Detecting enriched motifs in DNA sequence, or in ChIP-seq data  (rGADEM,
>> cosmo, motifRG, BCRANK)
>>  2) Matching these enriched motifs to known motifs, and predicting which
>> binding molecules they belong to (MotIV)
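To make the second aspect concrete, here is a rough base-R sketch of one simple way to compare two motifs: the mean Pearson correlation between corresponding columns of two equal-width frequency matrices. The matrices and the function name column_similarity are hypothetical, the motifs are not aligned or shifted against each other, and this is not claimed to be what MotIV or any other package does internally.

```r
# Two hypothetical frequency matrices of equal width.
# Rows are A, C, G, T; columns are motif positions (filled column by column).
m1 <- matrix(c(0.80, 0.10, 0.05, 0.05,
               0.10, 0.70, 0.10, 0.10,
               0.05, 0.05, 0.80, 0.10),
             nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
m2 <- matrix(c(0.10, 0.10, 0.10, 0.70,
               0.10, 0.70, 0.10, 0.10,
               0.70, 0.10, 0.10, 0.10),
             nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Mean Pearson correlation between corresponding columns.
# Assumes both matrices have the same dimensions and no constant columns
# (a constant column has zero variance, so cor() would return NA).
column_similarity <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  mean(sapply(seq_len(ncol(a)), function(j) cor(a[, j], b[, j])))
}

self_score  <- column_similarity(m1, m1)
cross_score <- column_similarity(m1, m2)
print(c(self = self_score, cross = cross_score))
```

A real comparison tool would also try different offsets and the reverse complement; this sketch leaves both out to stay short.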
>> 
>> In the past, a lot of sequence motif/binding work has addressed the search for
>> transcription factor binding sites and their cognate transcription factors.
>> miRNAs, phosphorylation and methylation all pose related problems.  Is there
>> support which we can practically offer here as well?
>> 
>> In addition to Bioc packages, there are of course many worthwhile websites and
>> external tools:  JASPAR, MEME, STAMP (and TRANSFAC, for those with a license).
>> Nooshin mentioned the Arabidopsis-specific 'AthaMap' (http://www.athamap.de).
>> Are there other open-source data repositories like this for other organisms?
>> For C. elegans, as Julie requested?
>> 
>> Questions, suggestions, use cases and data sources are all welcome.
>> 
>> Thanks!
>> 
>> - Paul
>> 
>> 
>> 
>> 
>> On Apr 24, 2012, at 10:47 AM, Zhu, Lihua (Julie) wrote:
>> 
>>> Eloi,
>>> 
>>> I would like to use MotIV for a C. elegans dataset. What data source would
>>> you recommend for matchMotif? Many thanks for your help!
>>> 
>>> Best regards,
>>> 
>>> Julie
>>> 
>>> 
>>> On 4/24/12 1:28 PM, "Mercier Eloi" <emercier at chibi.ubc.ca> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I am one of the developers of MotIV. I will be happy to help you if you
>>>> have any questions regarding the package.
>>>> 
>>>> First, I want to mention that in the PLoS ONE paper, we used PICS,
>>>> rGADEM and MotIV as a pipeline, but MotIV can be used standalone.
>>>> Some of the advanced functions won't be available, though.
>>>> 
>>>> Since the PWMs in MotIV correspond to human TFs, you may have to use your
>>>> own list of PWMs. What MotIV needs is a simple list of matrices
>>>> (head(jaspar) to view the format).
>>>> JASPAR's PWMs can be easily downloaded, but it seems it only contains ~20
>>>> motifs. On the other hand, AthaMap has more motifs, but I did not manage
>>>> to find an easy way to get them. Another place to look is the AGRIS
>>>> website (http://arabidopsis.med.ohio-state.edu/downloads.html).
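The "simple list of matrices" mentioned above can be sketched in base R as follows. This is only an illustration under stated assumptions: the motif names and counts are made up, and the 4-row A/C/G/T layout with columns summing to 1 is assumed rather than taken from the MotIV documentation, so compare the result against head(jaspar) before relying on it.

```r
# Hypothetical count matrices: rows are A, C, G, T; columns are motif positions.
# matrix() fills column by column, so each line below is one position.
acgt <- c("A", "C", "G", "T")
counts <- list(
  exampleMotif1 = matrix(c(8, 0, 1, 1,
                           0, 9, 0, 1,
                           1, 0, 9, 0),
                         nrow = 4, dimnames = list(acgt, NULL)),
  exampleMotif2 = matrix(c(2, 2, 5, 1,
                           0, 0, 0, 10,
                           7, 1, 1, 1,
                           3, 3, 2, 2),
                         nrow = 4, dimnames = list(acgt, NULL))
)

# Convert counts to per-position frequencies so that each column sums to 1.
to_frequencies <- function(m) sweep(m, 2, colSums(m), "/")

# A named list of matrices, one entry per motif (assumed target layout).
pwm_list <- lapply(counts, to_frequencies)
str(pwm_list)
```

If the layout matches, a list like pwm_list could presumably stand in for the bundled jaspar data when matching motifs from another organism.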
>>>> 
>>>> If you're only interested in identifying the motifs and do not
>>>> want to do further analysis with R, I recommend looking at
>>>> http://www.benoslab.pitt.edu/stamp.
>>>> 
>>>> Regards,
>>>> 
>>>> Eloi Mercier
>>>> 
>>>> 
>>>> On 12-04-24 07:36 AM, nooshin wrote:
>>>>> Thanks a lot for your suggestion. I will certainly have a look and let
>>>>> you know.
>>>>> Bests,
>>>>> Nooshin
>>>>> 
>>>>> 
>>>>> On 04/24/2012 04:15 PM, Tim Triche, Jr. wrote:
>>>>>> Ah, I see.  GSL is a useful library to have installed regardless.
>>>>>> Hope things work out.  I found your exchanges with Paul to be useful
>>>>>> reading, but obviously I was not reading closely enough, since Paul
>>>>>> started off his code sample with biocLite('MotIV').  Oops :-o
>>>>>> 
>>>>>> Here is a paper that I found interesting, which does go into some
>>>>>> detail towards a "bulk" approach, from Gottardo's group:
>>>>>> 
>>>>>> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016432
>>>> 
>>>>>> Perhaps it will be useful to you as well, would be curious to hear if so.
>>>>>> 
>>>>>> --t
>>>>>> 
>>>>>> On Tue, Apr 24, 2012 at 7:00 AM, nooshin<n_omranian at yahoo.com
>>>>>> <mailto:n_omranian at yahoo.com>>  wrote:
>>>>>> 
>>>>>> 
>>>>>>    Thanks, it's already solved. It needs the GSL library, which is a
>>>>>>    bit problematic to install, but I got it working.
>>>>>> 
>>>>>>    But it includes only 5 matrices for Arabidopsis, both on the
>>>>>>    webpage and in the package!
>>>>>>    I'm downloading them manually from AthaMap!
>>>>>> 
>>>>>>    Thanks again; I'll keep waiting for the 'bulk' approach.
>>>>>> 
>>>>>>    Bests,
>>>>>>    Nooshin
>>>>>> 
>>>>>> 
>>>>>>    On 04/24/2012 03:16 PM, Tim Triche, Jr. wrote:
>>>>>>>    source("http://bioconductor.org/biocLite.R")
>>>>>>>    biocLite("MotIV")
>>>>>>> 
>>>>>>>    ought to do the trick for you
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>    On Tue, Apr 24, 2012 at 1:01 AM, nooshin<n_omranian at yahoo.com
>>>>>>>    <mailto:n_omranian at yahoo.com>>  wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>        Hi Paul,
>>>>>>> 
>>>>>>>        Thanks a lot.
>>>>>>>        I forgot to include bioc, since I only replied to you (not to
>>>>>>>        all).
>>>>>>> 
>>>>>>>        I can't install the MotIV package to check. I searched on Google
>>>>>>>        but couldn't find any solution! Do you have any suggestions for
>>>>>>>        installing this package?
>>>>>>> 
>>>>>>>        Bests,
>>>>>>>        Nooshin
>>>>>>> 
>>>>>>>        On 04/23/2012 06:35 PM, Paul Shannon wrote:
>>>>>>>> (redirecting this back to the Bioc list...)
>>>>>>>> 
>>>>>>>> Hi Nooshin,
>>>>>>>> 
>>>>>>>> The 'bulk' approach is not quite so ready as I predicted.
>>>>>>>> I might have something by the end of the week.
>>>>>>>> 
>>>>>>>> As for mapping between PWMs and TFs, I have most often done
>>>>>>>> this with TOMTOM from the MEME website.
>>>>>>>> 
>>>>>>>> But I just discovered what looks like a good -- maybe
>>>>>>>> better -- approach:  the Bioconductor MotIV package, which
>>>>>>>> includes a 2010 version of JASPAR.
>>>>>>>> Try this:
>>>>>>>> 
>>>>>>>>    source("http://bioconductor.org/biocLite.R")
>>>>>>>>    biocLite("MotIV")
>>>>>>>>    library(MotIV)
>>>>>>>>    browseVignettes("MotIV")
>>>>>>>> 
>>>>>>>> The jaspar data in this package has 130 TF-PWM mappings,
>>>>>>>> which appear to be human.  More must be known, and publicly
>>>>>>>> available.  The JASPAR website has a 'JASPAR CORE Plantae'
>>>>>>>> data set that
>>>>>>>>    - is probably what you are interested in
>>>>>>>>    - might be downloadable, and convertible to the form
>>>>>>>>      MotIV wants.
>>>>>>>> 
>>>>>>>> Perhaps other readers of the list have other suggestions.
>>>>>>>> 
>>>>>>>> If you have any questions on this, please include 'BioC' in
>>>>>>>> your reply, so that we can all get better at this!
>>>>>>>> 
>>>>>>>>  - Paul
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Apr 23, 2012, at 6:53 AM, nooshin wrote:
>>>>>>>> 
>>>>>>>>> Hi Paul,
>>>>>>>>> 
>>>>>>>>> Many thanks for your comprehensive information and code!
>>>>>>>>> I have a question about extracting PWMs. How and
>>>>>>>>> where can I download these matrices for all TFs for which a PWM
>>>>>>>>> is available? I need them only for Arabidopsis thaliana.
>>>>>>>>> Is there any R package to which I can give a TF and
>>>>>>>>> receive its PWM? Or any online database from which I can
>>>>>>>>> download them? I have been struggling since Friday to find
>>>>>>>>> these matrices for different A. thaliana TFs. It would be
>>>>>>>>> great if you could help me get these matrices.
>>>>>>>>> 
>>>>>>>>>> If you want to do this in bulk, Hervé has some lovely
>>>>>>>>>> code to make that efficient.
>>>>>>>>> Also can I have this? :)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks a lot in advance.
>>>>>>>>> Best regards,
>>>>>>>>> Nooshin
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>        _______________________________________________
>>>>>>>        Bioconductor mailing list
>>>>>>>        Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
>>>>>>>        https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>        Search the archives:
>>>>>>>        http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>    --
>>>>>>>    /A model is a lie that helps you see the truth./
>>>>>>>    /
>>>>>>>    /
>>>>>>>    Howard Skipper
>>>>>>>    <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 


