[BioC] Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Wed Apr 25 00:16:22 CEST 2012


Paul,

Thanks so much for the comprehensive summary of existing capability of Bioc
and other resources for motif discovery and matching!

Here is my response to your great initiative to collect use cases and open
data resources.

Here is an open data source for Drosophila which we developed:
http://pgfe.umassmed.edu/TFDBS/
http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq858.full

As you pointed out, there are several excellent Bioconductor packages
available for the two common cases of motif problems, i.e., de novo motif
discovery and motif matching to known motifs. It would be useful to have
more motif databases available for motif comparison programs such as MotIV.
In addition, we use clover to search for known motifs in a given set of
sequences.
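
To make the second case concrete, here is a minimal base-R sketch of matching
one known motif against one sequence. It is only an illustration: the PWM
values and the sequence are invented, the uniform 0.25 background is an
assumption, and a plain log-odds window scan like this is not the algorithm
that clover or MotIV actually uses.

```r
# Hypothetical PWM: rows A, C, G, T; one column per motif position.
# The probabilities are invented for illustration only.
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1,
                0.1, 0.1, 0.1, 0.7,
                0.7, 0.1, 0.1, 0.1),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Log-odds against an assumed uniform background (0.25 per base).
log_odds <- log2(pwm / 0.25)

# Score every window of motif width along one sequence.
# Assumes the sequence contains only A, C, G and T.
scan_sequence <- function(seq, log_odds) {
  bases <- strsplit(toupper(seq), "")[[1]]
  width <- ncol(log_odds)
  n_windows <- length(bases) - width + 1
  if (n_windows < 1) return(numeric(0))
  vapply(seq_len(n_windows), function(start) {
    window <- bases[start:(start + width - 1)]
    # Pick one log-odds value per position: (row = base, column = position).
    sum(log_odds[cbind(match(window, rownames(log_odds)), seq_len(width))])
  }, numeric(1))
}

scores <- scan_sequence("CCAGTACC", log_odds)
best_start <- which.max(scores)  # start position of the best-scoring window
```

With more databases available, the same scan could be repeated over every
matrix in a collection rather than over a single hand-made PWM.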

Many thanks for sharing your insights!

Best regards,

Julie


On 4/24/12 3:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:

> The recent flurry of interest in sequence motifs here on the bioc list
> suggests to us that maybe we at Bioconductor could strengthen our
> infrastructure for this kind of work.  If this work interests you -- either as
> a package creator, or as a package user -- please suggest ideas or use cases.
> What do you need?  I will collect and collate the responses.   We hope to
> identify places where Bioc can help out.
> 
> For background:  we already have a number of packages (rGADEM, MotIV, cosmo,
> BCRANK, motifRG) which address, with different strengths, what I believe to be
> the two aspects of the motif problem:
> 
>   1) Detecting enriched motifs in DNA sequence, or in ChIP-seq data  (rGADEM,
> cosmo, motifRG, BCRANK)
>   2) Predicting the sequence motifs which bind to these enriched motifs, and
> what binding molecules they belong to (MotIV)
> 
> In the past, a lot of sequence motif/binding work has addressed the search for
> transcription factor binding sites and their cognate transcription factors.
> miRNAs, phosphorylation and methylation all pose related problems.  Is there
> support which we can practically offer here as well?
> 
> In addition to Bioc packages, there are of course many worthwhile websites and
> external tools:  JASPAR, meme, STAMP (and TRANSFAC, for those with a license).
> Nooshin mentioned the arabidopsis-specific 'AthaMap' (http://www.athamap.de).
> Are there other open-source data repositories like this for other organisms?
> C. elegans, as Julie requested?
> 
> Questions, suggestions, use cases and data sources are all welcome.
> 
> Thanks!
> 
>  - Paul
> 
> 
> 
> 
> On Apr 24, 2012, at 10:47 AM, Zhu, Lihua (Julie) wrote:
> 
>> Eloi,
>> 
>> I would like to use MotIV for a C. elegans dataset. What data source would
>> you recommend for motifMatch? Many thanks for your help!
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 4/24/12 1:28 PM, "Mercier Eloi" <emercier at chibi.ubc.ca> wrote:
>> 
>>> Hello,
>>> 
>>> I am one of the developers of MotIV. I will be happy to help you if you
>>> have any questions regarding the package.
>>> 
>>> First, I want to mention that in the PLoS ONE paper we used PICS,
>>> rGADEM and MotIV as a pipeline, but MotIV can be used standalone.
>>> Some of the advanced functions won't be available, though.
>>> 
>>> Since the PWMs in MotIV correspond to human TFs, you may have to use your
>>> own list of PWMs. What MotIV needs is a simple list of matrices
>>> (head(jaspar) to view the format).
>>> JASPAR's PWMs can be easily downloaded, but it seems to contain only ~20
>>> motifs. On the other hand, AthaMap has more motifs, but I did not manage
>>> to find an easy way to get them. Another place to look is the AGRIS
>>> website (http://arabidopsis.med.ohio-state.edu/downloads.html).
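
A minimal base-R sketch of the "simple list of matrices" mentioned above. It
assumes one numeric matrix per motif, with rows A, C, G, T and one column per
position; that layout is an assumption and should be compared against
head(jaspar) before relying on it. The motif names, the counts and the
counts_to_pwm helper are all invented for illustration.

```r
# Hypothetical raw counts for two made-up motifs (rows A, C, G, T).
# matrix() fills column by column, so each line below is one motif position.
bases <- c("A", "C", "G", "T")
counts <- list(
  exampleTF1 = matrix(c(8, 1, 0, 1,
                        0, 9, 1, 0,
                        1, 0, 9, 0),
                      nrow = 4, dimnames = list(bases, NULL)),
  exampleTF2 = matrix(c(0, 0, 0, 10,
                        5, 5, 0, 0),
                      nrow = 4, dimnames = list(bases, NULL))
)

# Convert counts to per-position frequencies; the pseudocount avoids zeros.
counts_to_pwm <- function(m, pseudocount = 0.5) {
  m <- m + pseudocount
  sweep(m, 2, colSums(m), "/")
}

# Named list of matrices, one entry per motif.
my_pwms <- lapply(counts, counts_to_pwm)

# Sanity checks: every matrix has four rows and columns that sum to 1.
stopifnot(all(vapply(my_pwms, nrow, integer(1)) == 4L))
stopifnot(all(vapply(my_pwms, function(m) all(abs(colSums(m) - 1) < 1e-9),
                     logical(1))))
```

A list like my_pwms could then presumably be supplied to MotIV in place of
the bundled jaspar data; the exact argument names are best checked in the
package vignette.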
>>> 
>>> If you're only interested in identifying the motifs and do not want to
>>> do further analysis in R, I recommend looking at
>>> http://www.benoslab.pitt.edu/stamp.
>>> 
>>> Regards,
>>> 
>>> Eloi Mercier
>>> 
>>> 
>>> On 12-04-24 07:36 AM, nooshin wrote:
>>>> Thanks a lot for your suggestion. I will for sure have a look and inform
>>>> you.
>>>> Bests,
>>>> Nooshin
>>>> 
>>>> 
>>>> On 04/24/2012 04:15 PM, Tim Triche, Jr. wrote:
>>>>> Ah, I see.  GSL is a useful library to have installed regardless.
>>>>>  Hope things work out.  I found your exchanges with Paul to be useful
>>>>> reading, but obviously I was not reading closely enough, since Paul
>>>>> started off his code sample with biocLite('MotIV').  Oops :-o
>>>>> 
>>>>> Here is a paper that I found interesting, which does go into some
>>>>> detail towards a "bulk" approach, from Gottardo's group:
>>>>> 
>>>>> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016432
>>>>> 
>>>>> Perhaps it will be useful to you as well, would be curious to hear if so.
>>>>> 
>>>>> --t
>>>>> 
>>>>> On Tue, Apr 24, 2012 at 7:00 AM, nooshin <n_omranian at yahoo.com> wrote:
>>>>> 
>>>>> 
>>>>>     Thanks, it's already solved. It needs the GSL library, which is a
>>>>>     bit problematic, but I got it working.
>>>>> 
>>>>>     But it includes only 5 matrices for Arabidopsis, both on the
>>>>>     webpage and in the package!
>>>>>     I'm downloading them manually from AthaMap!
>>>>> 
>>>>>     Thanks again; I'll keep waiting for the 'bulk' approach.
>>>>> 
>>>>>     Bests,
>>>>>     Nooshin
>>>>> 
>>>>> 
>>>>>     On 04/24/2012 03:16 PM, Tim Triche, Jr. wrote:
>>>>>>     source("http://bioconductor.org/biocLite.R")
>>>>>>     biocLite("MotIV")
>>>>>> 
>>>>>>     ought to do the trick for you
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>     On Tue, Apr 24, 2012 at 1:01 AM, nooshin <n_omranian at yahoo.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>         Hi Paul,
>>>>>> 
>>>>>>         Thanks a lot.
>>>>>>         I forgot to include bioc, since I only replied to you (not to
>>>>>>         all).
>>>>>> 
>>>>>>         I can't install the MotIV package to check. I searched Google but
>>>>>>         couldn't find any solution! Do you have any suggestions for
>>>>>>         installing this package?
>>>>>> 
>>>>>>         Bests,
>>>>>>         Nooshin
>>>>>> 
>>>>>>         On 04/23/2012 06:35 PM, Paul Shannon wrote:
>>>>>>> (redirecting this back to the Bioc list...)
>>>>>>> 
>>>>>>> Hi Nooshin,
>>>>>>> 
>>>>>>> The 'bulk' approach is not quite so ready as I predicted.  I might
>>>>>>> have something by the end of the week.
>>>>>>> 
>>>>>>> As for mapping between PWMs and TFs, I have most often done this with
>>>>>>> 'tom-tom' from the meme website.
>>>>>>> 
>>>>>>> But I just discovered what looks like a good -- maybe better --
>>>>>>> approach:  the Bioconductor MotIV package, which includes a 2010
>>>>>>> version of JASPAR.
>>>>>>> Try this:
>>>>>>> 
>>>>>>>     source("http://bioconductor.org/biocLite.R")
>>>>>>>     biocLite("MotIV")
>>>>>>>     library(MotIV)
>>>>>>>     browseVignettes("MotIV")
>>>>>>> 
>>>>>>> The jaspar data in this package has 130 TF-PWM mappings, which appear
>>>>>>> to be human.  More must be known, and publicly available.  The JASPAR
>>>>>>> website has a 'JASPAR CORE Plantae' data set that
>>>>>>>     - is probably what you are interested in
>>>>>>>     - might be downloadable, and convertible to the form MotIV wants.
>>>>>>> 
>>>>>>> Perhaps other readers of the list have other suggestions.
>>>>>>> 
>>>>>>> If you have any questions on this, please include 'BioC' in your
>>>>>>> reply, so that we can all get better at this!
>>>>>>> 
>>>>>>>   - Paul
>>>>>>> 
>>>>>>> 
>>>>>>> On Apr 23, 2012, at 6:53 AM, nooshin wrote:
>>>>>>> 
>>>>>>>> Hi Paul,
>>>>>>>> 
>>>>>>>> Many thanks for your comprehensive information and code!
>>>>>>>> I have a question about extracting PWMs. How and where can I
>>>>>>>> download the matrices for all TFs that have a PWM available? I
>>>>>>>> need them only for Arabidopsis thaliana.
>>>>>>>> Is there any R package to which I can give a TF and receive its
>>>>>>>> PWM? Or any online database I can download them from? I have been
>>>>>>>> struggling since Friday to find these matrices for the different
>>>>>>>> A. thaliana TFs. It would be great if you could help me get them.
>>>>>>>> 
>>>>>>>>> If you want to do this in bulk, Herve' has some lovely code to
>>>>>>>>> make that efficient.
>>>>>>>> Also can I have this? :)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks a lot in advance.
>>>>>>>> Best regards,
>>>>>>>> Nooshin
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>>         _______________________________________________
>>>>>>         Bioconductor mailing list
>>>>>>         Bioconductor at r-project.org
>>>>>>         https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>         Search the archives:
>>>>>>         http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>     --
>>>>>>     /A model is a lie that helps you see the truth./
>>>>>>     /
>>>>>>     /
>>>>>>     Howard Skipper
>>>>>>     <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
> 


