[BioC] Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Wed Apr 25 15:44:48 CEST 2012


Paul,

Thanks for the positive feedback on FlyFactorSurvey! The motifs in this
database are generated using the bacterial one-hybrid method (B1H and
B1H-seq). All the public motifs can be downloaded freely. It would be useful
to have a Bioc data package, containing curated and current motifs from all
organisms if available, that interfaces with MotIV.
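
For illustration, the "simple list of matrices" that Eloi describes below
could look roughly like the following minimal R sketch. The motif names and
counts are made up, and the 4-row (A, C, G, T) layout is an assumption;
head(jaspar) in MotIV shows the actual format to match.

```r
# Hypothetical count matrices for two made-up motifs.
# Assumed layout: rows are A, C, G, T; columns are motif positions.
# matrix() fills column by column, so each group of four is one position.
bases <- c("A", "C", "G", "T")
counts <- list(
  exampleTF1 = matrix(c(8, 0, 1, 1,
                        0, 9, 0, 1,
                        1, 0, 9, 0,
                        0, 10, 0, 0),
                      nrow = 4, dimnames = list(bases, NULL)),
  exampleTF2 = matrix(c(5, 5, 0, 0,
                        0, 0, 10, 0,
                        2, 2, 3, 3),
                      nrow = 4, dimnames = list(bases, NULL))
)

# Convert counts to per-position frequencies (each column sums to 1).
to_frequency <- function(m) sweep(m, 2, colSums(m), "/")
pwms <- lapply(counts, to_frequency)

# Inspect the structure: a named list with one matrix per factor.
str(pwms)
```

A data package could ship one such named list per organism, so the same
object can be handed to whichever comparison tool expects a list of matrices.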

MEME works very well at finding motifs in B1H-seq data (Christensen et
al., Nucleic Acids Research 2011, Vol. 39, No. 12, e83), although only a few
motif discovery tools were compared in the paper. Currently, we are
investigating whether motif discovery can be improved with B1H-seq data.

As I understand it, MEME is for de novo motif discovery; TOMTOM and STAMP
test whether the motif returned by a motif finder is significantly
similar to a known motif; and clover searches for known motifs in a given
set of sequences. We are thinking of adding clover to our website.
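
To make the last case concrete, here is a minimal base-R sketch of scanning a
sequence for a known motif with a sliding-window log-odds score. This is not
clover's actual statistic (clover assesses over-representation of motifs
across a set of sequences); the matrix values, the uniform background and the
function name scan_sequence are all hypothetical.

```r
# Hypothetical frequency matrix for a 3-bp motif whose consensus is ACG.
# Assumed layout: rows are A, C, G, T; columns are motif positions.
pfm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.7, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Log-odds against a uniform background of 0.25 per base (an assumption).
log_odds <- log2(pfm / 0.25)

# Score every window of the motif's width along one sequence.
# Windows containing a letter other than A/C/G/T score NA.
scan_sequence <- function(seq, lo) {
  letters_in_seq <- strsplit(toupper(seq), "")[[1]]
  width <- ncol(lo)
  n_windows <- length(letters_in_seq) - width + 1
  if (n_windows < 1) return(numeric(0))
  vapply(seq_len(n_windows), function(start) {
    window <- letters_in_seq[start:(start + width - 1)]
    # Pick one matrix cell per position: (row of the base, column of position).
    idx <- cbind(match(window, rownames(lo)), seq_len(width))
    sum(lo[idx])
  }, numeric(1))
}

scores <- scan_sequence("TTACGTT", log_odds)
which.max(scores)  # 3: the window starting at position 3 is "ACG"
```

In practice one would apply a score threshold and scan both strands; this
sketch only shows the per-window scoring step.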

I am looking forward to your collated survey results.

Best regards,

Julie


On 4/24/12 11:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:

> Hi Julie,
> 
> FlyFactorSurvey looks great.   Would that we had such a resource (curated,
> current, and growing) for all organisms!
> 
> A few questions, if I may:
> 
>   1) What role with respect to FlyFactorSurvey do you picture us taking here
> at BioC?  How can we help?
> 
>   2) Your website (http://pgfe.umassmed.edu/TFDBS) recommends meme and TOMTOM
> for motif comparison.  Do you use them yourself?  If so, can you tell us about
> their strengths and weaknesses?  How do they compare to clover?
> (http://zlab.bu.edu/clover/)
> 
> In that same spirit -- trying to find out more about this topic -- here are
> some more questions:
> 
>   3) The JASPAR database seems to be mostly unchanged since 2009.
>      (http://jaspar.genereg.net/html/DOWNLOAD). Does anyone know their update
> policy? 
> 
>   4) Is TRANSFAC only for license holders?
> 
>   5) Are there any other organism-specific gems like FlyFactorSurvey to be
> discovered out on the web?
> 
> Thanks!
> 
>  - Paul  
> 
> On Apr 24, 2012, at 3:16 PM, Zhu, Lihua (Julie) wrote:
> 
>> Paul,
>> 
>> Thanks so much for the comprehensive summary of existing capability of Bioc
>> and other resources for motif discovery and matching!
>> 
>> Here is my response to your great initiative to collect use cases and open
>> data resources.
>> 
>> Here is an open data source for Drosophila which we developed:
>> http://pgfe.umassmed.edu/TFDBS/
>> http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq858.full
>> 
>> As you pointed out, there are several excellent Bioconductor packages
>> available for the two common cases of motif problems, i.e., de novo motif
>> discovery and motif matching to known motifs. It would be useful to have
>> more motif databases available for motif comparison programs such as MotIV.
>> In addition, we use clover to search for known motifs in a given set of
>> sequences.
>> 
>> Many thanks for sharing your insights!
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 4/24/12 3:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>> 
>>> The recent flurry of interest in sequence motifs here on the bioc list
>>> suggests to us that maybe we at Bioconductor could strengthen our
>>> infrastructure for this kind of work.  If this work interests you -- either
>>> as a package creator, or as a package user -- please suggest ideas or use
>>> cases.  What do you need?  I will collect and collate the responses.   We
>>> hope to identify places where Bioc can help out.
>>> 
>>> For background:  we already have a number of packages (rGADEM, MotIV, cosmo,
>>> BCRANK, motifRG) which address, with different strengths, what I believe to
>>> be the two aspects of the motif problem:
>>> 
>>>  1) Detecting enriched motifs in DNA sequence, or in ChIP-seq data  (rGADEM,
>>>     cosmo, motifRG, BCRANK)
>>>  2) Predicting the sequence motifs which bind to these enriched motifs, and
>>>     what binding molecules they belong to (MotIV)
>>> 
>>> In the past, a lot of sequence motif/binding work has addressed the search
>>> for transcription factor binding sites and their cognate transcription
>>> factors.  miRNAs, phosphorylation and methylation all pose related problems.
>>> Is there support which we can practically offer here as well?
>>> 
>>> In addition to Bioc packages, there are of course many worthwhile websites
>>> and external tools:  JASPAR, meme, STAMP (and TRANSFAC, for those with a
>>> license).  Nooshin mentioned the arabidopsis-specific 'AthaMap'
>>> (http://www.athamap.de).  Are there other open-source data repositories like
>>> this for other organisms?  c.elegans, as Julie requested?
>>> 
>>> Questions, suggestions, use cases and data sources are all welcome.
>>> 
>>> Thanks!
>>> 
>>> - Paul
>>> 
>>> 
>>> 
>>> 
>>> On Apr 24, 2012, at 10:47 AM, Zhu, Lihua (Julie) wrote:
>>> 
>>>> Eloi,
>>>> 
>>>> I would like to use MotIV for a c.elegans dataset. What data source would
>>>> you recommend for matchMotif? Many thanks for your help!
>>>> 
>>>> Best regards,
>>>> 
>>>> Julie
>>>> 
>>>> 
>>>> On 4/24/12 1:28 PM, "Mercier Eloi" <emercier at chibi.ubc.ca> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am one of the developers of MotIV. I will be happy to help you if you
>>>>> have any questions regarding the package.
>>>>> 
>>>>> First, I want to mention that in the PLoS ONE paper, we used PICS,
>>>>> rGADEM and MotIV as a pipeline, but MotIV can be used standalone.
>>>>> Some of the advanced functions won't be available, though.
>>>>> 
>>>>> Since the PWMs in MotIV correspond to human TFs, you may have to use your
>>>>> own list of PWMs. What MotIV needs is a simple list of matrices
>>>>> (head(jaspar) to view the format).
>>>>> JASPAR's PWMs can be easily downloaded, but it seems to contain only ~20
>>>>> motifs. On the other hand, AthaMap has more motifs, but I did not manage
>>>>> to find an easy way to get them. Another place to look is the AGRIS
>>>>> website (http://arabidopsis.med.ohio-state.edu/downloads.html).
>>>>> 
>>>>> If you're only interested in identifying the motifs and do not
>>>>> want to do further analysis with R, I recommend looking at
>>>>> http://www.benoslab.pitt.edu/stamp.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Eloi Mercier
>>>>> 
>>>>> 
>>>>> On 12-04-24 07:36 AM, nooshin wrote:
>>>>>> Thanks a lot for your suggestion. I will for sure have a look and inform
>>>>>> you.
>>>>>> Bests,
>>>>>> Nooshin
>>>>>> 
>>>>>> 
>>>>>> On 04/24/2012 04:15 PM, Tim Triche, Jr. wrote:
>>>>>>> Ah, I see.  GSL is a useful library to have installed regardless.
>>>>>>> Hope things work out.  I found your exchanges with Paul to be useful
>>>>>>> reading, but obviously I was not reading closely enough, since Paul
>>>>>>> started off his code sample with biocLite('MotIV').  Oops :-o
>>>>>>> 
>>>>>>> Here is a paper that I found interesting, which does go into some
>>>>>>> detail towards a "bulk" approach, from Gottardo's group:
>>>>>>> 
>>>>>>> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016432
>>>>> 
>>>>>>> Perhaps it will be useful to you as well, would be curious to hear if
>>>>>>> so.
>>>>>>> 
>>>>>>> --t
>>>>>>> 
>>>>>>> On Tue, Apr 24, 2012 at 7:00 AM, nooshin<n_omranian at yahoo.com
>>>>>>> <mailto:n_omranian at yahoo.com>>  wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>    Thanks, it has already been solved. It needs the GSL package, which
>>>>>>>    is a bit problematic, but I got it working.
>>>>>>> 
>>>>>>>    But it includes only 5 matrices for arabidopsis, both on the
>>>>>>>    webpage and in the package!
>>>>>>>    I'm downloading manually from AthaMap!
>>>>>>> 
>>>>>>>    Thanks again; I will keep waiting for the 'bulk' approach.
>>>>>>> 
>>>>>>>    Bests,
>>>>>>>    Nooshin
>>>>>>> 
>>>>>>> 
>>>>>>>    On 04/24/2012 03:16 PM, Tim Triche, Jr. wrote:
>>>>>>>>    source("http://bioconductor.org/biocLite.R")
>>>>>>>>    biocLite("MotIV")
>>>>>>>> 
>>>>>>>>    ought to do the trick for you
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>    On Tue, Apr 24, 2012 at 1:01 AM, nooshin<n_omranian at yahoo.com
>>>>>>>>    <mailto:n_omranian at yahoo.com>>  wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        Hi Paul,
>>>>>>>> 
>>>>>>>>        Thanks a lot.
>>>>>>>>        I forgot to include bioc, since I only replied to you (not to
>>>>>>>>        all).
>>>>>>>> 
>>>>>>>>        I can't install the MotIV package to check. I searched on Google
>>>>>>>>        but couldn't find any solution! Do you have any suggestions for
>>>>>>>>        installing this package?
>>>>>>>> 
>>>>>>>>        Bests,
>>>>>>>>        Nooshin
>>>>>>>> 
>>>>>>>>        On 04/23/2012 06:35 PM, Paul Shannon wrote:
>>>>>>>>> (redirecting this back to the Bioc list...)
>>>>>>>>> 
>>>>>>>>> Hi Nooshin,
>>>>>>>>> 
>>>>>>>>> The 'bulk' approach is not quite so ready as I predicted.
>>>>>>>>> I might have something by the end of the week.
>>>>>>>>> 
>>>>>>>>> As for mapping between PWMs and TFs, I have most often done
>>>>>>>>> this with TOMTOM from the MEME website.
>>>>>>>>> 
>>>>>>>>> But I just discovered what looks like a good -- maybe
>>>>>>>>> better -- approach:  the Bioconductor MotIV package, which
>>>>>>>>> includes a 2010 version of JASPAR.
>>>>>>>>> Try this:
>>>>>>>>> 
>>>>>>>>>    source("http://bioconductor.org/biocLite.R")
>>>>>>>>>    biocLite("MotIV")
>>>>>>>>>    library(MotIV)
>>>>>>>>>    browseVignettes("MotIV")
>>>>>>>>> 
>>>>>>>>> The jaspar data in this package has 130 TF-PWM mappings,
>>>>>>>>> which appear to be human.  More must be known, and publicly
>>>>>>>>> available.  The JASPAR website has a 'JASPAR CORE Plantae'
>>>>>>>>> data set that
>>>>>>>>>    - is probably what you are interested in
>>>>>>>>>    - might be downloadable, and convertible to the form
>>>>>>>>>      MotIV wants.
>>>>>>>>> 
>>>>>>>>> Perhaps other readers of the list have other suggestions.
>>>>>>>>> 
>>>>>>>>> If you have any questions on this, please include 'BioC' in
>>>>>>>>> your reply, so that we can all get better at this!
>>>>>>>>> 
>>>>>>>>>  - Paul
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Apr 23, 2012, at 6:53 AM, nooshin wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Paul,
>>>>>>>>>> 
>>>>>>>>>> Many thanks for your comprehensive information and code!
>>>>>>>>>> I have a question regarding extracting PWMs. How and where can I
>>>>>>>>>> download these matrices for all TFs for which a PWM is available?
>>>>>>>>>> I need them only for Arabidopsis thaliana.
>>>>>>>>>> Is there any R package to which I can give a TF and receive its
>>>>>>>>>> PWM? Or any online database from which I can download them? Since
>>>>>>>>>> Friday I have been struggling to find these matrices for different
>>>>>>>>>> TFs of A.th. It would be great if you could help me get these
>>>>>>>>>> matrices.
>>>>>>>>>> 
>>>>>>>>>>> If you want to do this in bulk, Herve' has some lovely
>>>>>>>>>>> code to make that efficient.
>>>>>>>>>> Also can I have this? :)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks a lot in advance.
>>>>>>>>>> Best regards,
>>>>>>>>>> Nooshin
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        _______________________________________________
>>>>>>>>        Bioconductor mailing list
>>>>>>>>        Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
>>>>>>>>        https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>        Search the archives:
>>>>>>>>        
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>    --
>>>>>>>>    /A model is a lie that helps you see the truth./
>>>>>>>>    /
>>>>>>>>    /
>>>>>>>>    Howard Skipper
>>>>>>>>    <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
> 


