[BioC] GO question

Mon Nov 19 16:51:23 CET 2012

I'll chime in with my two cents... I am unfortunately not familiar
with GOstats, but another option for calculating GO enrichments is
function GOenrichmentAnalysis in the WGCNA package (which I maintain
and which lives on CRAN). The advantage of GOenrichmentAnalysis is
that it can take multiple sets of labels (gene sets), creates the GO
gene lists once, then calculates enrichments of all given gene lists.
Indeed, creating the GO lists is the most time-consuming part.
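
To illustrate the build-once idea (this is not WGCNA code), here is a
minimal Python sketch using only the standard library and invented toy
data. The names build_term_index, hypergeom_upper_tail and enrich_all
are hypothetical, and the test is a plain one-sided hypergeometric
test, which may differ from what GOenrichmentAnalysis computes.

```python
from math import comb


def build_term_index(annotations):
    """Invert gene -> terms into term -> set of genes (the costly step, done once)."""
    index = {}
    for gene, terms in annotations.items():
        for term in terms:
            index.setdefault(term, set()).add(gene)
    return index


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich_all(gene_sets, term_index, universe):
    """Test every gene set against every term, reusing the prebuilt index."""
    N = len(universe)
    results = {}
    for name, genes in gene_sets.items():
        genes = set(genes) & universe
        results[name] = {
            term: hypergeom_upper_tail(
                len(genes & members), len(members), len(genes), N
            )
            for term, members in term_index.items()
        }
    return results


# Toy annotations, invented for illustration only (not real GO data).
annotations = {
    "g1": ["GO:A"], "g2": ["GO:A"], "g3": ["GO:A", "GO:B"],
    "g4": ["GO:B"], "g5": [], "g6": [],
}
universe = set(annotations)
term_index = build_term_index(annotations)  # built once
gene_sets = {"set1": ["g1", "g2", "g3"], "set2": ["g4", "g5"]}
pvalues = enrich_all(gene_sets, term_index, universe)  # reused for every set
```

With more than 100 gene sets, only enrich_all grows with the number of
sets; the index is built a single time.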

HTH,

Peter

On Mon, Nov 19, 2012 at 2:09 AM, Gustavo Fernández Bayón
<gbayon at gmail.com> wrote:
> Hi Cristobal.
>
> That makes sense to me now. Thank you for the explanation. For now, I would like to avoid RDAVID and stick with BioC, but it's good to know about it.
>
> Thanks again.
>
> Regards,
> Gus
>
>
>
> ---------------------------
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Friday, November 16, 2012 at 14:57, Cristobal Fresno Rodríguez wrote:
>
>> Hi Gustavo,
>>
>> I can see that you and I are dealing with the same issues in GO analysis. Indeed, GOstats and DAVID apply very different algorithms. The former uses a conditional hypergeometric test (the default option), in which one-tailed p-values are obtained by walking the graph bottom-up (from the leaves to the root), whereas the latter uses independent EASE scores (a penalized two-tailed Fisher's exact test). DAVID therefore breaks the graph structure and parallelizes all node evaluations at once. Moreover, its back end runs on dedicated hardware tuned for this kind of analysis, which is not the case for GOstats. However, you can use RDAVID or one of the other APIs available from the web site http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html, with limitations.
>>
>> Regards,
>>
>> Cristobal
>>
>>
>> 2012/11/16 Gustavo Fernández Bayón <gbayon at gmail.com>
>> > Hi Cristobal,
>> >
>> > thank you very much for the answer. I'll keep a note of it in case my current workflow stops working. For now, it works just by loading the GOstats library inside the inner foreach. Maybe I have the multithreaded version of SQLite, I don't know.
>> >
>> > I was wondering why GOstats seems so slow compared with the DAVID web tool. Is it just a matter of hardware (I do not know what is running on DAVID's back end), or are there more efficient implementations? Is topGO a more efficient alternative? I currently have more than 100 groups of genes on which I want to run a GO analysis, which is why I am experimenting with parallel computing.
>> >
>> > Thank you again for your answer.
>> >
>> > Regards,
>> > Gus
>> >
>> > On Thursday, November 15, 2012 at 16:55, Cristobal Fresno Rodríguez wrote:
>> >
>> > > Hi Gus,
>> > >
>> > > The same problem here, but using parallel. The problem lies in SQLite's threading mode (single-thread, multi-thread, or serialized). As far as I know, on Windows the default binary is built in serialized (thread-safe) mode, whereas on Unix it is not, so you have to compile it from source. However, if you parallelize by forking, as the parallel and multicore packages do, the database connection still breaks. I don't know whether it works with foreach. Maybe you should give it a try.
>> > >
>> > > At present, my workaround is to manually split hyperGTest into two functions: one that accesses the annotation packages and another that runs the actual hypergeometric test. The code first accesses all the annotation packages sequentially to build the GO graphs, and then runs the tests in parallel. This is pretty much what you have been doing so far.
>> > >
>> > > Regards,
>> > >
>> > > Cristobal
>> > >
>> > >
>> > > 2012/11/15 Gustavo Fernández Bayón <gbayon at gmail.com>
>> > > > Hi everybody.
>> > > >
>> > > > A simple question: is there any way I can perform a GO enrichment analysis without using the annotation packages?
>> > > >
>> > > > The problem is that I am trying to run a series of GO analyses in parallel (with foreach), and I run into problems because every call tries to access the same SQLite database. For now, I have solved it by putting "library(GOstats)" inside the inner foreach, but I was wondering if there is a better way.
>> > > >
>> > > > Regards,
>> > > > Gus
>> > > >
>> > > > _______________________________________________
>> > > > Bioconductor mailing list
>> > > > Bioconductor at r-project.org
>> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > >
>> >
>>
>