[BioC] Simple pathway enrichment analysis for gene lists

Marc Carlson mcarlson at fhcrc.org
Mon Sep 9 19:00:26 CEST 2013


On 09/09/2013 08:25 AM, Paul Shannon wrote:
> Hi Enrico,
>
> reactome.db is best described, I believe, as simply a bioc rendering of Reactome's sql database -- Marc, please correct me if I am wrong.

Basically yes, but it also contains the information needed to connect 
gene IDs to Reactome pathway IDs, and we put some effort into making 
this easier to use than it would be by default.  For example, if you 
have the Entrez Gene ID "10898", you can look it up like this:

library(reactome.db)
select(reactome.db, keys = "10898", columns = "PATHID", keytype = "ENTREZID")

And yes it is definitely incomplete in terms of pathway representation.  
And it is probably important to bear in mind that every large pathway 
database should be expected to be terribly incomplete.  In fact if you 
compare them you will see that different databases overlap a lot less 
than you might have hoped.  And even if they overlapped perfectly, they 
would still be woefully incomplete simply because our knowledge of 
pathways is far from complete.  So you should be careful to never assume 
(for example) that what is in the database is complete, and to be honest 
even assuming that what is there is somehow "representative" may be a 
stretch in this case.
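
For the test itself, here is a minimal sketch of the hypergeometric 
(one-sided Fisher) calculation in base R.  Everything in it is a toy 
assumption: the gene names, the two pathways and the universe are made 
up, and enrich_one() is just a hypothetical helper name.  In practice 
the gene sets could be built from the ENTREZID/PATHID pairs that 
select() returns, and the universe should be the set of genes that 
could have ended up in your list.

```r
# Toy data: every identifier below is made up for illustration.
universe  <- paste0("g", 1:20)                  # background gene set
gene_list <- c("g1", "g2", "g3", "g4", "g15")   # genes of interest
pathways  <- list(
  pathA = c("g1", "g2", "g3", "g4", "g5"),
  pathB = c("g10", "g11", "g12", "g15")
)

# Hypothetical helper: over-representation p-value for one pathway.
enrich_one <- function(members, gene_list, universe) {
  members <- intersect(members, universe)    # pathway genes in the universe
  hits    <- intersect(gene_list, universe)  # list genes in the universe
  k <- length(intersect(hits, members))      # list genes that are in the pathway
  m <- length(members)                       # pathway size
  n <- length(universe) - m                  # genes outside the pathway
  N <- length(hits)                          # size of the gene list
  # P(X >= k) when drawing N genes at random from the universe
  phyper(k - 1, m, n, N, lower.tail = FALSE)
}

pvals <- vapply(pathways, enrich_one, numeric(1),
                gene_list = gene_list, universe = universe)
padj  <- p.adjust(pvals, method = "BH")      # correct for multiple pathways

print(data.frame(pathway = names(pvals), p = pvals, p_adj = padj))
```

The same p-value comes out of fisher.test() on the 2x2 table with 
alternative = "greater".  Given the incompleteness described above, the 
choice of universe matters a lot, so treat the numbers as exploratory.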


   Marc


> There is thus a data representation obstacle when using reactome.db:  molecular relations are described, with pathway/gene mappings not so easy to get at.
> In addition, and despite Reactome's many strengths, its coverage is incomplete.   The canonical wnt pathway, for instance, is (at my last check) not included.
>
> If you have a list of gene IDs, exploratory analysis can usefully start with GO enrichment, KEGG enrichment, and GSEA.   Though the information in KEGG.db has not been updated in a couple of years, it is still very useful for exploratory data analysis.   Any enrichments you discover using these assorted gene/category associations may lead you to a close study of particular functions or pathways, and it is at this point that you may wish to get the latest and most specific information via KEGGREST and Reactome (and, with our next release, the new PSICQUIC package; see http://code.google.com/p/psicquic/).
>
> I hope this helps.   Let us know if it falls short, or if new questions arise.
>
>   - Paul
>
>
> On Sep 9, 2013, at 7:53 AM, Enrico Ferrero wrote:
>
>> Dear list,
>>
>> Can anybody suggest how to perform a simple pathway enrichment
>> analysis starting from a list of gene IDs?
>>
>> I know about the gage and ROntoTools packages that use KEGGREST to
>> retrieve an up to date version of the KEGG database, but, as far as I
>> understand, they require a microarray experiment as input (or at least
>> fold changes and pvalues).
>>
>> Since this time around I'm not starting from a microarray experiment
>> but I just have a gene list, I'm looking for a way to perform pathway
>> enrichment analysis using a simple numerical method such as Fisher's /
>> hypergeometric test.
>>
>> I know the Category package still provides a KEGGHyperG class (which
>> would be perfect!), but the results are based on the outdated version
>> of KEGG (via KEGG.db, I guess).
>>
>> Are there any good alternatives available out there? Would it be
>> possible to use reactome.db in conjunction with the Category/GOstats
>> functions for example?
>>
>> Thank you!
>> Best,
>>
>> -- 
>> Enrico Ferrero
>> Department of Genetics
>> Cambridge Systems Biology Centre
>> University of Cambridge
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor