[BioC] topGO enrichment using ensembl gene list

James W. MacDonald jmacdon at med.umich.edu
Mon Mar 31 19:12:34 CEST 2008

Hi Julien,

Julien Roux wrote:
> Hello list,
> I am using the package "topGO" to analyse GO enrichment of gene sets:
> My genes are ensembl IDs and are not taken from a microarray, so I had 
> to feed "topGOdata" with a gene2GO list.
> (see 
> http://thread.gmane.org/gmane.science.biology.informatics.conductor/14627)
> I construct that list by mapping all ensembl IDs to GO IDs using the 
> package "biomaRt".
> Then I proceed with my analysis:
>  > GOdata <- new("topGOdata", ontology = "MF", allGenes = selectedList, 
> description = "Ensembl GO enrichment", annot = annFUN.gene2GO, gene2GO = 
> gene2GO)
> Do you confirm this approach is correct?

It should be correct. The gene2GO argument simply needs to be a named 
list in which the names are your gene IDs (Ensembl IDs in your case) and 
each element is a character vector of the GO IDs annotated to that gene.
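
As a minimal sketch of that structure, assuming the biomaRt query 
returns a two-column table with one row per (gene, GO term) pair: the 
data frame below is a hand-made stand-in for such a result, and the 
column names and IDs in it are illustrative assumptions, not real 
annotation.

```r
# Hypothetical stand-in for a biomaRt result: one row per (gene, GO ID)
# pair. Column names and IDs are assumptions for illustration only.
mapping <- data.frame(
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000001",
                      "ENSG00000000002", "ENSG00000000003"),
  go_id = c("GO:0003677", "GO:0005515", "GO:0005515", ""),
  stringsAsFactors = FALSE
)

# Drop rows for genes that have no GO annotation (empty GO ID).
mapping <- mapping[mapping$go_id != "", ]

# Named list: names are gene IDs, each element is a character vector
# of the GO IDs annotated to that gene.
gene2GO <- split(mapping$go_id, mapping$ensembl_gene_id)
```

A list of this shape is what gets passed as gene2GO in the call quoted 
above.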

> I also had several questions concerning topGO:
> - Are the p-values in topGO corrected for multiple testing (FDR...)? My 
> guess is that they are not due to a problem of independence...

I don't think they are corrected. I'm not even sure you could (or 
should). As with a lot of microarray analyses, p-values should not be 
taken at face value. Rather they should be used more as ranking tools.

> - Are there any differences between the Fisher exact test (topGO) and 
> the hypergeometric test (GOstats)? If yes, why did the two packages 
> make different choices?

Both packages are using the same test. The Fisher exact test is used to 
assess association between variables in a 2x2 contingency table. Under 
the null hypothesis of independence the counts in a given table follow a 
hypergeometric distribution, so the p-values for a 2x2 table are 
computed using this distribution. See e.g., ?fisher.test
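
A small sketch of that equivalence in base R, using made-up counts for 
a single GO term (the numbers are illustrative, not real data). The 
one-sided Fisher p-value and the upper tail of the hypergeometric 
distribution should agree:

```r
# Illustrative 2x2 table for one GO term (counts are invented).
# Rows: selected / not selected genes; columns: in term / not in term.
tab <- matrix(c(12, 38, 88, 862), nrow = 2,
              dimnames = list(c("selected", "not_selected"),
                              c("in_term", "not_in_term")))

# One-sided Fisher exact test for over-representation.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same p-value from the hypergeometric distribution:
# P(X >= 12) when drawing 100 selected genes from a universe with
# 50 genes in the term and 950 genes outside it.
p_hyper <- phyper(tab[1, 1] - 1,     # observed count minus one
                  sum(tab[, 1]),     # genes in the term
                  sum(tab[, 2]),     # genes not in the term
                  sum(tab[1, ]),     # number of selected genes
                  lower.tail = FALSE)

all.equal(p_fisher, p_hyper)
```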

> - It is not clear to me what the Kolmogorov-Smirnov test is testing, 
> especially in my case where I don't provide scores associated with my 
> genes...
> - Is there a way to test separately over/under representation of GO 
> categories?

In GOstats there is. I don't know about topGO.
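
In GOstats the direction is set through the testDirection parameter 
("over" or "under") of the hypergeometric test parameters. Independent 
of either package, the idea can be sketched in base R for a single GO 
term by choosing the tail of the test. The counts below are invented 
for illustration, and this is not the code path of topGO or GOstats:

```r
# Illustrative 2x2 table for one GO term (counts are invented).
# Rows: selected / not selected genes; columns: in term / not in term.
tab <- matrix(c(3, 97, 197, 703), nrow = 2,
              dimnames = list(c("selected", "not_selected"),
                              c("in_term", "not_in_term")))

# Over-representation: are there more selected genes in the term
# than expected by chance?
p_over <- fisher.test(tab, alternative = "greater")$p.value

# Under-representation: are there fewer selected genes in the term
# than expected by chance?
p_under <- fisher.test(tab, alternative = "less")$p.value
```

Here 3 of the 200 selected genes fall in a term that covers 100 of the 
1000 genes overall, so the under-representation p-value is the small 
one.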



> Thanks a lot in advance for your help or tips
> Julien

James W. MacDonald, M.S.
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
