[BioC] topGO enrichment using ensembl gene list

James W. MacDonald jmacdon at med.umich.edu
Mon Mar 31 19:12:34 CEST 2008


Hi Julien,

Julien Roux wrote:
> Hello list,
> 
> I am using the package "topGO" to analyse GO enrichment of gene sets:
> 
> My genes are Ensembl IDs and are not taken from a microarray, so I had 
> to feed "topGOdata" with a gene2GO list.
> (see 
> http://thread.gmane.org/gmane.science.biology.informatics.conductor/14627)
> I construct that list by mapping all Ensembl IDs to GO IDs using the 
> package "biomaRt".
> Then I proceed with my analysis:
> 
>  > GOdata <- new("topGOdata", ontology = "MF", allGenes = selectedList, 
> description = "Ensembl GO enrichment", annot = annFUN.gene2GO, gene2GO = 
> gene2GO)
> 
> Can you confirm that this approach is correct?

It should be correct. You simply need a named list where the names are 
the gene IDs (Ensembl IDs in your case, matching the names of allGenes), 
and each element is a character vector of the GO IDs for that gene.
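For reference, the topGO documentation for annFUN.gene2GO describes gene2GO as a named list: the names are gene identifiers (the same IDs used as names of allGenes) and each element is a character vector of GO IDs. A minimal base-R sketch of building it from a two-column, biomaRt-style table follows. The column names "ensembl_gene_id" and "go_id", the gene IDs, and the object names are assumptions for illustration only; check listAttributes() on your own mart for the real attribute names.

```r
## Hypothetical biomaRt-style result: one row per (gene, GO term) pair.
## Column names and gene IDs are illustrative assumptions, not real output.
ens2go <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C"),
  go_id           = c("GO:0003674", "GO:0005488", "GO:0003824", ""),
  stringsAsFactors = FALSE
)

## Genes without any GO annotation come back with an empty go_id; drop them
ens2go <- ens2go[ens2go$go_id != "", ]

## Named list: one element per gene, each a character vector of GO IDs
gene2GO <- split(ens2go$go_id, ens2go$ensembl_gene_id)

## Hypothetical gene universe and genes of interest
geneUniverse <- c("ENSG_A", "ENSG_B", "ENSG_C")
myInterestingGenes <- c("ENSG_A")

## allGenes: a 0/1 factor named by gene ID (1 = gene of interest)
allGenes <- factor(as.integer(geneUniverse %in% myInterestingGenes))
names(allGenes) <- geneUniverse

## These two objects can then be passed to topGO, e.g.
## new("topGOdata", ontology = "MF", allGenes = allGenes,
##     annot = annFUN.gene2GO, gene2GO = gene2GO)
```

Genes that have no GO annotation (ENSG_C here) simply do not appear in gene2GO; they stay in allGenes as part of the universe.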

> 
> I also have several questions concerning topGO:
> - Are the p-values in topGO corrected for multiple testing (FDR...)? My 
> guess is that they are not, because of a problem of independence...

I don't think they are corrected. I'm not even sure you could (or 
should) correct them. As with a lot of microarray analyses, the p-values 
should not be taken at face value; they are better used as a ranking tool.
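If you want an adjusted column anyway, for example as a rough FDR-style guide, base R's p.adjust can be applied to the vector of raw p-values. A minimal sketch with made-up p-values and hypothetical term names follows. Keep in mind that GO terms overlap heavily, so the tests are not independent and the adjusted values are approximate at best; ranking by raw p-value, as suggested above, needs no adjustment at all.

```r
## Hypothetical raw p-values for four GO terms (made-up numbers)
p <- c(GO_1 = 0.001, GO_2 = 0.01, GO_3 = 0.02, GO_4 = 0.5)

## Benjamini-Hochberg adjustment from the base stats package;
## names are preserved in the result
padj <- p.adjust(p, method = "BH")

## Ranking the terms by raw p-value (smallest first)
ranked <- names(sort(p))

print(cbind(raw = p, BH = padj))
print(ranked)
```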

> - Are there any differences between the Fisher exact test (topGO) and 
> the hypergeometric test (GOstats)? If so, why did the two packages make 
> different choices?

Both packages use the same test. The Fisher exact test assesses the 
association between two variables in a 2x2 contingency table. Under the 
null hypothesis of independence, and conditional on the row and column 
totals, the count in any one cell follows a hypergeometric distribution, 
so the p-value for a 2x2 table is computed from this distribution. The 
one-sided (over-representation) Fisher test is therefore the same as the 
upper-tail hypergeometric test. See e.g. ?fisher.test
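To make the equivalence concrete, here is a minimal sketch for a single GO term with made-up counts (N, M, k and x are hypothetical). It computes the upper-tail hypergeometric p-value with phyper and the one-sided Fisher exact test with fisher.test on the corresponding 2x2 table; the two numbers agree.

```r
## Hypothetical counts for one GO term
N <- 1000  # genes in the universe
M <- 40    # universe genes annotated to the term
k <- 50    # selected (interesting) genes
x <- 8     # selected genes that are annotated to the term

## Upper-tail hypergeometric p-value: P(X >= x)
p_hyper <- phyper(x - 1, M, N - M, k, lower.tail = FALSE)

## The same question as a 2x2 table (filled column by column)
## rows: selected / not selected; columns: annotated / not annotated
tab <- matrix(c(x, M - x, k - x, N - M - k + x), nrow = 2,
              dimnames = list(c("selected", "not selected"),
                              c("annotated", "not annotated")))

## One-sided Fisher exact test for over-representation
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

Note that fisher.test defaults to alternative = "two.sided", so alternative = "greater" is needed to reproduce the over-representation p-value.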


> - It is not clear to me what the Kolmogorov-Smirnov test is testing, 
> especially in my case, where I don't provide scores associated with my genes.
> - Is there a way to test separately over/under representation of GO 
> categories?

In GOstats there is. I don't know about topGO.
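In GOstats this is, as far as I know, the testDirection argument ("over" or "under") of the GOHyperGParams object. Underneath, the difference is only which tail of the hypergeometric distribution is used. A minimal base-R sketch with hypothetical counts for one GO term:

```r
## Hypothetical counts for one GO term
N <- 1000  # genes in the universe
M <- 40    # universe genes annotated to the term
k <- 50    # selected genes
x <- 8     # selected genes annotated to the term

## Over-representation: P(X >= x), the upper tail
p_over <- phyper(x - 1, M, N - M, k, lower.tail = FALSE)

## Under-representation: P(X <= x), the lower tail
p_under <- phyper(x, M, N - M, k)

print(c(over = p_over, under = p_under))
```

The equivalent Fisher tests are fisher.test(..., alternative = "greater") and fisher.test(..., alternative = "less") on the same 2x2 table.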

Best,

Jim


>  
> Thanks a lot in advance for your help or tips
> Julien
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


