[BioC] testing GO categories with Fisher's exact test.

cberry at tajo.ucsd.edu
Tue Feb 24 20:23:44 MET 2004

On Tue, 24 Feb 2004, Nicholas Lewin-Koh wrote:

> Hi all,
> I have a few questions about testing for over-representation of terms in
> a cluster.
> Let's consider a simple case: a set of chips from an experiment, say
> treated and untreated, with 10,000
> genes on the chip and 1000 differentially expressed. Of the 10,000, 7000
> can be annotated and 6000 have
> a GO function assigned to them at a suitable level. Say for this example
> there are 30 GO classes that appear.
> I then conduct Fisher's exact test 30 times, once for each GO category,
> to detect differential representation of terms in the expressed
> set and correct for multiple testing.
> My question is on the validity of this procedure.

It depends on what hypotheses you wish to test. The uniform distribution
of the p value under the null hypothesis depends on ***all*** the
assumptions of the test obtaining.
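
For concreteness, the per-category test described in the question reduces
to a hypergeometric tail probability. The sketch below (Python, standard
library only) computes the one-sided Fisher p-value for one category and
a Bonferroni adjustment over the 30 categories. The function name and all
counts except N are hypothetical, chosen only for illustration; the
question does not say how many of the 1000 differentially expressed genes
are among the 6000 annotated ones. Note that the calculation treats the
selected genes as a random draw from the annotated genes, which is one of
the assumptions at issue.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    of which K belong to the category (one-sided Fisher's exact test)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / total

N = 6000   # annotated genes with a GO term (from the question)
n = 600    # assumed: differentially expressed genes among those N
K = 200    # assumed: genes in one GO category
k = 35     # assumed: differentially expressed genes in that category

p = hypergeom_upper_tail(k, K, n, N)
adjusted = min(1.0, 30 * p)   # Bonferroni over the 30 categories
print(p, adjusted)
```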

The trouble is that you probably do not want to test whether the genes on
your microarray are independent, since you already know that they are not:

> Just from experience
> many genes will
> have multiple functions assigned to them so the genes falling into GO
> classes are not independent.

> Also, there is the large set of un-annotated genes so we are in effect
> ignoring the influence of 
> all the unannotated genes on the outcome. Do people have any thoughts or
> opinions on these approaches? It is
> appearing all over the place in bioinformatics tools like FATIGO, EASE,
> DAVID etc. 

SAM and similar permutation-based approaches can be implemented for this
setup to get p-values (or FDRs) that do not depend on independence of
the genes.

In several data sets I have seen, the results given by permutation (of
sample identities, using the hypergeometric p-value as the test statistic)
are several orders of magnitude more conservative than the original
'p-value', even without correcting for multiple comparisons.
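
A minimal sketch of this kind of permutation, in standard-library Python.
It assumes an expression matrix `expr` (one row per gene, one value per
sample), 0/1 sample labels, and a deliberately simple rule for calling
genes differentially expressed (the `n_top` genes by absolute difference
in group means). These names and that rule are illustrative assumptions,
not what SAM or any other package does. The hypergeometric p-value is
recomputed after each shuffle of the sample labels and serves only as a
test statistic.

```python
import random
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw of n from N with K in the category."""
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

def top_genes(expr, labels, n_top):
    """Indices of the n_top genes with the largest absolute difference
    in group means (an assumed, simplistic selection rule)."""
    g1 = [j for j, lab in enumerate(labels) if lab == 1]
    g0 = [j for j, lab in enumerate(labels) if lab == 0]

    def score(row):
        m1 = sum(row[j] for j in g1) / len(g1)
        m0 = sum(row[j] for j in g0) / len(g0)
        return abs(m1 - m0)

    ranked = sorted(range(len(expr)), key=lambda i: score(expr[i]),
                    reverse=True)
    return set(ranked[:n_top])

def category_stat(expr, labels, category, n_top):
    """Hypergeometric p-value for one category, used as the test statistic."""
    selected = top_genes(expr, labels, n_top)
    k = len(selected & category)
    return hypergeom_upper_tail(k, len(category), n_top, len(expr))

def permutation_pvalue(expr, labels, category, n_top, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value).
    Sample labels are shuffled; the gene rows stay intact, so any
    correlation between genes is preserved in every permutation."""
    rng = random.Random(seed)
    observed = category_stat(expr, labels, category, n_top)
    perm_labels = list(labels)          # shuffle a copy, not the input
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm_labels)
        if category_stat(expr, perm_labels, category, n_top) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Synthetic noise-only data, for illustration only.
    rng = random.Random(42)
    expr = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    category = set(range(20))           # hypothetical GO category
    print(permutation_pvalue(expr, labels, category, n_top=20))
```

Because co-regulated genes move together under both the observed and the
shuffled labels, a reference distribution built this way reflects the
dependence between genes, which the nominal hypergeometric p-value ignores.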

I recall someone from the MAPPfinder group remarking at a conference last
July that MAPPfinder 2.0 would implement permutation methods. But I cannot
find this release yet using Google.

Another approach to permutation testing of expression vs ontology is
outlined in:

Mootha VK et al. PGC-1α-responsive genes involved in
oxidative phosphorylation are coordinately downregulated in human
diabetes. Nature Genetics, 34(3):267-273, 2003.

> I find that
> the formal testing approach makes me very uncomfortable, especially as
> the biologists I work with tend to over-interpret the results.

Testing a better focussed hypothesis should increase your comfort level.


> I am very interested to see the discussion on this topic.
> Nicholas

Charles C. Berry                        (858) 534-2098 
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717
