[BioC] testing GO categories with Fisher's exact test.

James MacDonald jmacdon at med.umich.edu
Wed Feb 25 14:50:56 MET 2004

I should add to this thread that there is existing software that will do
resampling to assess global significance of the p-values obtained from
this sort of analysis.




James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109

>>> <cberry at tajo.ucsd.edu> 02/24/04 02:23PM >>>
On Tue, 24 Feb 2004, Nicholas Lewin-Koh wrote:

> Hi all,
> I have a few questions about testing for over-representation of terms
> in a cluster.
> Let's consider a simple case: a set of chips from an experiment, say
> treated and untreated, with 10,000 genes on the chip and 1,000
> differentially expressed. Of the 10,000, only a subset can be
> annotated, and 6,000 have a GO function assigned to them at a
> suitable level. Say for this example there are 30 GO classes that
> appear. I then conduct Fisher's exact test 30 times, once per GO
> category, to test for differential representation of terms in the
> differentially expressed set, and correct for multiple testing.
> My question is on the validity of this procedure.
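As a concrete illustration of the procedure described above (not from the original thread: the GO IDs, counts, and the choice of Benjamini-Hochberg as the multiple-testing correction are all invented for this sketch), each category gives a 2x2 table of (in category vs. not) against (differentially expressed vs. not):

```python
# Hypothetical sketch: per-category Fisher's exact test for
# over-representation, followed by Benjamini-Hochberg adjustment.
# All numbers and GO IDs are made up for illustration.
from scipy.stats import fisher_exact

N_ANNOTATED = 6000   # genes with a GO term at the chosen level
N_DE = 600           # differentially expressed genes among those

# invented (category size, DE genes in category) pairs
go_counts = {
    "GO:0006915": (300, 60),
    "GO:0008152": (1200, 110),
    "GO:0007165": (450, 40),
}

pvals = {}
for go_id, (n_cat, n_de_cat) in go_counts.items():
    table = [
        [n_de_cat, n_cat - n_de_cat],                                  # in category
        [N_DE - n_de_cat, (N_ANNOTATED - n_cat) - (N_DE - n_de_cat)],  # outside
    ]
    _, p = fisher_exact(table, alternative="greater")  # one-sided: over-representation
    pvals[go_id] = p

# Benjamini-Hochberg step-up adjustment
ordered = sorted(pvals.items(), key=lambda kv: kv[1])
m = len(ordered)
adj = {}
prev = 1.0
for rank, (go_id, p) in reversed(list(enumerate(ordered, start=1))):
    prev = min(prev, p * m / rank)
    adj[go_id] = prev

for go_id in sorted(adj):
    print(go_id, pvals[go_id], adj[go_id])
```

Note that this treats genes as independent draws, which is exactly the assumption questioned in the reply below the quoted post.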

It depends on what hypotheses you wish to test. The uniform
distribution of the p-value under the null hypothesis depends on
***all*** the assumptions of the test obtaining.

The trouble is that you probably do not want to test whether the genes
on your microarray are independent, since you already know that they
are not.

> Just from experience, many genes will have multiple functions
> assigned to them, so the genes falling into classes are not
> independent.

> Also, there is the large set of un-annotated genes, so we are
> ignoring the influence of all the unannotated genes on the outcome.
> Do people have any thoughts or opinions on these approaches? It is
> appearing all over the place in bioinformatics tools like FATIGO,
> DAVID, etc.

SAM and similar permutation-based approaches can be implemented for
this setup to get p-values (or FDRs) that do not depend on the
independence of the genes.

The results given by permutation (of sample identities, using the
hypergeometric p-value as the test statistic) are several orders of
magnitude more conservative than using the original 'p-value', even
after correcting for multiple comparisons, in several data sets I have
seen.
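The permutation scheme described here can be sketched roughly as follows (a toy illustration with invented data, group sizes, and a simple mean-difference ranking; the real analysis would use a proper differential-expression statistic). Permuting whole sample labels, rather than genes, preserves the gene-gene correlation structure under the null:

```python
# Sketch: permute *sample* identities, redo the differential-expression
# call, and use the hypergeometric p-value of a GO category as the test
# statistic. All data here are simulated for illustration only.
import numpy as np
from scipy.stats import hypergeom

rng = np.random.default_rng(0)

n_genes, n_per_group = 1000, 6
# toy data: the first 50 genes form the GO category, and the first 25
# of them are shifted upward in the "treated" group
expr = rng.normal(size=(n_genes, 2 * n_per_group))
expr[:25, n_per_group:] += 2.0
in_category = np.zeros(n_genes, dtype=bool)
in_category[:50] = True
labels = np.array([0] * n_per_group + [1] * n_per_group)

def category_pvalue(expr, labels, in_category, top_k=100):
    """Hypergeometric p for the category among the top_k genes ranked
    by absolute mean difference between the two label groups."""
    diff = expr[:, labels == 1].mean(axis=1) - expr[:, labels == 0].mean(axis=1)
    top = np.argsort(-np.abs(diff))[:top_k]
    k = int(in_category[top].sum())
    n_total = len(in_category)
    return hypergeom.sf(k - 1, n_total, int(in_category.sum()), top_k)

observed = category_pvalue(expr, labels, in_category)

n_perm = 200
perm_stats = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(labels)  # shuffle sample identities only
    perm_stats[i] = category_pvalue(expr, perm, in_category)

# permutation p-value: how often a relabelling looks at least as extreme
perm_p = (1 + int((perm_stats <= observed).sum())) / (1 + n_perm)
print("hypergeometric p:", observed, "permutation p:", perm_p)
```

The permutation p-value cannot go below 1/(n_perm + 1), so with correlated genes it is typically far more conservative than the raw hypergeometric value, which is the point being made above.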

I recall someone from the MAPPfinder group remarking at a conference
last July that MAPPfinder 2.0 would implement permutation methods, but
I cannot find this release yet using Google.

Another approach to permutation testing of expression vs ontology is
outlined in:

Mootha VK et al. PGC-1α-responsive genes involved in oxidative
phosphorylation are coordinately downregulated in human diabetes.
Nature Genetics, 34(3):267-73, 2003.

> I find that
> the formal testing approach makes me very uncomfortable, especially
> as the biologists I work with tend to over-interpret the results.

Testing a better-focussed hypothesis should increase your comfort
level.


> I am very interested to see the discussion on this topic.
> Nicholas
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch 
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor 

Charles C. Berry
Dept of Family/Preventive Medicine, UC San Diego
La Jolla, San Diego
(858) 534-2098
E mailto:cberry at tajo.ucsd.edu
http://hacuna.ucsd.edu/members/ccb.html
