[BioC] testing GO categories with Fisher's exact test.

Nicholas Lewin-Koh nikko at hailmail.net
Tue Feb 24 09:33:18 MET 2004

Hi all,
I have a few questions about testing for over-representation of terms in
a cluster.
Let's consider a simple case: a set of chips from an experiment, say
treated and untreated, with 10,000
genes on the chip and 1,000 differentially expressed. Of the 10,000, 7,000
can be annotated and 6,000 have
a GO function assigned to them at a suitable level. Say for this example
there are 30 GO classes that appear.
I then conduct Fisher's exact test 30 times, once per GO category, to detect
differential representation of terms in the differentially expressed
set, and correct for multiple testing.
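
To make the procedure concrete, here is a minimal sketch of it in Python,
standard library only. It assumes a one-sided test for over-representation and
a Bonferroni correction; the post does not say which alternative or which
correction is used. The per-category counts and the figure of 600
differentially expressed genes with a GO term are made up for illustration.

```python
from math import comb


def fisher_upper_tail(k, K, n, N):
    """One-sided Fisher's exact test p-value, P(X >= k).

    N: genes in the background
    K: background genes that belong to the GO category
    n: differentially expressed (DE) genes in the background
    k: DE genes that belong to the GO category
    """
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., min(K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Background: the 6,000 genes with a GO term (figure taken from the post).
N_BACKGROUND = 6000
# ASSUMPTION: 600 of the 1,000 DE genes have a GO term (not given in the post).
N_DE = 600

# HYPOTHETICAL counts per category: (genes in category, of which DE).
categories = {
    "category_1": (200, 35),
    "category_2": (150, 16),
    "category_3": (400, 41),
}
N_TESTS = 30  # the post tests 30 categories; only three are shown here

for name, (K, k) in categories.items():
    p = fisher_upper_tail(k, K, N_DE, N_BACKGROUND)
    p_bonf = min(1.0, p * N_TESTS)  # Bonferroni adjustment for 30 tests
    print(f"{name}: p = {p:.3g}, Bonferroni-adjusted p = {p_bonf:.3g}")
```

Each call treats its 2x2 table in isolation, so nothing in this sketch accounts
for genes that belong to several categories.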

My question is on the validity of this procedure. Just from experience,
many genes will
have multiple functions assigned to them, so the GO classes overlap
and are not independent.
Also, there is the large set of unannotated genes, so we are in effect
ignoring the influence of
all the unannotated genes on the outcome. Do people have any thoughts or
opinions on these approaches? They are
appearing all over the place in bioinformatics tools like FATIGO, EASE,
DAVID, etc. I find that
the formal testing approach makes me very uncomfortable, especially as
the biologists I work with tend to over-interpret the results.
I am very interested to see the discussion on this topic.
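
The concern about the unannotated genes can be illustrated with a small
sketch: the same category is tested once against the 6,000 GO-annotated genes
and once against all 10,000 genes on the chip. The category size, the number
of differentially expressed genes in it, and the assumption that 700 of the
1,000 differentially expressed genes carry a GO term are all hypothetical; the
one-sided test is also an assumption.

```python
from math import comb


def fisher_upper_tail(k, K, n, N):
    """One-sided Fisher's exact test p-value, P(X >= k), via the
    hypergeometric upper tail (N background genes, K in the category,
    n differentially expressed, k of those in the category)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# HYPOTHETICAL category: 200 member genes, 35 of them differentially expressed.
K, k = 200, 35

# Background 1: only the 6,000 genes with a GO term.
# ASSUMPTION: 700 of the 1,000 DE genes fall in this annotated set.
p_annotated = fisher_upper_tail(k, K, n=700, N=6000)

# Background 2: all 10,000 genes on the chip, with unannotated genes
# counted as "not in the category".
p_whole_chip = fisher_upper_tail(k, K, n=1000, N=10000)

print(f"annotated background:  p = {p_annotated:.3g}")
print(f"whole-chip background: p = {p_whole_chip:.3g}")
```

With these made-up counts the whole-chip background gives the smaller p-value,
because annotated genes are assumed to be differentially expressed more often
than unannotated ones. The choice of background alone moves the result.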

