[BioC] testing GO categories with Fisher's exact test.

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed Feb 25 10:07:18 MET 2004

Forgive my naivety, but could one not use a chi-squared test here?
We have an observed number of genes in each category, and could calculate an expected number from
the size of the cluster and the distribution of all genes across GO categories...
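
A minimal sketch of the chi-squared idea for a single GO category, assuming a 2x2 table (in cluster or not, in category or not). The function name and all counts are hypothetical, chosen only to illustrate the observed-versus-expected calculation. Note that the chi-squared approximation is known to be unreliable when expected counts are small, which is the usual reason for preferring an exact test.

```python
import math


def chi_squared_2x2(in_cluster_in_cat, cluster_size, in_cat_total, total_genes):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for one GO category, without continuity correction."""
    a = in_cluster_in_cat                  # in cluster, in category
    b = cluster_size - a                   # in cluster, not in category
    c = in_cat_total - a                   # not in cluster, in category
    d = total_genes - cluster_size - c     # not in cluster, not in category
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of cluster and category.
            expected = row_totals[i] * col_totals[j] / total_genes
            stat += (observed[i][j] - expected) ** 2 / expected

    # Survival function of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 6000 genes with a GO term, a cluster of 600,
# a category of 200 genes, 40 of which fall in the cluster
# (20 would be expected by chance).
stat, p = chi_squared_2x2(40, 600, 200, 6000)
print(f"chi-squared = {stat:.2f}, p = {p:.2e}")
```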


-----Original Message-----
From: Nicholas Lewin-Koh [mailto:nikko at hailmail.net]
Sent: 24 February 2004 08:33
To: bioconductor at stat.math.ethz.ch
Cc: rdiaz at cnio.es
Subject: [BioC] testing GO categories with Fisher's exact test.

Hi all,
I have a few questions about testing for over-representation of terms in
a cluster.
Let's consider a simple case: a set of chips from an experiment, say
treated and untreated, with 10,000 genes on the chip and 1,000
differentially expressed. Of the 10,000, 7,000 can be annotated and 6,000
have a GO function assigned to them at a suitable level. Say for this
example there are 30 GO classes that appear. I then conduct Fisher's exact
test 30 times, once for each GO category, to detect differential
representation of terms in the differentially expressed set, and correct
for multiple testing.
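
The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the one-sided Fisher's exact test is computed as a hypergeometric upper tail, Benjamini-Hochberg is used as one possible multiple-testing correction (the message does not say which correction is applied), and the category names, category sizes, hit counts, and the figure of 600 selected genes with a GO term are all hypothetical. Only three categories are shown instead of 30.

```python
from math import comb


def fisher_over_representation(k, n, K, N):
    """One-sided Fisher's exact p-value for over-representation:
    P(X >= k), where X is hypergeometric with population N,
    K genes in the category, and n genes selected."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


N_WITH_GO = 6000     # genes with a GO function (from the example above)
N_SELECTED = 600     # hypothetical: selected genes that have a GO function

# Hypothetical categories: name -> (category size, hits in selected set).
categories = {
    "category_A": (200, 40),
    "category_B": (500, 55),
    "category_C": (50, 4),
}

names = list(categories)
raw = [
    fisher_over_representation(categories[c][1], N_SELECTED,
                               categories[c][0], N_WITH_GO)
    for c in names
]
adjusted = benjamini_hochberg(raw)
for name, p_raw, p_adj in zip(names, raw, adjusted):
    print(f"{name}: raw p = {p_raw:.3g}, BH-adjusted p = {p_adj:.3g}")
```

The correction here treats the per-category tests as a plain list of p-values; it does not model genes that belong to more than one category.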

My question is about the validity of this procedure. Just from experience,
many genes will have multiple functions assigned to them, so the genes
falling into GO classes are not independent. Also, there is the large set
of unannotated genes, so we are in effect ignoring the influence of all
the unannotated genes on the outcome. Do people have any thoughts or
opinions on these approaches? They are appearing all over the place in
bioinformatics tools like FATIGO, EASE, DAVID, etc. I find that the formal
testing approach makes me very uncomfortable, especially as the biologists
I work with tend to over-interpret the results.
I am very interested to see the discussion on this topic.
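
One way to make the concern about unannotated genes concrete is to compute the same one-sided test for one category against two different reference sets. The sketch below is only an illustration: it assumes that 700 of the 1,000 differentially expressed genes have a GO function (a hypothetical figure), that the category has 200 genes of which 40 are selected (also hypothetical), and that genes without a GO function can be counted as "not in the category" when the whole chip is used as the reference.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X hypergeometric: population N, K in category, n selected."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


CATEGORY_SIZE = 200   # hypothetical
HITS = 40             # hypothetical: category genes among the selected set

# Reference set 1: only genes with a GO function.
# 6000 is from the example; 700 selected genes with a GO function is assumed.
p_annotated = hypergeom_upper_tail(HITS, 700, CATEGORY_SIZE, 6000)

# Reference set 2: every gene on the chip (10,000 genes, 1,000 selected),
# treating genes without a GO function as outside the category.
p_whole_chip = hypergeom_upper_tail(HITS, 1000, CATEGORY_SIZE, 10000)

print(f"GO-annotated reference: p = {p_annotated:.3g}")
print(f"whole-chip reference:   p = {p_whole_chip:.3g}")
```

With these made-up counts the two reference sets give different p-values for the same category, so the choice of reference set is itself part of the inference.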


