[BioC] End of the line of GOstats: making sense of the hypergeometric test results now

Wed Nov 25 11:45:53 CET 2009

Greetings all,

I searched the GMane archives first and did not find an answer, so I
hope the following question is appropriate. After selecting my
'entrezUniverse', I ran a hypergeometric test, as implemented in the
functions provided by GOstats, and obtained a readable, hyperlinked
report. It lists the ontology nodes that appear to be significantly
implicated, along with p values, odds ratios, the number of
significantly regulated genes that fall in each listed node, etc.

The report is not exactly short, and I am looking for criteria to
guide the interpretation of the results. Specifically, I am trying to
hunt for the most 'interesting' implicated ontology nodes and, to this
end, a single marker to rank them by may be useful. Assuming this line
of thinking is appropriate, here are the first few lines of the
report:

> GO.df.CM3.ctr1.2.3

      GOBPID       Pvalue OddsRatio   ExpCount Count Size Term
1 GO:0040011 9.322848e-05  2.558205 11.8928490    26  145 locomotion
2 GO:0002376 2.337660e-04  1.887324 28.2147590    47  344 immune system process
3 GO:0007165 2.821193e-04  1.541496 82.4297464   110 1005 signal transduction
4 GO:0006954 2.840421e-04  2.892962  7.3817683    18   90 inflammatory response
5 GO:0051272 4.985200e-04  6.638731  1.5583733     7   19 positive regulation of cell motion
6 GO:0007154 5.866973e-04  1.493138 88.4992004   115 1079 cell communication
 [...]

I wonder whether the correct marker for my hunt is the p value or the
odds ratio, since the two would rank my list differently. In addition,
the ontology nodes containing the largest number of genes (Size,
above) may be too broad in scope to reveal a biological process that
is specifically implicated in my experiment. By the same token,
ontology nodes with too few genes may not provide convincing evidence
of their implication.
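
To make the dilemma concrete, here is a small sketch (in Python rather
than R, purely for illustration) that ranks the six rows shown above
by each candidate marker and then applies a Size window. The cutoffs
MIN_SIZE = 20 and MAX_SIZE = 1000 are arbitrary values I picked for
the example; they are assumptions, not recommendations from GOstats or
anyone else.

```python
# Rows copied from the first six lines of the report above.
COLUMNS = ("GOBPID", "Pvalue", "OddsRatio", "ExpCount", "Count", "Size", "Term")
RAW = [
    ("GO:0040011", 9.322848e-05, 2.558205, 11.8928490, 26, 145, "locomotion"),
    ("GO:0002376", 2.337660e-04, 1.887324, 28.2147590, 47, 344, "immune system process"),
    ("GO:0007165", 2.821193e-04, 1.541496, 82.4297464, 110, 1005, "signal transduction"),
    ("GO:0006954", 2.840421e-04, 2.892962, 7.3817683, 18, 90, "inflammatory response"),
    ("GO:0051272", 4.985200e-04, 6.638731, 1.5583733, 7, 19, "positive regulation of cell motion"),
    ("GO:0007154", 5.866973e-04, 1.493138, 88.4992004, 115, 1079, "cell communication"),
]
rows = [dict(zip(COLUMNS, r)) for r in RAW]

# Candidate marker 1: smallest p value first.
by_pvalue = sorted(rows, key=lambda r: r["Pvalue"])

# Candidate marker 2: largest odds ratio first.
by_odds = sorted(rows, key=lambda r: r["OddsRatio"], reverse=True)

# Hypothetical Size window: drop very broad and very small nodes.
# Both cutoffs are arbitrary and chosen only for this example.
MIN_SIZE, MAX_SIZE = 20, 1000
mid_sized = [r for r in rows if MIN_SIZE <= r["Size"] <= MAX_SIZE]

print("By p value:   ", [r["Term"] for r in by_pvalue])
print("By odds ratio:", [r["Term"] for r in by_odds])
print("Size window:  ", [r["Term"] for r in mid_sized])
```

On these six rows the p value ranking puts 'locomotion' first, whereas
the odds ratio ranking puts 'positive regulation of cell motion'
first. With the window above, three nodes drop out (Size 1005, 1079
and 19).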

In short, what is the suggested strategy for proceeding?

Thank you very much in advance to all of you who will read this post.

Yours
Massimo