[BioC] End of the line of GOstats: making sense of the hypergeometric test results now

James W. MacDonald jmacdon at med.umich.edu
Wed Nov 25 14:53:35 CET 2009


Hi Massimo,

Massimo Pinto wrote:
> Greetings all,
> 
> Having first searched the GMane archives, I suppose the following
> question is appropriate. After selecting my 'entrezUniverse', I have
> run a hypergeometric test, as implemented in functions provided in
> GOstats, and thus obtained a readable, hyperlinked report containing a
> list of the ontology nodes that appear to have been significantly
> implicated, along with p values, odds ratio, number of significantly
> regulated genes that fall in each listed node, etc.
> 
> The report is not exactly short, and I am looking for criteria to
> proceed with the interpretation of the results. Specifically, I am
> trying to hunt for the most 'interesting' implicated ontology nodes
> and, to this end, a marker may be useful. Assuming this line of
> thinking is appropriate and focusing on the first few lines of the
> report:
> 
>> GO.df.CM3.ctr1.2.3
> 
>       GOBPID       Pvalue OddsRatio   ExpCount Count Size Term
> 1 GO:0040011 9.322848e-05  2.558205 11.8928490    26  145 locomotion
> 2 GO:0002376 2.337660e-04  1.887324 28.2147590    47  344 immune system process
> 3 GO:0007165 2.821193e-04  1.541496 82.4297464   110 1005 signal transduction
> 4 GO:0006954 2.840421e-04  2.892962  7.3817683    18   90 inflammatory response
> 5 GO:0051272 4.985200e-04  6.638731  1.5583733     7   19 positive regulation of cell motion
> 6 GO:0007154 5.866973e-04  1.493138 88.4992004   115 1079 cell communication
>  [...]
> 
> I wonder whether the correct marker for my hunt is the p value or
> the odds ratio, which would rank my list differently. In addition,
> the ontology nodes containing the largest number of genes (Size,
> above) may be too broad in scope to reveal a biological process that
> is specifically implicated in my experiment. By the same token,
> ontology nodes with too few genes may not provide convincing
> evidence of their implication.
> 
> In short, what strategy would you suggest for proceeding?

The strategy depends on your original hypothesis. If the hypothesis was 
that inflammation should be a factor in your experimental samples, then 
you should be looking at #4 (inflammatory response).

If there wasn't a hypothesis, then I would tend to look at the more 
specific terms first. Something like locomotion is too general to be 
useful. Positive regulation of cell motion, however, would probably be 
a more tractable term to explore.
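
One rough way to get a shortlist of the more specific terms is to keep 
only nodes whose Size falls inside a window and then order what is left 
by odds ratio. The sketch below does that for the six rows quoted above. 
It is written in Python purely for illustration and is not part of 
GOstats. The names GoRow and shortlist and the Size limits of 10 and 200 
are assumptions made for this example, not recommended values, and Size 
is only a crude stand-in for how specific a term is.

```python
from dataclasses import dataclass


@dataclass
class GoRow:
    go_id: str
    p_value: float
    odds_ratio: float
    exp_count: float
    count: int
    size: int
    term: str


# The six rows of the report excerpt quoted in this thread.
REPORT = [
    GoRow("GO:0040011", 9.322848e-05, 2.558205, 11.8928490, 26, 145,
          "locomotion"),
    GoRow("GO:0002376", 2.337660e-04, 1.887324, 28.2147590, 47, 344,
          "immune system process"),
    GoRow("GO:0007165", 2.821193e-04, 1.541496, 82.4297464, 110, 1005,
          "signal transduction"),
    GoRow("GO:0006954", 2.840421e-04, 2.892962, 7.3817683, 18, 90,
          "inflammatory response"),
    GoRow("GO:0051272", 4.985200e-04, 6.638731, 1.5583733, 7, 19,
          "positive regulation of cell motion"),
    GoRow("GO:0007154", 5.866973e-04, 1.493138, 88.4992004, 115, 1079,
          "cell communication"),
]


def shortlist(rows, min_size=10, max_size=200):
    """Keep nodes with min_size <= Size <= max_size, largest odds ratio first.

    The default limits are hypothetical; pick them to suit the experiment.
    """
    kept = [r for r in rows if min_size <= r.size <= max_size]
    return sorted(kept, key=lambda r: r.odds_ratio, reverse=True)


if __name__ == "__main__":
    for r in shortlist(REPORT):
        # Count / ExpCount is the observed-to-expected ratio for the node.
        print(f"{r.go_id}  OR={r.odds_ratio:.2f}  "
              f"obs/exp={r.count / r.exp_count:.2f}  "
              f"Size={r.size}  {r.term}")
```

With these limits the three largest nodes drop out and positive 
regulation of cell motion comes first, followed by inflammatory response 
and locomotion. Note that locomotion still passes the Size filter even 
though it is a broad term, so a filter like this only narrows the list; 
you still have to judge how specific each remaining term is.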

Best,

Jim


> 
> Thank you very much in advance to all of you who will read this post.
> 
> Yours
> Massimo

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826