[BioC] End of the line of GOstats: making sense of the hypergeometric test results now

Robert Gentleman rgentlem at fhcrc.org
Wed Nov 25 15:57:54 CET 2009


Hi,
   two comments:
1) how you interpret the output depends a bit on whether you used 
conditional=TRUE or FALSE (I don't think you have told us). And which 
you use depends on what you are trying to achieve.

2) the odds ratio is the size of the effect (if you are more comfortable 
with gene expression data, think "fold change"), and the p-value (as 
always) tells you how unusual that effect is under the null hypothesis.  
You should rank your list by whichever matters more to you.
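
As a rough sketch of the two rankings (base R only, nothing here is a
GOstats function): the six rows from the report quoted below are retyped
by hand into a data frame, and min_size / max_size are made-up cutoffs
chosen purely for illustration, not recommended values.

```r
# Rows retyped from the quoted report; "go" is just an illustrative name.
go <- data.frame(
  GOBPID    = c("GO:0040011", "GO:0002376", "GO:0007165",
                "GO:0006954", "GO:0051272", "GO:0007154"),
  Pvalue    = c(9.322848e-05, 2.337660e-04, 2.821193e-04,
                2.840421e-04, 4.985200e-04, 5.866973e-04),
  OddsRatio = c(2.558205, 1.887324, 1.541496,
                2.892962, 6.638731, 1.493138),
  ExpCount  = c(11.8928490, 28.2147590, 82.4297464,
                7.3817683, 1.5583733, 88.4992004),
  Count     = c(26, 47, 110, 18, 7, 115),
  Size      = c(145, 344, 1005, 90, 19, 1079),
  Term      = c("locomotion", "immune system process",
                "signal transduction", "inflammatory response",
                "positive regulation of cell motion",
                "cell communication"),
  stringsAsFactors = FALSE
)

# Ranking 1: most unusual under the null first (smallest p-value).
by_p <- go[order(go$Pvalue), ]

# Ranking 2: largest effect size first (largest odds ratio).
by_or <- go[order(go$OddsRatio, decreasing = TRUE), ]

# Optional: drop very small and very broad nodes before reading the list.
# These cutoffs are hypothetical; pick your own.
min_size <- 10
max_size <- 500
focused <- by_or[by_or$Size >= min_size & by_or$Size <= max_size, ]

print(focused[, c("GOBPID", "OddsRatio", "Pvalue", "Size", "Term")])
```

With these six rows the two orderings disagree at the top: the p-value
ranking starts with locomotion, while the odds-ratio ranking starts with
positive regulation of cell motion.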

   Robert

James W. MacDonald wrote:
> Hi Massimo,
> 
> Massimo Pinto wrote:
>> Greetings all,
>>
>> Having first searched the GMane archives, I suppose the following
>> question is appropriate. After selecting my 'entrezUniverse', I have
>> run a hypergeometric test, as implemented in functions provided in
>> GOstats, and thus obtained a readable, hyperlinked report containing a
>> list of the ontology nodes that appear to have been significantly
>> implicated, along with p-values, odds ratios, the number of significantly
>> regulated genes that fall in each listed node, etc.
>>
>> The report is not exactly short, and I am looking for criteria to
>> proceed with the interpretation of the results. Specifically, I am
>> trying to hunt for the most 'interesting' implicated ontology nodes
>> and, to this end, a marker may be useful. Assuming this line of
>> thinking is appropriate and focusing on the first few lines of the
>> report:
>>
>> > GO.df.CM3.ctr1.2.3
>>
>>       GOBPID       Pvalue OddsRatio   ExpCount Count Size Term
>> 1 GO:0040011 9.322848e-05  2.558205 11.8928490    26  145 locomotion
>> 2 GO:0002376 2.337660e-04  1.887324 28.2147590    47  344 immune system process
>> 3 GO:0007165 2.821193e-04  1.541496 82.4297464   110 1005 signal transduction
>> 4 GO:0006954 2.840421e-04  2.892962  7.3817683    18   90 inflammatory response
>> 5 GO:0051272 4.985200e-04  6.638731  1.5583733     7   19 positive regulation of cell motion
>> 6 GO:0007154 5.866973e-04  1.493138 88.4992004   115 1079 cell communication
>>  [...]
>>
>> I do wonder whether the correct marker for my hunt is the p value, or
>> the Odds Ratio, which would rank my list differently. Plus, the
>> ontology nodes containing the largest number of genes (Size, above)
>> may be of too broad scope to reveal the presence of a biological
>> process that is specifically implicated in my experiment. By the same
>> token, ontology nodes with too few genes may not provide convincing
>> evidence of their implication.
>>
>> In short, what is the suggested strategy for proceeding?
> 
> The strategy depends on your original hypothesis. If the hypothesis was 
> that inflammation should be a factor in your experimental samples, then 
> you should be looking at #4.
> 
> If there wasn't a hypothesis, then I would tend to look at the more 
> directed terms first. Something like locomotion is so general as to be 
> useless. However, positive regulation of cell motion would probably be a 
> more tractable ontology to explore.
> 
> Best,
> 
> Jim
> 
> 
>>
>> Thank you very much in advance to all of you who will read this post.
>>
>> Yours
>> Massimo