[BioC] Re: GoHyperG

Sean Davis sdavis2 at mail.nih.gov
Thu Dec 23 17:29:45 CET 2004

On Dec 23, 2004, at 10:52 AM, Nicholas Lewin-Koh wrote:

> Hi Sean,
> In this situation I would hope it is a one-sided test. I had this
> same discussion with a colleague who wanted the same thing. I don't
> think testing for under-representation means anything. Think about
> the context: one is doing recursive sampling of a finite
> population for which there are two sources of bias, what is represented
> in the database or on the chip, and what is annotated on the chip.
> Further, you are testing at each node the discrepancy from random.
> As you go down the DAG, zero becomes more and more probable; you can
> think of it as doing a mark-recapture study on your genes. This problem
> is exacerbated by the sampling bias. Finally, a last complication is
> that the test is further biased by your ability to detect
> differentially expressed genes.
> At least if you detect over-representation you can argue for a strong
> signal.

I'm being a bit dense, but suppose I have 10000 genes on a chip 
(annotated in ontology Y), 1000 of which are annotated as category X; I 
find 1000 differentially expressed genes (annotated in ontology Y) from 
that chip, but only 12 are from category X, where about 100 would be 
expected by chance.  Is that not interesting to know about?
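
A minimal sketch of the lower-tail (under-representation) calculation for the numbers in the example above, using the exact hypergeometric distribution. This is not how GoHyperG computes its p-values; the function name `hypergeom_tail_counts` and the variable names are assumptions made for illustration, and only the Python standard library is used.

```python
from math import comb

def hypergeom_tail_counts(N, K, n, lo, hi):
    """Count the size-n gene lists (out of N genes, K of them in the
    category) that contain between lo and hi category genes, inclusive."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))

# Numbers taken from the example in the message.
N = 10000       # annotated genes on the chip
K = 1000        # genes on the chip annotated as category X
n = 1000        # differentially expressed genes
observed = 12   # differentially expressed genes in category X

expected = n * K / N                 # count expected under random sampling
total = comb(N, n)                   # all possible gene lists of size n
k_min = max(0, n - (N - K))          # smallest possible overlap
k_max = min(K, n)                    # largest possible overlap

# One-sided p-values: P(X <= observed) and P(X >= observed).
p_under = hypergeom_tail_counts(N, K, n, k_min, observed) / total
p_over = hypergeom_tail_counts(N, K, n, observed, k_max) / total

print(f"expected count: {expected:.0f}")
print(f"P(X <= {observed}) = {p_under:.3e}")
print(f"P(X >= {observed}) = {p_over:.3e}")
```

The two tails answer different questions: a test that reports only P(X >= observed) returns a value close to 1 here and gives no indication that the category is depleted.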

As for zeros: since a zero count becomes more probable as one moves 
down the DAG, finding "underrepresented" groups of course becomes 
prohibitively difficult there, but for large categories it is certainly 
possible.  As for biases, I'm not sure that I agree that the ability to 
detect differentially expressed genes is a source of "bias".  It is 
certainly a limitation, but I don't think it is a bias.  And I'm not 
sure what "sampling bias" might be present.
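
The point about zeros can be made concrete: the smallest lower-tail p-value a category can ever reach is the probability of observing zero genes from it, so small categories cannot be called underrepresented whatever the data show. The sketch below assumes the same chip and list sizes as the example above; the category sizes and the name `prob_zero` are hypothetical choices for illustration.

```python
from math import comb

def prob_zero(N, K, n):
    """Probability that a random size-n gene list drawn from N genes
    contains none of the K genes in a category (hypergeometric P(X = 0))."""
    return comb(N - K, n) / comb(N, n)

N = 10000   # annotated genes on the chip (from the example above)
n = 1000    # differentially expressed genes (from the example above)

# Hypothetical category sizes, from deep DAG nodes to a large category.
category_sizes = [5, 10, 50, 100, 1000]

# Best achievable under-representation p-value for each category size.
p_zero = {K: prob_zero(N, K, n) for K in category_sizes}

for K, p in p_zero.items():
    print(f"category size {K:5d}: smallest possible lower-tail p = {p:.3e}")
```

Under these assumptions a category of 10 genes cannot reach a conventional 0.05 threshold even with a zero count, while a category of 1000 genes easily can.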

Thanks for the food for thought.


>> Message: 4
>> Date: Wed, 22 Dec 2004 11:02:55 -0500
>> From: Sean Davis <sdavis2 at mail.nih.gov>
>> Subject: [BioC] GoHyperG
>> To: Bioconductor <bioconductor at stat.math.ethz.ch>
>> Message-ID: <F0EE8E4B-5432-11D9-ACCB-000D933565E8 at mail.nih.gov>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed
>>
>> Just a quick question--are the p-values from gohyperg one- or
>> two-sided?  I have a collaborator who would like to use it to
>> determine underrepresented ontology categories.
>> Thanks,
>> Sean
