[BioC] Odds Ratio in GOstat [resolved?]

Naomi Altman naomi at stat.psu.edu
Tue Dec 12 05:12:47 CET 2006

The duplicate genes problem is an interesting one.  The selected gene 
list includes duplicates because it comes from blasting an EST set 
from an unsequenced species against a sequenced species.  Each 
duplicated id is the nearest homolog of an EST, but the ESTs that 
share it are supposed to represent multiple genes.  How to handle 
this for GO enrichment is an interesting question.

e.g.  The annotation has genes A, B and C.
We observe that matches A1, A2 and B1 are upregulated, but B2 and C 
are not.  Should we say that 3 out of 5 are upregulated (counting 
matches), or 2 out of 3 (counting annotated genes)?
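
A small sketch of the two counting conventions in the example above, 
written in Python purely for illustration.  The names est_to_gene, 
count_by_est and count_by_gene are made up here and are not part of 
Category or GOstats.  The per-gene count assumes a gene is called 
upregulated if any of its matches is, which is only one possible rule.

```python
# Hypothetical mapping from each EST match to its nearest annotated homolog.
est_to_gene = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C": "C"}

# Matches observed to be upregulated in the example.
upregulated = {"A1", "A2", "B1"}


def count_by_est(est_to_gene, upregulated):
    """Treat every EST match as its own unit: (upregulated, total)."""
    up = sum(1 for est in est_to_gene if est in upregulated)
    return up, len(est_to_gene)


def count_by_gene(est_to_gene, upregulated):
    """Collapse matches to annotated genes: (upregulated, total).

    Assumption: a gene counts as upregulated if ANY of its matches is.
    """
    genes = set(est_to_gene.values())
    up_genes = {gene for est, gene in est_to_gene.items() if est in upregulated}
    return len(up_genes), len(genes)


print(count_by_est(est_to_gene, upregulated))
print(count_by_gene(est_to_gene, upregulated))
```

The two functions give the "out of 5" and "out of 3" answers 
respectively, so the choice of unit changes both the numerator and 
the denominator that would go into an enrichment test.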


At 07:43 PM 12/11/2006, Seth Falcon wrote:
>The selected gene list contained duplicate ids.  I'm pretty sure this
>is the problem.  The Category + GOstats code should detect such input
>errors and give a sensible error message.  I will add such checking
>very soon.
>+ seth

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
