[BioC] Odds Ratio in GOstat [resolved?]
usadel at mpimp-golm.mpg.de
Tue Dec 12 11:27:26 CET 2006
If I understand you correctly, your problem seems to be that you are
investigating the classifications of the best hits in the sequenced
organism and not the classes of your actual ESTs.
In this case, the route I usually take is to transfer the ontological
terms onto the ESTs (or better unigenes) and use these for testing. (I
use neither GO nor GOstats though).
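To make the transfer step concrete, here is a minimal sketch in plain
Python. It is not GOstats code; the EST names, the best-hit table and
the placeholder term labels are all made up for illustration (only the
A/B/C naming follows Naomi's example below).

```python
from collections import defaultdict

# Hypothetical best BLAST hit of each EST/unigene in the sequenced species.
# "C1" is an assumed name for the single match to gene C.
best_hit = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}

# Hypothetical ontology terms of the reference genes (placeholder labels,
# not real GO identifiers).
reference_terms = {"A": {"term_X"}, "B": {"term_X", "term_Y"}, "C": {"term_Y"}}


def transfer_terms(best_hit, reference_terms):
    """Give each EST the terms of its best hit (empty set if the hit has none)."""
    return {est: set(reference_terms.get(gene, ())) for est, gene in best_hit.items()}


def members_per_term(est_terms):
    """Invert the EST -> terms mapping into term -> set of ESTs."""
    term_members = defaultdict(set)
    for est, terms in est_terms.items():
        for term in terms:
            term_members[term].add(est)
    return dict(term_members)


est_terms = transfer_terms(best_hit, reference_terms)
term_members = members_per_term(est_terms)
print(sorted(term_members["term_X"]))  # ['A1', 'A2', 'B1', 'B2']
```

The categories are then defined over the ESTs themselves, so both B1
and B2 count as members of every term that B carries.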
From a biological point of view I think this also makes sense. Just
assume your sequenced species has one isoform of a particular enzyme
(B), which in your non-sequenced organism has already expanded to two
isoforms (B1 and B2) that are not yet completely subfunctionalized.
In this case your non-sequenced organism really has
GO:molecular_function:whatever twice.
Also, I am more interested in the distribution of genes in the organism
I am looking at than in an already sequenced one. As an extreme case, if
you inferred GO terms by blasting plants against vertebrates, you would
run into the problem of the super-expanded gene families in plants
(which are real).
So to answer your question I would say 3 out of 5.
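The two ways of counting in Naomi's example can be written down as
follows. This is only a sketch: "C1" is an assumed name for the single
match to gene C, and the upper-tail hypergeometric function is the
generic textbook test, not necessarily what Category/GOstats computes
internally.

```python
from math import comb

# Naomi's example: each EST-level match and the reference gene it hits.
best_hit = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}
upregulated = {"A1", "A2", "B1"}


def est_level_counts(best_hit, selected):
    """Count every EST once: the view of the organism under study."""
    n_selected = sum(1 for est in best_hit if est in selected)
    return n_selected, len(best_hit)


def gene_level_counts(best_hit, selected):
    """Collapse ESTs onto reference genes; a gene counts as selected
    if at least one of its matches is selected."""
    hit_genes = {gene for est, gene in best_hit.items() if est in selected}
    return len(hit_genes), len(set(best_hit.values()))


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N,
    of which K belong to the category of interest."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


print(est_level_counts(best_hit, upregulated))   # (3, 5)
print(gene_level_counts(best_hit, upregulated))  # (2, 3)
```

Whichever level you choose fixes the numbers that go into the test:
with EST-level counting, the universe size N and the category size K
passed to hypergeom_upper_tail are counted in ESTs, not in reference
genes.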
However, it is not trivial to transfer ontological terms, especially if
the originals were already "inferred from electronic annotation". Also,
if you are not so sure about the sequence clustering process (e.g. ESTs
B1 and B2 should really represent one unigene), things start getting
shaky.
But there are annotation packages like Interpro2GO, blast2go and you
So to sum this up, I think you should rely on good old sequence based
Just my 5 cents though....
Naomi Altman wrote:
> The duplicate genes problem is an interesting one. The reason the
> selected gene list includes duplicates is because it comes from
> blasting an EST set from an unsequenced species against a sequenced
> species. The duplicates are supposed to be the nearest homolog of
> the EST but to represent multiple genes. How to handle this for GO
> enrichment is an interesting question.
> e.g. Annotation has genes A B C.
> We observe that matches A1 A2 and B1 are upregulated, but B2 and C
> are not. Should we say that 3 out of 5 are upregulated, or 2 out of 3?
> At 07:43 PM 12/11/2006, Seth Falcon wrote:
>> The selected gene list contained duplicate ids. I'm pretty sure this
>> is the problem. The Category + GOstats code should detect such input
>> errors and give a sensible error message. I will add such checking
>> very soon.
>> + seth
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Björn Usadel, PhD
Max Planck Institute of Molecular Plant Physiology
System Regulation Group
Am Mühlenberg 1
Tel (+49 331) 567-8114
Email usadel at mpimp-golm.mpg.de