[BioC] How are GO2PROBE built

Fri Oct 3 02:13:13 CEST 2008

Hi Sean

Thank you for the general information about the current annotation mapping systems.

However, I would like to know the specifics of how the metadata are built,
for example how gene ontology terms with redundant or uninformative
content are omitted, as in the description of the MSigDB C5 collection below.

From http://www.broad.mit.edu/gsea/msigdb/collection_details.jsp#C5

GO gene sets for very broad categories, such as Biological Process,
have been omitted from MSigDB. GO gene sets with fewer than 10 genes
have also been omitted. Gene sets with the same members have been
resolved based on the GO tree structure: if a parent term has only one
child term and their gene sets have the same members, the child gene
set is omitted; if the gene sets of sibling terms have the same
members, the sibling gene sets are omitted.
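
The omission rules quoted above can be sketched in a few lines of Python. This is only an illustration of the rules as I read them, not the procedure MSigDB actually runs: the function name, the argument names, and the toy term IDs are all made up, each rule is checked against the original, unfiltered gene sets, and "the sibling gene sets are omitted" is taken literally to mean that every sibling in an identical group is dropped.

```python
from collections import defaultdict


def filter_go_gene_sets(gene_sets, children, broad_terms=frozenset(), min_size=10):
    """Return the term IDs that survive the four omission rules.

    gene_sets:   dict mapping term ID -> set of gene identifiers
    children:    dict mapping parent term ID -> list of direct child term IDs
    broad_terms: term IDs considered too broad (hypothetical, chosen by the caller)
    """
    # Rule 1: very broad categories are omitted outright.
    omitted = set(broad_terms)

    # Rule 2: gene sets with fewer than min_size genes are omitted.
    omitted |= {term for term, genes in gene_sets.items() if len(genes) < min_size}

    for parent, kids in children.items():
        # Rule 3: an only child whose gene set equals its parent's is omitted.
        if len(kids) == 1:
            child = kids[0]
            if (parent in gene_sets and child in gene_sets
                    and gene_sets[child] == gene_sets[parent]):
                omitted.add(child)

        # Rule 4: siblings that share exactly the same members are omitted.
        groups = defaultdict(list)
        for kid in kids:
            if kid in gene_sets:
                groups[frozenset(gene_sets[kid])].append(kid)
        for group in groups.values():
            if len(group) > 1:
                omitted.update(group)

    return {term for term in gene_sets if term not in omitted}


# Toy data: the term IDs and integer "genes" are placeholders, not real GO terms.
gene_sets = {
    "ROOT": set(range(20)),
    "A": set(range(12)),
    "A1": set(range(12)),    # only child of A, same members as A
    "B": set(range(5)),      # fewer than 10 genes
    "C": set(range(15)),
    "C1": set(range(10)),    # identical to sibling C2
    "C2": set(range(10)),
    "C3": set(range(1, 12)),
}
children = {"ROOT": ["A", "B", "C"], "A": ["A1"], "C": ["C1", "C2", "C3"]}

kept = filter_go_gene_sets(gene_sets, children, broad_terms={"ROOT"})
print(sorted(kept))
```

In this toy tree only A, C and C3 remain: ROOT is dropped as too broad, B as too small, A1 as a duplicate only child, and C1 and C2 as identical siblings.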

Tomonori

2008/10/2 Sean Davis <sdavis2 at mail.nih.gov>:
> On Thu, Oct 2, 2008 at 3:11 AM, Oura Tomonori <tomonori.oura at gmail.com> wrote:
>> Dear BioC,
>>
>> How are the mappings of Affymetrix probe IDs to Gene Ontology terms in
>> the metadata packages provided by Bioconductor built?
>>
>> I am trying to use some gene set analysis packages and have found that some
>> packages use the *GO2PROBE (e.g. hgu133aGO2PROBE) information, while
>> other packages use external gene set definitions, such as MSigDB.
>>
>> So I want to know the criteria for selecting specific GO terms among the
>> possible terms for each probe ID in Bioconductor.
>> I have already read the documentation for the AnnBuilder package.
>
> To make a long story short, the annotations available from affy are
> mapped to Entrez Gene IDs.  Then, the information from Entrez Gene--in
> this case, gene ontology--is mapped to affy id.  The dates associated
> with the data, the source of the data, and how the data are mapped
> will all affect the final mapping of affy ID to gene ontology.  The
> nice thing about gene ontology analyses is that they are typically
> based on "sets" of genes making it much less important to start with
> EXACTLY the same gene ontology mappings.  In fact, in practice, it
> will be pretty difficult to do so.
>
> If you want to see the details of the current Bioconductor annotation
> package build process, you want to read the AnnotationDbi SQLForge
> vignette, as AnnBuilder is outdated.
>
> Finally, if I have misunderstood your question, perhaps you could clarify.
>
> Sean
>
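
The two-step mapping described in the quoted reply (probe ID to Entrez Gene ID, then Entrez Gene ID to GO term) can be sketched as a composition of two lookup tables followed by an inversion. This is a minimal illustration under assumed inputs, not the AnnotationDbi/SQLForge build code: the function name and all identifiers are hypothetical, each probe is assumed to map to a single gene, and no evidence codes or ontology branches are tracked.

```python
from collections import defaultdict


def build_go2probe(probe_to_gene, gene_to_go):
    """Compose probe -> gene and gene -> GO term maps, inverted to GO term -> probes.

    probe_to_gene: dict mapping probe ID -> gene ID (one gene per probe assumed)
    gene_to_go:    dict mapping gene ID -> iterable of GO term IDs
    """
    go_to_probes = defaultdict(set)
    for probe, gene in probe_to_gene.items():
        # A gene without GO annotation contributes nothing for its probes.
        for go_id in gene_to_go.get(gene, ()):
            go_to_probes[go_id].add(probe)
    # Sort the probe lists so the result is stable across runs.
    return {go_id: sorted(probes) for go_id, probes in go_to_probes.items()}


# Toy identifiers, all placeholders rather than real probe, gene, or GO IDs.
probe_to_gene = {"p1": "g1", "p2": "g1", "p3": "g2", "p4": "g3"}
gene_to_go = {"g1": ["GO:x", "GO:y"], "g2": ["GO:y"]}

go2probe = build_go2probe(probe_to_gene, gene_to_go)
print(go2probe)
```

Because the result depends entirely on the two input tables, a different release date or source for either table changes the final GO-to-probe mapping, which is the point made in the reply.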