[BioC] How are GO2PROBE built

Robert Gentleman rgentlem at fhcrc.org
Fri Oct 3 21:23:53 CEST 2008


And you can do any or all of the filtering that is described on the Broad 
page if you want to (all evidence codes and the entire tree structure are 
available to you).
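
A minimal sketch of that filtering, applied to the rules quoted from the Broad
page further down in this thread. It is written in plain Python rather than R
so that it stays self-contained, and it does not use any Bioconductor API. The
function name `filter_go_sets`, the toy term IDs and gene names, and the
`too_broad` argument are all hypothetical. Two points are interpretations and
not something the Broad description states: "only one child" is judged on the
full ontology, and every member of a group of identical siblings is dropped.

```python
from collections import defaultdict


def filter_go_sets(gene_sets, parents, min_size=10, too_broad=frozenset()):
    """Apply MSigDB-C5-style filters to GO gene sets (illustrative only).

    gene_sets: dict mapping a GO term id to a set of gene ids
    parents:   dict mapping a GO term id to the set of its parent term ids
    """
    # Drop very broad categories and sets with fewer than min_size genes.
    kept = {
        term: frozenset(genes)
        for term, genes in gene_sets.items()
        if term not in too_broad and len(genes) >= min_size
    }

    # Invert the child -> parents map so that siblings can be found.
    children = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)

    drop = set()
    for parent, kids in children.items():
        kids_kept = [k for k in kids if k in kept]
        # An only child with the same members as its parent is omitted.
        if len(kids) == 1 and kids_kept and parent in kept:
            (only,) = kids_kept
            if kept[only] == kept[parent]:
                drop.add(only)
        # Sibling terms whose gene sets have the same members are omitted.
        by_members = defaultdict(list)
        for k in kids_kept:
            by_members[kept[k]].append(k)
        for same in by_members.values():
            if len(same) > 1:
                drop.update(same)

    return {t: set(g) for t, g in kept.items() if t not in drop}


# Toy data (hypothetical ids); min_size is lowered to 2 to keep it small.
toy_sets = {
    "GO:ROOT": {"g1", "g2", "g3", "g4"},  # treated as "very broad"
    "GO:A": {"g1", "g2"},
    "GO:A1": {"g1", "g2"},                # only child of GO:A, same members
    "GO:B": {"g1", "g2", "g3", "g4"},
    "GO:B1": {"g3", "g4"},                # identical to its sibling GO:B2
    "GO:B2": {"g3", "g4"},
    "GO:C": {"g1"},                       # below the size threshold
}
toy_parents = {
    "GO:A": {"GO:ROOT"}, "GO:B": {"GO:ROOT"}, "GO:C": {"GO:ROOT"},
    "GO:A1": {"GO:A"},
    "GO:B1": {"GO:B"}, "GO:B2": {"GO:B"},
}

result = filter_go_sets(toy_sets, toy_parents, min_size=2,
                        too_broad={"GO:ROOT"})
print(sorted(result))  # ['GO:A', 'GO:B']
```

With real data the two inputs would come from the GO mappings in an annotation
package and from the parent/child structure in GO.db; evidence-code filtering
is not shown here.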

You could also use GSEABase and the tools in it to simply retrieve the 
Broad sets, if you think you want to use them without having to recreate them.

You may see differences, as they downloaded their GO data in January of 
last year, whereas we downloaded ours more recently (each package gives you 
explicit dates and sources).
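
For the original question of how a GO2PROBE-style mapping arises, the build
that Sean describes below (probe IDs mapped to Entrez Gene IDs, then the GO
annotation of each gene carried back to the probe) can be sketched as a simple
composition of two lookups. This is a plain-Python illustration with
hypothetical names (`build_go2probe`, `probe_1`, `gene_A`, `GO:x`), not the
code used to build the packages, and it ignores evidence codes and any
propagation of annotations to ancestor terms.

```python
from collections import defaultdict


def build_go2probe(probe2gene, gene2go):
    """Compose probe -> gene and gene -> GO, then invert to GO -> probes."""
    go2probe = defaultdict(set)
    for probe, gene in probe2gene.items():
        # Genes without GO annotation contribute nothing.
        for go_id in gene2go.get(gene, ()):
            go2probe[go_id].add(probe)
    # Sort the probes so the result is stable and easy to compare.
    return {go_id: sorted(probes) for go_id, probes in go2probe.items()}


# Toy inputs (hypothetical identifiers).
probe2gene = {
    "probe_1": "gene_A",
    "probe_2": "gene_A",   # two probes can target the same gene
    "probe_3": "gene_B",
    "probe_4": "gene_C",   # gene_C has no GO annotation below
}
gene2go = {
    "gene_A": {"GO:x", "GO:y"},
    "gene_B": {"GO:y"},
}

go2probe = build_go2probe(probe2gene, gene2go)
print(go2probe["GO:x"])  # ['probe_1', 'probe_2']
print(go2probe["GO:y"])  # ['probe_1', 'probe_2', 'probe_3']
```

Because both inputs are snapshots taken on particular dates, rebuilding with a
newer probe-to-gene or gene-to-GO table can change the result, which is the
source of the differences mentioned above.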



Marc Carlson wrote:
> Sean is right,
> 
> We don't try to guess what parts of GO you may or may not want in your
> analysis.  As much as possible, we try to simply present all of the
> data.  It is sometimes helpful to let people know that our GO annotation
> data can be found split into two kinds of packages:
> 
> The GO.db package just provides a snapshot of the GO ontology with no
> information to affiliate GO IDs to any genes.  We don't prune anything
> off of this ontology.
> 
> Then the other kinds of annotation packages (chip and Entrez
> Gene/organism based) contain the mappings between Entrez Gene IDs and GO
> terms which we obtain from NCBI.  These too are intended to be as complete
> as possible, and the only things left out are things that specifically
> don't belong in the scope of that package.  So for example, you should
> not find mappings for mouse genes in the human-centric package
> org.Hs.eg.db.
> 
> But I am not sure how comparable this really is to MSigDB...
> 
>   Marc
> 
> 
> 
> Sean Davis wrote:
>> On Thu, Oct 2, 2008 at 8:13 PM, Oura Tomonori <tomonori.oura at gmail.com> wrote:
>>   
>>> Hi Sean
>>>
>>> Thank you for general information about current annotation mapping systems.
>>>
>>> But I want to know the specifics of how the metadata are built, such as
>>> how gene ontology terms with redundant or poor information are omitted,
>>> as in the description of the MSigDB C5 collection below:
>>>
>>> From http://www.broad.mit.edu/gsea/msigdb/collection_details.jsp#C5
>>>
>>> GO gene sets for very broad categories, such as Biological Process,
>>> have been omitted from MSigDB. GO gene sets with fewer than 10 genes
>>> have also been omitted. Gene sets with the same members have been
>>> resolved based on the GO tree structure: if a parent term has only one
>>> child term and their gene sets have the same members, the child gene
>>> set is omitted; if the gene sets of sibling terms have the same
>>> members, the sibling gene sets are omitted.
>>>     
>> Marc Carlson can be authoritative on this, but there is no cleanup or
>> omission of the data.  The data are taken directly from NCBI Entrez
>> Gene and should agree with that source as of the date that the
>> packages were built.
>>
>> Sean
>>
>>   
>>> 2008/10/2 Sean Davis <sdavis2 at mail.nih.gov>:
>>>     
>>>> On Thu, Oct 2, 2008 at 3:11 AM, Oura Tomonori <tomonori.oura at gmail.com> wrote:
>>>>       
>>>>> Dear BioC,
>>>>>
>>>>> How are the mappings of Affymetrix probe IDs to Gene Ontology terms in
>>>>> the metadata packages provided by Bioconductor built?
>>>>>
>>>>> I am trying to use some gene set analysis packages and find that some
>>>>> packages use the *GO2PROBE (e.g. hgu133aGO2PROBE) information, but
>>>>> other packages use external gene set definitions, such as MSigDB.
>>>>>
>>>>> So I want to know the criteria for selecting specific GO terms among
>>>>> the possible terms for each probe ID in Bioconductor.
>>>>> I have already read the documentation for the AnnBuilder package, however.
>>>>>         
>>>> To make a long story short, the annotations available from affy are
>>>> mapped to Entrez Gene IDs.  Then, the information from Entrez Gene--in
>>>> this case, gene ontology--is mapped to affy id.  The dates associated
>>>> with the data, the source of the data, and how the data are mapped
>>>> will all affect the final mapping of affy ID to gene ontology.  The
>>>> nice thing about gene ontology analyses is that they are typically
>>>> based on "sets" of genes making it much less important to start with
>>>> EXACTLY the same gene ontology mappings.  In fact, in practice, it
>>>> will be pretty difficult to do so.
>>>>
>>>> If you want to see the details of the current Bioconductor annotation
>>>> package build process, you want to read the AnnotationDbi SQLForge
>>>> vignette, as AnnBuilder is outdated.
>>>>
>>>> Finally, if I have misunderstood your question, perhaps you could clarify.
>>>>
>>>> Sean
>>>>
>>>>       
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>     
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


