[BioC] How to retrieve all GO terms at level 5 as well as their annotated genes

Hervé Pagès hpages at fhcrc.org
Wed Jun 26 08:50:46 CEST 2013


Hi Jenny,

On 06/25/2013 06:49 AM, Zadeh, Jenny Drnevich wrote:
> Hi Peter,
>
> I recently went through this with a client and he had a hard time understanding that there is not really a unique "level" for a GO term. Many of your level 5 terms can also be level 4, level 3, or level 6 terms, etc. This is due to the directed acyclic graph (DAG) structure of GO: a term can have several parents, so there can be multiple paths of different lengths from one ancestor to one descendant. Just want to point this out!

Good point. And to illustrate this:

   > length(intersect(level4_BP_terms, level5_BP_terms))
   [1] 7738

which means that 7738 terms belong to both level 4 and level 5.

What is unique however is the "minimum level" of a term i.e. the length 
of the shortest path between the term and the root of the ontology.
In other words, the "minimum level" of a term is its distance to the
root.
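
To make that definition concrete, here is a minimal sketch on a made-up toy DAG rather than on GO.db. The term names and the min_level() helper are hypothetical, invented only for illustration. A breadth-first walk from the root records the first level at which each term is reached:

```r
# Toy DAG as a named list: each element holds the children of that term.
# Term names are invented for illustration; they are not GO IDs.
children <- list(
  root = c("A", "B"),
  A    = c("C"),
  B    = c("D"),        # D is reachable via root -> B -> D (2 steps)
  C    = c("D"),        #   and via root -> A -> C -> D (3 steps)
  D    = character(0)
)

# Breadth-first search: returns a named integer vector with the minimum
# level (shortest distance to the root) of every reachable term.
min_level <- function(children, root) {
  dist <- setNames(0L, root)
  frontier <- root
  level <- 0L
  while (length(frontier) > 0) {
    level <- level + 1L
    kids <- unique(unlist(children[frontier], use.names = FALSE))
    kids <- setdiff(kids, names(dist))   # keep only terms not seen before
    dist[kids] <- level
    frontier <- kids
  }
  dist
}

min_level(children, "root")
```

Here D sits at both level 2 and level 3, but its minimum level is 2 because the walk reaches it first through B.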

If you want the BP terms that are at distance 5 from the root, just do:

   dist5_BP_terms <- setdiff(level5_BP_terms,
       c(level4_BP_terms, level3_BP_terms, level2_BP_terms,
         level1_BP_terms))

   > length(dist5_BP_terms)
   [1] 7072
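
The same subtraction can be written once for any level. The following is only a sketch: terms_at_distance() is a hypothetical helper, not part of GO.db, and the toy input uses invented term names instead of GO IDs. It assumes a list whose k-th element holds the terms found at level k:

```r
# Hypothetical helper: given a list whose k-th element holds the terms
# reachable in exactly k steps from the root, return the terms whose
# shortest path to the root has length k.
terms_at_distance <- function(levels, k) {
  shallower <- unlist(levels[seq_len(k - 1L)], use.names = FALSE)
  setdiff(levels[[k]], shallower)
}

# Toy input with invented term names (not GO IDs).
levels <- list(
  c("A", "B"),   # level 1
  c("C", "D"),   # level 2
  c("D", "E")    # level 3: D also sits at level 2
)

terms_at_distance(levels, 3)
```

With the vectors built earlier in this thread, the equivalent call would be terms_at_distance(list(level1_BP_terms, level2_BP_terms, level3_BP_terms, level4_BP_terms, level5_BP_terms), 5).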

Playing a little bit more with this, it seems that all the terms in the
BP ontology are at a distance <= 12 from the root term. There are only
3 terms at distance 12: GO:0051564, GO:0051565, and GO:0007035.

Cheers,
H.

>
> Cheers,
> Jenny
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Hervé Pagès
> Sent: Monday, June 24, 2013 8:20 PM
> To: Peter Davidsen
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] How to retrieve all GO terms at level 5 as well as their annotated genes
>
> Hi Peter,
>
> Probably not the most elegant way, but you could do something like this (granted that I understand correctly what a "level 5" term is):
>
>     library(GO.db)
>
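>     ## Returns the direct children of 'goids' in the BP ontology
>     ## (one step down, despite the "All" in the name).
>     getAllBPChildren <- function(goids)
>     {
>       ans <- unique(unlist(mget(goids, GOBPCHILDREN), use.names=FALSE))
>       ans[!is.na(ans)]  # drop the NAs returned for terms with no children
>     }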
>
>     level1_BP_terms <- getAllBPChildren("GO:0008150")     # 23 terms
>     level2_BP_terms <- getAllBPChildren(level1_BP_terms)  # 256 terms
>     level3_BP_terms <- getAllBPChildren(level2_BP_terms)  # 3059 terms
>     level4_BP_terms <- getAllBPChildren(level3_BP_terms)  # 9135 terms
>     level5_BP_terms <- getAllBPChildren(level4_BP_terms)  # 15023 terms
>
>     library(org.Hs.eg.db)
>     level5_genes <- mget(intersect(level5_BP_terms, keys(org.Hs.egGO2EG)),
>                          org.Hs.egGO2EG)
>
> Cheers,
> H.
>
> On 06/21/2013 02:28 AM, Peter Davidsen wrote:
>> Dear list,
>>
>> I'm looking for a way to get the names of all Gene Ontology terms for
>> Biological Processes at level 5 as well as the genes (human gene
>> symbols) annotated to each of the level 5 GO terms.
>>
>> I have tried to query the DAVID knowledgebase, but the online tool
>> doesn't seem to respond to any requests. Hence, could anybody maybe
>> point me in the direction of a package that could provide me with the
>> same information?
>>
>> Kind regards,
>> Peter
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>



