[BioC] Gene Ontology: Shortest path from root to node

Marc Carlson mcarlson at fhcrc.org
Mon Jan 14 20:09:09 CET 2013


Hi Nicos,

You could use the GO.db package to get at this.  In there you will find 
an object called GOBPANCESTOR which acts like a classic R environment 
object and can be used with the get() method to pull out the ancestor 
terms of a given term all the way back to the root.

So for your example you could have done this:

library(GO.db)
get("GO:0008150", GOBPANCESTOR)

And you can see that the only ancestor to this term is in fact the root 
node: "all"


What about terms further down?  Well the same trick works for all the 
terms to get their ancestor terms:
get("GO:0006955", GOBPANCESTOR)



So you probably want to do something a bit like this:

length(get("GO:0006955", GOBPANCESTOR))

And (for example) compare that to:

length(get("GO:0008150", GOBPANCESTOR))

etc.


Of course it's all a little bit more complicated than that because the 
gene ontologies are actually DAGs (so a term can have more than one route 
back to the root node), and so the ancestor list may be longer than the 
single shortest path back to the "all" node.  In fact this is true for 
the further-down term "GO:0006955" in the example above: it has two 
routes back to the root node, and hence its "distance" (as hinted at by 
length) has been inflated by one in this case.
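
If you need the actual shortest path rather than the ancestor count, one 
option is a breadth-first search over the direct parent links.  What 
follows is only a rough sketch: the function name shortest_path_to_root 
and the toy_parents list are made up for illustration, and the toy DAG 
stands in for the real ontology so the code runs without GO.db.

```r
# Breadth-first search from a term up to the root, following parent links.
# `parents` is a named list: names are term IDs, values are character
# vectors of direct parent IDs. Returns the number of edges on the
# shortest route, or NA if the root cannot be reached.
shortest_path_to_root <- function(term, parents, root) {
  if (identical(term, root)) return(0L)
  frontier <- term
  visited <- term
  depth <- 0L
  while (length(frontier) > 0L) {
    depth <- depth + 1L
    # All direct parents of the current frontier, minus terms already seen
    nxt <- unique(unlist(parents[frontier], use.names = FALSE))
    nxt <- setdiff(nxt, visited)
    if (root %in% nxt) return(depth)
    visited <- c(visited, nxt)
    frontier <- nxt
  }
  NA_integer_
}

# Hypothetical toy DAG: D reaches the root via B -> A (3 steps)
# and via C (2 steps), so it has 4 ancestors but a shortest path of 2.
toy_parents <- list(
  D = c("B", "C"),
  B = "A",
  C = "root",
  A = "root"
)

shortest_path_to_root("D", toy_parents, "root")  # 2

# With GO.db installed, the same function could be fed the real parent map
# (not run here; this follows every relationship type stored in GOBPPARENTS):
# library(GO.db)
# bp_parents <- as.list(GOBPPARENTS)
# shortest_path_to_root("GO:0006955", bp_parents, "GO:0008150")
```

Note that this counts edges of every relationship type recorded in the 
parent map, so you may want to filter the parents first if you only care 
about a particular kind of link.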


Anyhow, I hope this helps,


   Marc





On 01/14/2013 07:47 AM, WoA [guest] wrote:
> Given some GO BP terms for a gene I wish to find out, which of the terms has more specific meaning. I wish to find out the length of the shortest path between the BP Root term(GO:0008150) and the given term. Is there any suitable way to do that using any R package?
>
> Like something equivalent to:
> my $length = $node->lengthOfShortestPathToRoot;
>
> in Perl's "GO-TermFinder" package.
>
> Thanks in advance
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
