[BioC] Finding GO leaf nodes for an ontology - which package?

Seth Falcon sfalcon at fhcrc.org
Sun Jul 29 01:03:53 CEST 2007


Hi,

[I thought I sent a reply, but I didn't see it come through.  So sorry
if this ends up being a dup]

Tim Smith <tim_smith_666 at yahoo.com> writes:
> I was trying to list all the leaf nodes for a particular
> ontology. For this, I was using the GOstats:
>
> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOleaves(g1)
>
> Hopefully, this would give me a list of all the leaf nodes for the
> molecular function ontology. But this is taking too long to execute.

oneGOGraph is a wrapper for GOGraph.  GOGraph has the following
arguments:  x, dataenv.  The function builds a directed graph where
edges go from nodes that are keys in dataenv to nodes that are values
in dataenv.
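The key-to-value rule can be sketched in base R. This is not GOGraph's actual implementation. The four-term environment and the function name `toyGOGraph` are hypothetical, and the real function returns a graph-package object rather than a plain list. The sketch starts from the terms in `x` and follows `dataenv` breadth-first, so every edge points from a key to one of its values.

```r
## Hypothetical toy stand-in for a children environment such as
## GOMFCHILDREN: keys are parent terms, values are child terms, and a
## term with no children maps to NA.
dataenv <- new.env()
assign("GO:A", c("GO:B", "GO:C"), envir = dataenv)
assign("GO:B", "GO:D", envir = dataenv)
assign("GO:C", NA_character_, envir = dataenv)
assign("GO:D", NA_character_, envir = dataenv)

## toyGOGraph (hypothetical name) walks dataenv breadth-first from the
## terms in x and records an edge from each key to each non-NA value,
## giving a parent -> child adjacency list.
toyGOGraph <- function(x, dataenv) {
  adj <- list()
  queue <- x
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    if (!is.null(adj[[node]])) next          # already visited
    kids <- if (exists(node, envir = dataenv, inherits = FALSE))
      get(node, envir = dataenv, inherits = FALSE) else character(0)
    kids <- kids[!is.na(kids)]               # NA means "no children"
    adj[[node]] <- kids                      # edges node -> kids
    queue <- c(queue, kids)
  }
  adj
}

g <- toyGOGraph("GO:A", dataenv)
g[["GO:A"]]   # returns c("GO:B", "GO:C")
```

With a children environment as `dataenv`, every edge in the result runs from parent to child, which is the direction g1 has.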

So in your example, g1 will have edges going from parent to child GO
terms.  It turns out that this is exactly the opposite of the
convention used in GOstats; edges in graphs representing GO point from
child to parent.  One reason is that this matches how is-a
relationships are drawn in UML, where the arrow points from the
specialized class to the general one.  Upshot: GOLeaves, in addition to
taking forever, is not computing what you want.

You could use graph::reverseEdgeDirections on g1 and then call
GOLeaves.  I think this will give you the right answer, but it will
still take forever (looks like GOLeaves needs to be sent to the
optimizer).
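What reversing the edges does can be shown on a plain adjacency list. `reverseAdj` is a hypothetical base R helper written for illustration, not graph::reverseEdgeDirections, and the four-term list is made up. After reversal the edges run child to parent, so a leaf is a term that never appears as an edge target (nobody lists it as a parent).

```r
## reverseAdj (hypothetical helper): turn a from -> to adjacency list
## into a to -> from adjacency list, keeping every node as a name.
reverseAdj <- function(adj) {
  out <- setNames(vector("list", length(adj)), names(adj))
  out[] <- list(character(0))              # start with no edges
  for (from in names(adj)) {
    for (to in adj[[from]]) {
      out[[to]] <- c(out[[to]], from)      # flip the edge
    }
  }
  out
}

## Toy parent -> child list, shaped like the edges of g1.
p2c <- list("GO:A" = c("GO:B", "GO:C"), "GO:B" = "GO:D",
            "GO:C" = character(0), "GO:D" = character(0))
c2p <- reverseAdj(p2c)                      # child -> parent, GOstats style

## Leaves under child -> parent: terms with no incoming edges.
leaves <- setdiff(names(c2p), unlist(c2p))
leaves   # returns c("GO:C", "GO:D")
```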

If you really are only interested in the leaves of the MF ontology,
then you just need to find the GO terms in GOMFCHILDREN that have no
children; such terms are stored with a single NA as their value.

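    ## a term with no children is stored in GOMFCHILDREN as a single NA
    system.time(
       isLeaf <- unlist(eapply(GOMFCHILDREN,
                        function(x) length(x) == 1 && is.na(x)))
    )
       user  system elapsed 
      0.174   0.070   1.185 
    
    ## keep the names of the terms flagged TRUE
    leaves <- names(isLeaf[isLeaf])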
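The same eapply() idiom can be tried without any GO data installed. The environment below is a hypothetical four-term stand-in for GOMFCHILDREN, using the same "no children maps to NA" convention. Because the iteration order of eapply() over an environment is not guaranteed, the result is sorted before use.

```r
## Hypothetical stand-in for GOMFCHILDREN.
childrenEnv <- new.env()
assign("GO:A", c("GO:B", "GO:C"), envir = childrenEnv)
assign("GO:B", "GO:D", envir = childrenEnv)
assign("GO:C", NA_character_, envir = childrenEnv)
assign("GO:D", NA_character_, envir = childrenEnv)

## TRUE for terms whose only value is NA, i.e. terms with no children.
## && short-circuits, so is.na() only ever sees a length-one value.
isLeaf <- unlist(eapply(childrenEnv,
                        function(x) length(x) == 1 && is.na(x)))
leaves <- sort(names(isLeaf[isLeaf]))
leaves   # returns c("GO:C", "GO:D")
```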

If you are interested in the leaves of a graph with edges going from
parent to child, like g1, then you can do:

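    ## edges() is from the graph package, listLen() from Biobase
    numKids <- listLen(edges(g1))             # out-degree of each node
    leaves <- names(edges(g1)[numKids == 0])  # nodes with no children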

This is fast for a graph the size of g1.
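edges() on a graph returns a named list of target nodes, one element per node, so the leaf test only needs the length of each element. A minimal base R sketch, assuming a hand-built named list as a hypothetical stand-in for edges(g1) and swapping in base R's lengths() for Biobase's listLen():

```r
## Hypothetical stand-in for edges(g1): parent -> child targets per node.
edgeList <- list("GO:A" = c("GO:B", "GO:C"), "GO:B" = "GO:D",
                 "GO:C" = character(0), "GO:D" = character(0))

numKids <- lengths(edgeList)          # out-degree of each node
leaves <- names(edgeList)[numKids == 0]
leaves   # returns c("GO:C", "GO:D")
```

The test is a single vectorized pass over the edge list, which is why it stays fast compared with walking the graph.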

Best,

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/