[BioC] Finding GO leaf nodes for an ontology - which package?

Seth Falcon sfalcon at fhcrc.org
Fri Jul 27 18:38:12 CEST 2007


Hi Tim,

Tim Smith <tim_smith_666 at yahoo.com> writes:
> Hi,
>
> I was trying to list all the leaf nodes for a particular
> ontology. For this, I was using GOstats:

> g1 <- oneGOGraph("GO:0003674", GOMFCHILDREN)
> g2 <- GOleaves(g1)

That isn't actually what you want.  oneGOGraph (which just calls
GOGraph) returns a graph with edges directed _from_ the keys in the
dataenv map (GOMFCHILDREN in your example) _to_ the values in the
dataenv map.

So in your example above, you will have edges from parent node to
child node.  This is the reverse of how much of the GOstats code
usually thinks about GO DAGs -- the convention is to have edges go
from child to parent to indicate the is-a relationship.
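
As a rough illustration of the two orientations, here is a minimal
base-R sketch.  The term names and the `children` list are made up for
illustration; they only stand in for the kind of parent-to-children
mapping that GOMFCHILDREN holds, and nothing here uses GOstats itself.

```r
# Toy parent -> children map (hypothetical term names, not real GO IDs).
children <- list(
  root = c("A", "B"),
  A    = "C",
  B    = c("C", "D"),
  C    = character(0),
  D    = character(0)
)

# Flatten to an edge table: one row per parent -> child edge.
edge_tbl <- data.frame(
  from = rep(names(children), lengths(children)),
  to   = unlist(children, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Reverse the direction (child -> parent), i.e. the is-a convention.
parents <- split(edge_tbl$from, edge_tbl$to)

print(edge_tbl)
print(parents)
```

Code written for one orientation will look at the wrong end of each
edge when handed a graph built in the other.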

GOLeaves assumes that child-to-parent orientation, so on your graph it
is working against the wrong edge direction, on top of taking a long
time.  You could use reverseEdgeDirections to flip the edges of your
graph, but that in itself will be somewhat slow and GOLeaves will
_still_ perform badly.

Instead, consider that with the graph you created, the leaves are
exactly the nodes that have no outgoing edges.  So the following will
give you all leaves (and fairly quickly too):

> g1
A graphNEL graph with directed edges
Number of Nodes = 7527 
Number of Edges = 8781 

## count the number of (outgoing) edges for each node
> system.time(nKids <- listLen(edges(g1)))
   user  system elapsed 
  0.036   0.001   0.063 

## get the names of the nodes that have no (outgoing) edges.  These
## are the leaves
> system.time(leaves <- names(edges(g1)[nKids == 0]))
   user  system elapsed 
  0.035   0.000   0.037 
> length(leaves)
[1] 6006

## verify
> all(is.na(mget(leaves, GOMFCHILDREN)))
[1] TRUE
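
The same steps can be tried on a toy example without loading any
annotation data.  This is only a sketch: `toy_edges` is a hypothetical
stand-in for the list returned by edges(g1), `toy_env` is a
hypothetical stand-in for GOMFCHILDREN (with NA for terms that have no
children, as the verify step above suggests), and base R's lengths()
is used in place of listLen.

```r
# Toy stand-in for edges(g1): each node mapped to the nodes its
# outgoing (parent -> child) edges point to.
toy_edges <- list(
  root = c("A", "B"),
  A    = "C",
  B    = c("C", "D"),
  C    = character(0),
  D    = character(0)
)

# Count the outgoing edges for each node.
n_kids <- lengths(toy_edges)

# Nodes with no outgoing edges are the leaves.
leaves <- names(toy_edges)[n_kids == 0]
print(leaves)

# Toy stand-in for the children environment; leaf terms map to NA.
toy_env <- list2env(list(
  root = c("A", "B"), A = "C", B = c("C", "D"), C = NA, D = NA
))

# Verify: every leaf has NA as its children entry.
verified <- all(is.na(mget(leaves, envir = toy_env)))
print(verified)
```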

> Hopefully, this would give me a list of all the leaf nodes for the
> molecular function ontology. But this is taking too long to execute.
>
> Is there a similar function in some other package that would be
> quicker?

I will see about improving GOLeaves, but the above should get you
going for now...

Best,

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/


