[BioC] Gene Ontology relationships

davidl at unr.nevada.edu
Thu Jun 21 23:02:27 CEST 2007


     I tried out these new functions (termGraphs and plotGOTermGraph) and they
seem to do what they were intended to do just fine, except that I couldn't get
the labels to change from the GO IDs to the descriptive term names.  However,
the functions output each set of connected GO terms as an individual graph,
which makes it difficult to get an overall picture of how the over-represented
GO terms relate to one another across the whole set.  I borrowed a good
portion of the code used in termGraphs and plotGOTermGraph and modified it a
little so that I could output all the connected components in one graph:

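library(GOstats)    # goDag(), sigCategories(), pvalues(), geneCounts(), ...
library(Rgraphviz)  # makeNodeAttrs() and the plot method for graph objects
## GOTERM and Term() come from the GO annotation package (GO.db in later
## releases), which may need to be loaded separately.

connectedTerms <- function(r, id=NULL, max.nchar=NULL, title=NULL) {
    ## 'id' is used below, so it has to be an argument; NULL means "use all
    ## significant categories".  'title' is not used yet.
    pvalue <- pvalueCutoff(r)
    if (is.null(id)) {
        goids1 <- sigCategories(r, pvalue)
    } else {
        goids1 <- id
    }
    subG <- subGraph(goids1, goDag(r))
    cc <- connectedComp(subG)

    ## Combine the nodes of every connected component into one vector.
    connGOs <- character(0)
    for (i in seq_along(cc)) {
        connGOs <- c(connGOs, cc[[i]])
    }
    finalG <- subGraph(connGOs, goDag(r))

    ## Attach the descriptive GO term to each node.
    nodeDataDefaults(finalG) <- list(term = as.character(NA))
    nodeData(finalG, attr = "term") <-
        as.character(sapply(mget(nodes(finalG), GOTERM), Term))
    termLab <- unlist(nodeData(finalG, attr = "term"))

    ## Gene counts per node, formatted as "selected/universe".
    n <- nodes(finalG)
    resultTerms <- names(pvalues(r))
    counts <- sapply(n, function(x) {
        if (x %in% resultTerms) {
            paste(geneCounts(r)[x], "/", universeCounts(r)[x], sep = "")
        } else {
            "0/??"  # placeholder label for a term with no test result
        }
    })
    if (!is.null(max.nchar)) {
        termLab <- sapply(termLab, substr, 1L, max.nchar, USE.NAMES = FALSE)
    }
    nlab <- paste(termLab, counts)
    nattr <- makeNodeAttrs(finalG, label = nlab, fixedsize = FALSE,
                           fontsize = "15000", shape = "rectangle")
    attr <- list(node = list(), edge = list(), graph = list(rankdir = "LR"))
    plot(finalG, nodeAttrs = nattr, attrs = attr)
}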

I added that for loop to combine all the connected GO terms and came up with a
list of node attributes which almost always gave me a readable graph.  When I
had more than 200 nodes or more than 3 hierarchical ranks, I had to increase
the font size to 15000 to keep the labels readable, since I couldn't get many
of the other Graphviz parameters (size, ratio, overlap, etc.) to work.
I'm sure there is a better way to do this:
1. because I have heard you are supposed to avoid for loops, and
2. because 15,000 seems a bit excessive for a font size,
but I'm pretty new to any sort of command-line work.
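
On the first point, here is a minimal sketch of a loop-free version.  It uses
a made-up list in place of the connectedComp() output (the IDs are
placeholders, not real results).  Since the components together cover every
node of the subgraph, the flattened vector should just be the node set of
subG again, so the second subGraph() call may not actually change anything:

```r
# Toy stand-in for the list returned by connectedComp(): one character
# vector of node names per connected component (IDs are hypothetical).
cc <- list(c("GO:0000001", "GO:0000002"),
           "GO:0000003",
           c("GO:0000004", "GO:0000005"))

# Loop version, as in connectedTerms() above.
connGOs <- character(0)
for (i in seq_along(cc)) {
    connGOs <- c(connGOs, cc[[i]])
}

# Loop-free equivalent: flatten the list in a single call.
connGOs2 <- unlist(cc, use.names = FALSE)

# Both approaches give the same vector of node names.
same <- identical(connGOs, connGOs2)
```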

After getting these graphs, however, I noticed another problem.  If two
significant GO terms are separated by an intervening non-significant GO term,
connectedComp() does not put them in the same component.  This means you can't
really use these functions to find out whether two significant GO terms are in
the same GO branch, which to me seems like the main point of this sort of
function.
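
A toy illustration of why this happens, in base R with a hypothetical
three-edge chain (the names are placeholders, not real GO IDs).  A subgraph
induced on the significant terms keeps an edge only when both of its endpoints
are significant, so one non-significant term in the middle removes the whole
path:

```r
# Hypothetical child -> parent edges forming the chain A -> B -> C -> root.
edges <- data.frame(
    child  = c("A", "B", "C"),
    parent = c("B", "C", "root"),
    stringsAsFactors = FALSE
)

# A and C are "significant"; B sits between them and is not.
sig <- c("A", "C")

# An induced subgraph keeps an edge only if both endpoints are kept.
kept <- edges[edges$child %in% sig & edges$parent %in% sig, ]

# No edge survives, so A and C end up as two isolated nodes.
n_kept <- nrow(kept)
```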

If there were a way to trace each significant GO term up to its top parent term
(i.e. biological process) through its more immediate parents, and to color-code
the significant terms, that would be the ideal way to visualize how your
over-represented GO terms are functionally related.  As Seth stated before,
though, this probably can't be done with Rgraphviz, since these graphs aren't
scalable/zoomable and the node labels become prohibitively small with more than
150 or so nodes.
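
To make the idea concrete, here is a rough sketch in base R of the tracing and
color-coding step only.  The parent map and term names are hypothetical; in
practice the parents would presumably come from the GO annotation data (e.g.
GOBPPARENTS), and I believe GOstats also has a GOGraph() function that builds
this kind of graph, though I have not tried it on these results.  The named
color vector at the end is the sort of thing that could be passed as the
fillcolor argument of makeNodeAttrs():

```r
# Hypothetical parent map: each name is a term, each value its direct parents.
parents <- list(
    A    = "B",
    B    = "C",
    C    = "root",
    D    = c("C", "E"),
    E    = "root",
    root = character(0)
)
sig <- c("A", "D")  # placeholder "significant" terms

# Collect a term plus all of its ancestors by following parent links.
traceToRoot <- function(term, parents) {
    seen <- character(0)
    todo <- term
    while (length(todo) > 0) {
        cur <- todo[1]
        todo <- todo[-1]
        if (!(cur %in% seen)) {
            seen <- c(seen, cur)
            todo <- c(todo, parents[[cur]])
        }
    }
    seen
}

# Union of every significant term and its ancestors.
keep <- unique(unlist(lapply(sig, traceToRoot, parents = parents)))

# One fill color per node: significant terms highlighted, the rest white.
fill <- ifelse(keep %in% sig, "lightblue", "white")
names(fill) <- keep
```

Because the non-significant ancestors are kept, two significant terms in the
same branch stay connected through them.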

Does anyone have any ideas as to how to obtain such a graph (significant terms
traced through their parents and color-coded), either in R or in a program in
which Bioconductor output could easily be used?  I apologize for the lengthy
post and welcome any ideas.


Quoting Seth Falcon <sfalcon at fhcrc.org>:

> Hi,
>
> davidl at unr.nevada.edu writes:
> > Hello Seth,
> >
> >      Thank you very much for the response.  I read the help pages for those
> > functions and they sound like they are exactly what I was looking for.  I
> > ran into a problem actually using termGraphs, however.  This may be
> > something simple and stupid but I am having trouble identifying what the
> > problem is.  This is the relevant part of my workflow and the resulting
> > error message:
> >
> > > paramsFvBup<-new("GOHyperGParams", geneIds=llsupFvB,
> > universeGeneIds=llsUniversFvB, annotation="mouse4302", ontology="BP",
> > pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
> > > FvBupOverBP<-hyperGTest(paramsFvBup)
> > > htmlReport(FvBupOverBP, file="test.html")
> > #That all worked fine and the table looks good
> > > termGraphs(FvBupOverBP,pvalue=0.05, use.terms=TRUE)
> > Error in "names<-"(`*tmp*`, value = c(NA_character_, NA_character_,
> > NA_character_,  :
> >         'value' argument must specify unique names
>
> I believe that I've fixed the problem.  GOstats_2.2.5 is available via
> biocLite (not yet for OS X).  In the meantime, I noticed a labeling
> problem with plotGOTermGraph and GOstats_2.2.6 will be available in
> the next couple of days.
> Please let me know if you encounter further problems --- these
> functions for extracting subgraphs of the results and plotting are
> quite new and somewhat experimental so I'm open to suggestions.
> Best,
> + seth
> --
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> http://bioconductor.org
