[BioC] GOstats v2.2.3 . bug in inducedTermGraph

Seth Falcon sfalcon at fhcrc.org
Thu May 31 19:36:59 CEST 2007


Hi,

Thanks for the nice reproducible bug report.

Pierre-Yves Boëlle <boelle at u707.jussieu.fr> writes:

> Using R : 2.5.0 / BioC 2.0 / GOstats 2.2.3 / GO : 1.16 on Windows XP.
>
> inducedTermGraph(hgOver,names(pvalues(hgOver)[95]),children=FALSE)
>
> ----------------------------
> fails with :
> Error in checkValidNodeName(node) : invalid node names: missing value
> NA not allowed
>
> Looking at the code (in GOHyperGResults-accessors.R), it appears that
> parents are looked for in the "KidsEnv" instead of "ParentsEnv"

Yes, that was part of the bug.  Also, the code needed to handle NAs
explicitly, since they can also occur when children=TRUE.  Below is a new
version of inducedTermGraph which you can test and which I plan to
include in a patch release for GOstats.
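
To illustrate the NA issue, here is a minimal sketch that uses a plain
environment as a stand-in for the GO children map.  The IDs and the
helper realKids are made up for illustration, and the sketch assumes
that a term without children maps to NA:

```r
## Toy stand-in for a GO children environment; the IDs are hypothetical.
kidsEnv <- new.env()
assign("GO:A", c("GO:B", "GO:C"), envir = kidsEnv)
assign("GO:B", NA_character_, envir = kidsEnv)   # leaf term: no children

## Hypothetical helper: return the children of a term, dropping NA
## placeholders so they are never passed on as node names.
realKids <- function(id, env) {
    kids <- unique(env[[id]])
    kids[!is.na(kids)]
}

realKids("GO:A", kidsEnv)   # "GO:B" "GO:C"
realKids("GO:B", kidsEnv)   # character(0)
```

Passing the NA through to addNode is what triggers the "invalid node
names" error, so dropping it up front has the same effect as skipping
it inside the loop.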

> Also, the graph is reversed? (ancestor at the bottom, children on
> top)

The edges of the returned graph connect children to their parents.
This matches the behavior of the GOGraph function, as well as the UML
convention of edges going from subclasses to parent classes.
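
As a small illustration of that edge direction, the sketch below
builds a toy child-to-parent graph with the graph package.  The node
names are hypothetical, and this is not a real GO graph:

```r
library(graph)   # Bioconductor 'graph' package

## Hypothetical three-term hierarchy: leaf -> mid -> root
g <- new("graphNEL", nodes = c("root", "mid", "leaf"),
         edgemode = "directed")
g <- addEdge("leaf", "mid", g)    # child -> parent
g <- addEdge("mid", "root", g)    # child -> parent

## Out-edges of a node are its parents.
edges(g)[["leaf"]]                # "mid"

## Reversing the edges gives the parent -> child view instead.
rg <- reverseEdgeDirections(g)
edges(rg)[["root"]]               # "mid"
```

With this convention, following out-edges from any term walks toward
the root.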

As for top/bottom, that is a display issue and I believe there are
ways of asking graphviz to flip things if that is what you want.
Since these graphs are tree-like, one could argue that rendering a
tree with root at the bottom and leaves at the top has a certain
appeal in that it matches the way most trees choose to orient
themselves ;-)

Best,

+ seth

Here is a new version of the function to test out:

inducedTermGraph <- function(r, id, children=TRUE, parents=TRUE) {
    if (!children && !parents)
      stop("children and parents can't both be FALSE")
    ## XXX: should use more structure here
    goName <- paste(testName(r), collapse="")
    goKidsEnv <- get(paste(goName, "CHILDREN", sep=""))
    goParentsEnv <- get(paste(goName, "PARENTS", sep=""))
    goIds <- character(0)

    wantedNodes <- id
    ## children
    if (children) {
        wantedNodes <- c(wantedNodes,
                         unlist(edges(goDag(r))[id], use.names=FALSE))
    }
    ## parents
    g <- reverseEdgeDirections(goDag(r))
    if (parents) {
        wantedNodes <- c(wantedNodes,
                         unlist(edges(g)[id], use.names=FALSE))
    }
    wantedNodes <- unique(wantedNodes)
    g <- subGraph(wantedNodes, g)

    ## expand; add children and/or parents that are not present in g,
    ## but are defined in the GO data.
    if (children) {
        for (goid in id) {
            kids <- unique(goKidsEnv[[goid]])
            for (k in kids) {
                if (is.na(k)) next
                if (!(k %in% nodes(g))) {
                    g <- addNode(k, g)
                    g <- addEdge(k, goid, g)
                }
            }
        }
    }
    if (parents) {
        for (goid in id) {
            elders <- unique(goParentsEnv[[goid]])
            for (p in elders) {
                if (is.na(p)) next
                if (!(p %in% nodes(g))) {
                    g <- addNode(p, g)
                    g <- addEdge(goid, p, g)
                }
            }
        }
    }
    g
}


-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


