[BioC] question about ontoCompare() performance change
Seth Falcon
sfalcon at fhcrc.org
Thu Nov 12 22:43:43 CET 2009
Hi again,
On 10/29/09 10:26 AM, Seth Falcon wrote:
> Thanks for the reminder and providing a reproducible example. We will
> take a look and see if we can understand and provide a fix for the slow
> down.
The goTools::ontoCompare function as currently coded takes "the long
way" at a couple of points when dealing with the GO annotation in the
GO.db package. Unfortunately, I don't see an easy way to make just a
few small changes to the existing function. I believe a significant
refactoring is required.
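
To give a rough idea of what "the long way" can look like, here is a minimal sketch on toy data. It is not the goTools code and not the attached file; the probe2go list and both function names are made up for illustration. It contrasts re-scanning every probe's annotation once per GO term with flattening the mapping once and tabulating it.

```r
# Toy stand-in for a probe -> GO ID mapping (hypothetical data, not from GO.db)
probe2go <- list(
  p1 = c("GO:0000001", "GO:0000002"),
  p2 = c("GO:0000002"),
  p3 = character(0),   # a probe with no GO mapping
  p4 = c("GO:0000001", "GO:0000002", "GO:0000003")
)

# Slow pattern: for each GO term, scan every probe's annotation again
count_slow <- function(map) {
  terms <- unique(unlist(map, use.names = FALSE))
  sapply(terms, function(term) {
    sum(vapply(map, function(ids) term %in% ids, logical(1)))
  })
}

# Faster pattern: flatten the mapping once, then tabulate
count_fast <- function(map) {
  ids <- unlist(lapply(map, unique), use.names = FALSE)
  counts <- table(ids)
  setNames(as.integer(counts), names(counts))
}

slow <- count_slow(probe2go)
fast <- count_fast(probe2go)
```

Both functions return the number of probes annotated to each GO ID; the second does so in a single pass over the flattened mapping, which is the sort of change that tends to matter when the mapping is large.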
To that end, I've attempted to understand the main goal of the
ontoCompare function and to reproduce some of the functionality with a
different coding approach. My intention is to get things started, not
to furnish a complete fix. I have attached an R file containing
functions for an alternate implementation. Here's a summary:
## start out by executing a sample with current goTools code
library("goTools")
library("hgu133a.db")
data(probeID)
system.time(z0 <- ontoCompare(list(L1=affylist[[1]]), "hgu133a",
                              method="none"))
Starting ontoCompare...
    user   system  elapsed
1280.047   21.033 1320.269
## Now demonstrate alternate
system.time(zz <- goCompare(affylist[[1]], "hgu133a"))
   user  system elapsed
 14.712   0.116  15.154
Warning message:
In probeToGO(probes, probeType, ontology) :
  removing 15 probe IDs with no mapping to GO
As you can see, the alternate is much faster: about 15 seconds elapsed
versus about 1320. *However*, I haven't taken the time to completely
re-implement the original function and, worse, I get slightly different
results. You can use the following to compare:
## look up the term name for each GO ID in the new result
zz[["Term"]] <- sapply(zz$GO, function(x) Term(GOTERM[[x]]),
                       USE.NAMES=FALSE)
## copy the original counts over for terms present in both results
inboth <- intersect(rownames(z0), zz$Term)
zz[["OrigCount"]] <- as.integer(NA)
zz[match(inboth, zz$Term, nomatch=0L), "OrigCount"] <-
    as.integer(z0[inboth, ])
zz[, c("Ontology", "Term", "OrigCount", "Count")]
   Ontology                             Term OrigCount Count
1        MF               molecular_function         3    76
19       CC               cellular_component         2    76
34       BP               biological_process         5    75
12       CC                             cell        NA    74
13       CC                        cell part        74    74
2        MF                          binding        67    65
27       BP                 cellular process        58    58
21       CC                        organelle        45    45
36       BP                metabolic process        44    44
11       MF               catalytic activity        38    38
23       BP            biological regulation        12    31
40       BP regulation of biological process        29    29
15       CC                   organelle part        24    24
44       BP                     localization        13    21
[snip]
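
For anyone chasing the discrepancies, a small sketch along these lines may help pick out the rows that disagree. It is self-contained and uses a handful of rows copied from the table above; the cmp data frame and the Diff column are names introduced here for illustration only.

```r
# A few rows copied from the comparison table (not the full result)
cmp <- data.frame(
  Term      = c("molecular_function", "cell", "cell part",
                "binding", "biological regulation"),
  OrigCount = c(3L, NA, 74L, 67L, 12L),
  Count     = c(76L, 74L, 74L, 65L, 31L),
  stringsAsFactors = FALSE
)

# Difference between the two implementations; NA where the term is
# absent from the original ontoCompare result
cmp$Diff <- cmp$Count - cmp$OrigCount

# Terms reported only by the alternate implementation
missing_in_orig <- cmp$Term[is.na(cmp$OrigCount)]

# Terms reported by both, but with different counts
disagree <- cmp[!is.na(cmp$Diff) & cmp$Diff != 0L, ]
```

On the real objects, the same two filters could be applied directly to zz once the OrigCount column has been filled in as shown earlier.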
I'm hoping that the attached code provides enough of a starting point
for the package maintainer or other motivated party to work up a
complete solution and understand the differences in the results.
+ seth
--
Seth Falcon
Program in Computational Biology | Fred Hutchinson Cancer Research Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: probeToGO.R
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20091112/e24e1547/attachment.pl>