[BioC] goTools: ontoCompare question

Paquet, Agnes apaquet at medsfgh.ucsf.edu
Sat Jun 30 04:34:04 CEST 2007

Hi Dave,
The current algorithm in ontoCompare is the following:
- for each probe id in your list, retrieve all GO ids corresponding to this probe id
- then, map these GO ids up to the end nodes provided as argument to the function (or the default ones)
- once the mapping is finished, add 1 to the count of each end node that was reached at least once. The count is not the number of times a node was hit, which explains the discrepancy in your example.
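To make the counting rule concrete, here is a minimal sketch in plain R. It is not the goTools source: the probe ids, GO ids and the parent map are all made up, and the helper names (reachedEndNodes, countEndNodes, perHit) are hypothetical. It only contrasts "reached at least once" with "number of hits".

```r
# Toy sketch of the counting rule; NOT the goTools implementation.
# All ids and mappings below are invented for illustration.

# Hypothetical probe -> GO id annotations
probe2go <- list(
  probeA = c("GO:a1", "GO:a2", "GO:b1"),
  probeB = c("GO:b1"),
  probeC = character(0)          # a probe without annotation
)

# Hypothetical GO id -> parent ids (a tiny tree)
parents <- list(
  "GO:a1"   = "GO:A",
  "GO:a2"   = "GO:A",
  "GO:b1"   = "GO:B",
  "GO:A"    = "GO:root",
  "GO:B"    = "GO:root",
  "GO:root" = character(0)
)

# Walk up from one GO id and return the end nodes it reaches
reachedEndNodes <- function(id, endnodes) {
  if (id %in% endnodes) return(id)
  ups <- parents[[id]]
  if (length(ups) == 0) return(character(0))
  as.character(unique(unlist(lapply(ups, reachedEndNodes, endnodes = endnodes))))
}

# perHit = FALSE: each probe adds at most 1 to an end node
# perHit = TRUE : every GO id that reaches the end node adds 1
countEndNodes <- function(probes, endnodes, perHit = FALSE) {
  counts <- setNames(integer(length(endnodes)), endnodes)
  for (p in probes) {
    hits <- unlist(lapply(probe2go[[p]], reachedEndNodes, endnodes = endnodes))
    if (length(hits) == 0) next
    if (!perHit) hits <- unique(hits)
    tab <- table(hits)
    counts[names(tab)] <- counts[names(tab)] + as.integer(tab)
  }
  counts
}

onceA <- countEndNodes("probeA", "GO:root")                 # 1
hitsA <- countEndNodes("probeA", "GO:root", perHit = TRUE)  # 3
```

With the single toy probe, the top node gets a count of 1 under the "at least once" rule and 3 when every hit is counted, which is the same kind of gap as in your example.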
For example, if I use only 1 Affy probe and restrict everything to MF to simplify your example, ontoCompare gives me the following results:
1) Using the default end nodes:
> ontoCompare(list("1415670_at"),probeType="mouse4302",goType="MF",method="none")
[1] "Starting ontoCompare..."
[1] "Number of lists = 1"
[1] "Using method: none"

  binding structural molecule activity transporter activity NotFound
         1                            1                    1        0

(we have 1 count for each end node which was reached at least once)
2) Using your endlist:
> ontoCompare(list("1415670_at"),probeType="mouse4302",goType="MF",method="none",endnode=endlist)
[1] "Starting ontoCompare..."
[1] "Number of lists = 1"
[1] "Using method: none"

  molecular_function NotFound
                  1      0
(same here, only 1 count for MF, and not 3)
We made this choice because some nodes/probes may be more heavily annotated than others. Counting every hit could make two lists of probes look more different than they are, reflecting the availability of annotations rather than a true biological difference. You could also use the other methods to get the number of hits relative to the number of probes or to the number of GO ids in your list.
I hope this helps. Don't hesitate to email me again if you have more questions.
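As a rough illustration of what a relative count means, the sketch below divides toy end-node counts by the two denominators mentioned above. The numbers and variable names are invented, and this is not the ontoCompare code; check ?ontoCompare for the actual method options and their exact definitions.

```r
# Toy numbers, not real ontoCompare output
counts  <- c("GO:A" = 12, "GO:B" = 3)  # end-node counts for one probe list
nProbes <- 20                          # probes in the list, annotated or not
nGOids  <- 60                          # GO ids attached to those probes

relToProbes <- counts / nProbes        # counts relative to the number of probes
relToGOids  <- counts / nGOids         # counts relative to the number of GO ids
```

Relative values like these let two lists of different sizes be compared on the same scale.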

From: bioconductor-bounces at stat.math.ethz.ch on behalf of davidl at unr.nevada.edu
Sent: Fri 6/29/2007 8:01 AM
To: Bioconductor
Subject: [BioC] goTools: ontoCompare question


     I ran ontoCompare on the full list of probes in the mouse4302 genechip both
with the default EndNodeList() and with a custom end node list containing only
the antioxidant activity, biological_process, cellular_component, and
molecular_function GO terms and found what appears to be a discrepancy:

> length(sviData$svi$ID)
[1] 45101
> sviData$svi$ID[1:5]
[1] "1452670_at"   "1422340_a_at" "1452114_s_at" "1422644_at"   "1423359_at"
> listall<-list("allprobes"=sviData$svi$ID)
> endlist<-c("GO:0003674", "GO:0005575", "GO:0008150", "GO:0016209")
> totalAnnotations<-ontoCompare(listall, probeType="mouse4302", method="none")
> write.table(totalAnnotations, file="totalAnnotations.txt")
> totalAnnotations2<-ontoCompare(listall, probeType="mouse4302", method="none", endnode=endlist)
> write.table(totalAnnotations2, file="totalAnnotations_reduced.txt")

When finding the total possible number of annotations for the top level GO terms
(BP, MF, CC), I got different numbers for the two approaches, but I got the
same numbers for "NotFound" and "antioxidant activity":

from totalAnnotations.txt
antioxidant activity         127
biological_process           2594
cellular_component           2365
molecular_function           2414
NotFound                     11120

from totalAnnotations_reduced.txt
antioxidant activity    127
biological_process      28020
cellular_component      28509
molecular_function      30875
NotFound                11120

I was just wondering if anyone knew why this might happen since it affects the
interpretation of a comparison I was going to do.  These data appear to reflect
the histogram output from ontoPlot (so I don't think it's an R->txt->excel
thing).  Is the output with method="none" the total number of times all probes
are annotated at the endnode or at a child of the end node? Does it have
something to do with the "isa" values in EndNodeList() or my method of creating

R v.2.5.0
goTools v1.8.0


--and thank you Dick for recommending topGO.  I found what I needed through that
