[BioC] mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6

Sean Davis seandavi at gmail.com
Wed Mar 3 02:28:43 CET 2010


On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu> wrote:
> Hello,
>
> I've been running topGO (using mouse Entrez Gene IDs) and found that some GO terms that turn up in the topGO analysis are not in the GO terms from org.Mm.eg.db.
>
> I'd like to give some example code to show how to generate the problem, but my topGO code is a lot of lines.  The output looks like:
>
> allResults[[1]][[1]][1:2,]
>         GO.ID                                Term Annotated Significant Expected classic    elim weight
> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>
> So, the topGO output gives a column of GOIDs and such.
>
> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094, GO:0031497, GO:0046700.
>
> I can't find these in names(Mm.egGO2EG).
>
> library("org.Mm.eg.db")
> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
> grep("GO:0030522",names(Mm.egGO2EG))
> integer(0)
>
> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db?  When I check for GO:0030522 for Mus musculus at geneontology.org, GO:0030522 is valid.
>
> I'm puzzled by the mismatch.  I want to get the genes for a given GOID, so there is probably a workaround.  If anyone has a suggestion or idea, I'd be very grateful to know what to try.
>

Hi, Dick.

The Gene Ontology, as I'm sure everyone knows, is hierarchical.  The
org.Mm.egGO2EG map stores ONLY the direct annotations, that is, the most
specific term(s) recorded for each gene.  The org.Mm.egGO2ALLEGS map,
however, stores for each term the genes annotated to the term itself AND
to any of its child (descendant) terms.  Most of the gene ontology
analysis algorithms use the latter definition; it looks like topGO does
also.
In short, try this:

get('GO:0030522',org.Mm.egGO2ALLEGS)
     IDA      IMP      IDA      IGI      IMP      IGI      IMP      IMP
 "11835"  "11835"  "11848"  "12034"  "12034"  "13082"  "13123"  "13983"
     IMP      ISO      IMP      IDA      IMP      IMP      IMP      ISO
 "14228"  "14599"  "14602"  "14815"  "14815"  "15502"  "16000"  "16000"
     IDA      IDA      IMP      IDA      IGI      IMP      IMP      IDA
 "16601"  "18667"  "18854"  "19213"  "19378"  "19378"  "19411"  "20181"
     IDA      IDA      IMP      IMP      IMP      IPI      IDA      IGI
 "20182"  "20183"  "20779"  "21815"  "21848"  "22215"  "24074"  "27401"
     IMP      ISA      IDA      IDA      IMP      IDA
 "56351"  "56847"  "59035"  "67488" "224903" "232174"
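
To make the difference between the two maps concrete, here is a minimal
base-R sketch on a toy hierarchy.  Everything in it is hypothetical: the
term IDs ("T:root", "T:mid", ...), the gene names, and the helper
functions descendants() and all_genes() are made up for illustration and
are not part of org.Mm.eg.db or topGO.  The list `direct` plays the role
of org.Mm.egGO2EG, and all_genes() plays the role of a lookup in
org.Mm.egGO2ALLEGS.

```r
# Toy ontology (hypothetical IDs, not real GO terms):
# each term lists its direct children.
children <- list(
  "T:root"  = c("T:mid"),
  "T:mid"   = c("T:leafA", "T:leafB"),
  "T:leafA" = character(0),
  "T:leafB" = character(0)
)

# Direct annotations only, analogous to org.Mm.egGO2EG:
# a term is a key here only if some gene is annotated to it directly.
direct <- list(
  "T:leafA" = c("g1", "g2"),
  "T:leafB" = c("g2", "g3")
)

# Return a term together with all of its descendants.
descendants <- function(term) {
  kids <- children[[term]]
  unique(c(term, unlist(lapply(kids, descendants))))
}

# Genes annotated to a term or to any descendant,
# analogous to a lookup in org.Mm.egGO2ALLEGS.
all_genes <- function(term) {
  terms <- intersect(descendants(term), names(direct))
  sort(unique(as.character(unlist(direct[terms]))))
}

"T:mid" %in% names(direct)   # FALSE: no gene is annotated to it directly
all_genes("T:mid")           # "g1" "g2" "g3": inherited from its children
```

A parent term can therefore be missing from the direct map and still
have genes once annotations are propagated.  In the real output above,
some Entrez IDs appear more than once (one entry per evidence code), so
wrapping the call in unique(), as in
unique(get('GO:0030522', org.Mm.egGO2ALLEGS)), should give each gene
once.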

Hope that helps clear things up.

Sean


