[BioC] mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6

Dick Beyer dbeyer at u.washington.edu
Wed Mar 3 07:32:20 CET 2010


Hi Sean,

Thanks very much for looking into this.  I need to think about it some more.  What confuses me is that topGO takes a gene2GO list as input (a list of GO terms for each gene), which I generated from org.Mm.egGO2EG (which has no GO:0030522, for example). Getting GOIDs out of topGO that are in org.Mm.egGO2ALLEGS but not in org.Mm.egGO2EG makes me think I should build my gene2GO input list from org.Mm.egGO2ALLEGS instead.
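
A minimal sketch of building the gene2GO list from the direct annotations only, using the gene-to-GO map org.Mm.egGO from the same package. It assumes, as I understand the topGO documentation, that topGO propagates annotations up the GO graph by itself, in which case the direct annotations should be enough as input. The variable names (egGO, gene2GO) are only illustrative.

```r
library("org.Mm.eg.db")

# Direct annotations only: Entrez Gene ID -> list of GO annotation records.
# Restrict to genes that actually have at least one GO annotation.
egGO <- as.list(org.Mm.egGO[mappedkeys(org.Mm.egGO)])

# Keep just the GO IDs for each gene. Each record is a list with the
# elements GOID, Evidence and Ontology.
gene2GO <- lapply(egGO, function(recs) {
  unique(vapply(recs, function(r) r[["GOID"]], character(1)))
})

# Quick look at the result: a named list, one character vector per gene.
str(head(gene2GO, 3))
```

Note that this list mixes all three ontologies (BP, MF, CC), so it may need filtering depending on how it is used.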

I also didn't dig far enough when I checked GO:0030522 at geneontology.org, which showed 34 gene products for Mus musculus.  Had I looked further, I would have seen that GO:0030522 has no genes annotated directly to it.
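
A short sketch of how the two maps could be compared for a single term, so that a GOID with no direct annotations is easy to spot. The names go_id, has_direct and all_genes are illustrative, and the result will depend on the annotation package version installed.

```r
library("org.Mm.eg.db")

go_id <- "GO:0030522"

# TRUE if at least one gene is annotated directly to this term.
has_direct <- go_id %in% mappedkeys(org.Mm.egGO2EG)

# Genes annotated to the term itself or to any of its child terms.
# ifnotfound = NA avoids an error if the ID is not in the map at all.
all_genes <- unique(unlist(
  mget(go_id, org.Mm.egGO2ALLEGS, ifnotfound = NA),
  use.names = FALSE
))

has_direct
length(all_genes)
```

The same mget() call accepts a vector of GOIDs, which is one way to get the genes behind every term in a topGO result table.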

Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz to get Entrez Gene ID to GOID mappings, but switched to the Bioconductor org.Mm.eg.db way as it is much simpler.

Thanks for the good education!

Cheers,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D.	University of Washington
Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
 			Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

On Tue, 2 Mar 2010, Sean Davis wrote:

> On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>> Hello,
>>
>> I've been running topGO (using mouse Entrez Gene IDs) and found that some GO terms that turn up in the topGO analysis are not in the GO terms from org.Mm.eg.db.
>>
>> I'd like to give some example code to show how to generate the problem, but my topGO code is a lot of lines.  The output looks like:
>>
>> allResults[[1]][[1]][1:2,]
>>         GO.ID                                Term Annotated Significant Expected classic    elim weight
>> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
>> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>>
>> So, the topGO output gives a column of GOIDs and such.
>>
>> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094, GO:0031497, GO:0046700.
>>
>> I can't find these in names(Mm.egGO2EG).
>>
>> library("org.Mm.eg.db")
>> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
>> grep("GO:0030522",names(Mm.egGO2EG))
>> integer(0)
>>
>> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db?  When I check for GO:0030522 for Mus musculus at geneontology.org, GO:0030522 is valid.
>>
>> I'm puzzled by the mismatch.  I want to get the genes for a given GOID, so there is probably a workaround.  If anyone has a suggestion or idea, I'd be very grateful to know what to try.
>>
>
> Hi, Dick.
>
> The Gene Ontology, as I'm sure everyone knows, is hierarchical.  The
> org.Mm.egGO2EG table stores ONLY the most specific term for each gene.
> However, the org.Mm.egGO2ALLEGS stores the term and all the genes for
> itself AND its children.  Most of the gene ontology analysis
> algorithms use the latter definition; it looks like topGO does also.
> In short, try this:
>
> get('GO:0030522',org.Mm.egGO2ALLEGS)
>     IDA      IMP      IDA      IGI      IMP      IGI      IMP      IMP
> "11835"  "11835"  "11848"  "12034"  "12034"  "13082"  "13123"  "13983"
>     IMP      ISO      IMP      IDA      IMP      IMP      IMP      ISO
> "14228"  "14599"  "14602"  "14815"  "14815"  "15502"  "16000"  "16000"
>     IDA      IDA      IMP      IDA      IGI      IMP      IMP      IDA
> "16601"  "18667"  "18854"  "19213"  "19378"  "19378"  "19411"  "20181"
>     IDA      IDA      IMP      IMP      IMP      IPI      IDA      IGI
> "20182"  "20183"  "20779"  "21815"  "21848"  "22215"  "24074"  "27401"
>     IMP      ISA      IDA      IDA      IMP      IDA
> "56351"  "56847"  "59035"  "67488" "224903" "232174"
>
> Hope that helps clear things up.
>
> Sean
>


