[BioC] disagreement between biomaRt and topGO

Dick Beyer dbeyer at u.washington.edu
Wed Sep 5 20:25:31 CEST 2007


Hi Steffen,

Thanks for the good tip.

I was getting my huge table as I needed EGs with GOs for all GO terms, but I haven't used getBM() before.

Cheers,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.	University of Washington
Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
 			Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

On Wed, 5 Sep 2007, Steffen wrote:

> Hi Dick,
>
> I can't help you with the topGO part of your question but a faster way to get 
> the EntrezGene identifiers associated with a certain GO id is as follows:
>
> ids = getBM(c("entrezgene","go"), filters="go", values="GO:0006836", mart = 
> mart)
>
> It looks like there are indeed 32 EntrezGene identifiers associated with this 
> term in the current release of Ensembl.
>
> Cheers,
> Steffen
>
> Dick Beyer wrote:
>> Hello,
>> 
>> I have a puzzling problem with a disagreement between the numbers of 
>> EntrezGene IDs I get from biomaRt and what I get from topGO for the GO 
>> category GO:0006836, neurotransmitter transport. I would assume the mismatch 
>> would be due to different GO builds, but what I find seems too extreme for 
>> that.
>> 
>> Let me give a little background to how I get my list of Entrez Gene IDs.
>> 
>> I start with the file 
>> ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ipi.HUMAN.dat.gz
>> 
>> from which I extract IPI, EG, Symbol, description, UniGene
>> 
>> This eventually gives me a list of 22963 unique Entrez Gene IDs.  I process 
>> this list through biomaRt as follows:
>> 
>> library(biomaRt)
>> mart <- useMart( "ensembl", dataset="hsapiens_gene_ensembl")
>> get.go.biomart <- 
>> getGO(id=ipi.LL.sym.descrip.ug.unique_eg[,2],type="entrezgene",mart=mart)
>> 
>> >From this data.frame I can determine that there are 32 unique Entrez Gene 
>> IDs in the Gene Ontology Biological Process category GO:0006836, 
>> neurotransmitter transport. Everything is fine so far.
>> 
>> Now I set up things for a topGO (non-affy) calculation:
>> 
>> gene2GO <- as.list(get.go.biomart$go)
>> names(gene2GO) <- get.go.biomart$entrezgene
>> 
>> geneList <- list()
>> geneList[1] <- list(factor(as.integer(ipi.LL.sym.descrip.ug.unique_eg[,2] 
>> %in% LL1027[,2])))
>> names(geneList[[1]]) <- ipi.LL.sym.descrip.ug.unique_eg[,2]
>> 
>> This geneList[[1]] has, among its 22963 entries, 32 entries (that is, 32 of 
>> the "names" are Entrez Gene IDs) that are in category GO:0006836, and 4 of 
>> entries I have marked as significant by setting their values to 1.
>> 
>> Next, I create a topGOdata Biological Process object:
>> 
>> BP.GOdata <- list() BP.GOdata[1i] <- list(new("topGOdata", ontology = "BP", 
>> allGenes = geneList[[1]],
>> annot = annFUN.gene2GO, gene2GO = gene2GO))
>> 
>> and submit this topGOdata object to the following:
>> 
>>      test.stat.BP    <- list()
>>      result.BP       <- list()
>>      test.stat.BP[1] <- list(new("classicCount", testStatistic = 
>> GOFisherTest, name = "Fisher test"))
>>      result.BP[1]    <- list(getSigGroups(BP.GOdata, test.stat.BP[[1]]))
>>      test.stat.BP[2] <- list(new("elimCount", testStatistic = GOFisherTest, 
>> name = "Fisher test", cutOff = 0.01))
>>      result.BP[2]    <- list(getSigGroups(BP.GOdata, test.stat.BP[[2]]))
>>      test.stat.BP[3] <- list(new("weightCount", testStatistic = 
>> GOFisherTest, name = "Fisher test", sigRatio = "ratio"))
>>      result.BP[3]    <- list(getSigGroups(BP.GOdata, test.stat.BP[[3]]))
>>      l <- list(classic = score(result.BP[[1]]), elim = 
>> score(result.BP[[2]]), weight = score(result.BP[[3]]))
>>      allRes.BP <- genTable(BP.GOdata, l, orderBy = "classic", ranksOf = 
>> "classic", top = length(score(result.BP[[1]])))
>> 
>> My resulting data.frame, allRes.BP, shows that for category GO:0006836, 
>> there are only 11 annotated Entrez Gene IDs, not 32. The call to genTable() 
>> does correctly find that 4 of these are significant.
>> 
>> This latest released version of topGO must be using the Mar 14 11:46:06 2007 
>> build of GO, and I assume biomaRt is more current as I did the query just 
>> today.  If category GO:0006836, neurotransmitter transport, has grown from 
>> 11 EGs to 32 EGs between now and last March for Human, then everything is as 
>> it should be.  But this seems unlikely to me.  So I wonder if there is a bug 
>> in the topGO calculations.
>> 
>> Here is my session:
>> 
>> 
>> sessionInfo()
>> R version 2.5.1 (2007-06-27) i386-pc-mingw32
>> 
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
>> States.1252;LC_MONETARY=English_United 
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>> 
>> attached base packages:
>> [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets" 
>> "methods"   "base"
>> 
>> other attached packages:
>>      goTools hgu133plus2   Rgraphviz geneplotter     lattice       topGO 
>> SparseM    annotate          GO       graph     Biobase     biomaRt 
>> RCurl         XML
>>      "1.8.0"    "1.16.0"    "1.14.1"    "1.14.0"   "0.15-11"     "1.2.1" 
>> "0.73"    "1.14.1"    "1.16.0"    "1.14.2"    "1.14.1"    "1.10.1" 
>> "0.8-0"     "1.9-0"
>> 
>> Thanks very much,
>> Dick
>> *******************************************************************************
>> Richard P. Beyer, Ph.D.	University of Washington
>> Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
>> Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
>>  			Seattle, WA 98105-6099
>> http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
>> http://staff.washington.edu/~dbeyer
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>>



More information about the Bioconductor mailing list