[BioC] Retrieve Entrez IDs for enriched GO terms

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Thu Sep 24 12:51:37 CEST 2009


these databases are built using different methods so you get different results. A common problem in bioinformatics! Solution. Go to the ncbi ftp site, for the entrez gene database and download the gene2go.gz file. Unzip and query.
________________________________________
From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Yuan Hao [yuan.hao at ucd.ie]
Sent: 24 September 2009 11:46
To: Heidi Dvinge
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Retrieve Entrez IDs for enriched GO terms

Hi Heidi,

Thank you very much for your reply. The method you provided gives the
same result as using probeSetSummary, but I don't understand why this
result is different from the one got from biomaRt? Do you have some
insight about it?

Kind regards,
Yuan

On 24 Sep 2009, at 11:38, Heidi Dvinge wrote:

> Hello Yuan,
>
> have you tried using the accessor functions for your test object
> directly? For example:
>
> > geneIdsByCategory(hyp, catids="GO:0007498")
>
> Does this give you what you want?
>
> Cheers
> \Heidi
>
>
> On 24 Sep 2009, at 11:31, Yuan Hao wrote:
>
>> Dear list,
>>
>> I spent a long time trying to figure out this problem, but without
>> progress. I would appreciate it very much if you could give me some
>> help.
>>
>> I got a list of differential expressed genes from microarray analysis
>> by using limma. Then I did GO enrichment analysis on these genes by
>> hypeGTest() method available in GOstats package. Now I want to
>> retrieve entrez gene IDs in my gene list that correspond to each
>> enriched GO terms. I found there are two ways to get the entrez gene
>> IDs: using probeSetSummary() from GOstats, or using getBM() from
>> biomaRt. I tried both method, and they all worked, but I got two
>> different lists (lengths 13 vs 24) of entrez gene IDs corresponding
>> to
>> a single GO term, and most of them are not overlapped. I am not very
>> familiar with the annotation and/or genome assembly, so I am not sure
>> whether it is because the two methods using different annotation/
>> assembly that caused this problem.
>>
>> # get geneIds for hyperGTest
>>
>>> topA<-topTable(fit2,coef=1,p.value=0.01,n=nrow(fit2))
>>
>>> prbs<-topA[,1]
>>
>>> hasGO<-sapply(mget(prbs,hgu133plus2GO),function(ids)
>>
>> + if(!is.na(ids) && length(ids) > 1) TRUE else FALSE)
>>
>>> prbs<-prbs[hasGO]
>>
>>> prbs<-getEG(prbs,"hgu133plus2")
>>
>>> prbs<-prbs[!duplicated(prbs)]
>>
>> # get universeGeneIds for hyperGTest
>>
>>> univ<-featureNames(eset)
>>
>>> hasUnivGO<-sapply(mget(univ,hgu133plus2GO),function(ids)
>>
>> + if (!is.na(ids) && length(ids) > 1) TRUE else FALSE)
>>
>>> univ<-univ[hasUnivGO]
>>
>>> univ<-unique(getEG(univ,"hgu133plus2"))
>>
>> # compose params and carry out hyperGTest
>>
>>> p<-new("GOHyperGParams", geneIds=prbs, universeGeneIds=univ,
>> ontology="BP", annotation="hgu133plus2", conditional=TRUE)
>>
>>> if(interactive()){
>>
>> + hyp<-hyperGTest(p)
>>
>> + ps<-probeSetSummary(hyp)
>>
>> }
>>
>> # retrieve entrez IDs for one enriched GO term GO:0007498
>>
>>> unique(ps$"GO:0007498"$EntrezID)
>>
>>   [1] "2131"  "2139"  "2296"  "3717"  "4088"  "4771"  "6398"  "655"
>> "695"
>>
>> [10] "8013"  "8320"  "83439" "9314"
>>
>>
>>
>> # using biomaRt package
>>
>>> ensembl=useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>
>>> summary <- summary(hyp)
>>
>>> goID<-summary$GOBPID
>>
>>> E <- getBM(attributes=c("go_biological_process_id", "entrezgene"),
>> filters="go", values=goID, mart=ensembl)
>>
>>> oneGO<-sapply(E$"go_biological_process_id",function(i)
>>
>> + if (i=="GO:0007498") TRUE else FALSE)
>>
>>> EE<-E[oneGO,]
>>
>> # retrieve entrez IDs for the same GO term, GO:0007498
>>
>>> unique(EE$entrezgene)
>>
>>   [1]  5515    NA    90  6398  2131  3717   660  4145 84667  3055
>> 6911 10320
>>
>> [13] 10220 22806   695  5017 23184  9355  2303  7075  4232    92
>> 6943  6862
>>
>>
>> Thank you very much in advance!
>>
>> Kind regards,
>> Yuan
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>      [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


        [[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list