[BioC] Retrieve Entrez IDs for enriched GO terms

Thu Sep 24 13:08:43 CEST 2009

On Thu, Sep 24, 2009 at 6:51 AM, michael watson (IAH-C)
<michael.watson at bbsrc.ac.uk> wrote:
>
> These databases are built using different methods, so you get different results. A common problem in bioinformatics! Solution: go to the NCBI FTP site for the Entrez Gene database and download the gene2go.gz file. Unzip and query.
> ________________________________________

As usual, there are many ways to solve the problem and downloading
tab-delimited text files is one.  However, the GO package in
Bioconductor is built using the NCBI data (actually, the file noted
above), so there really isn't a need to download files.  The data are
already present in the GO.db package and in all chip annotation
packages built by Bioconductor.
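
For instance, something along these lines should pull the Entrez Gene IDs
for one GO term straight from the chip annotation package. This is only a
minimal sketch: it assumes the hgu133plus2.db package is installed and uses
its GO2ALLPROBES and ENTREZID maps; the variable names (go_id, probes,
entrez) are purely illustrative.

```r
library(hgu133plus2.db)

go_id <- "GO:0007498"   # the term discussed in this thread

# All probes on the chip annotated to the term or to any of its child terms.
# unique() collapses probes listed more than once (one entry per evidence code).
probes <- unique(get(go_id, hgu133plus2GO2ALLPROBES))

# Map the probes to Entrez Gene IDs, then drop duplicates and unmapped probes
entrez <- unique(unlist(mget(probes, hgu133plus2ENTREZID)))
entrez <- entrez[!is.na(entrez)]
```

Note that this returns every gene on the chip annotated to the term, not
just the selected ones; intersecting the result with the gene list passed
to hyperGTest() should narrow it down to the genes of interest.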

Sean

> From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Yuan Hao [yuan.hao at ucd.ie]
> Sent: 24 September 2009 11:46
> To: Heidi Dvinge
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Retrieve Entrez IDs for enriched GO terms
>
> Hi Heidi,
>
> Thank you very much for your reply. The method you provided gives the
> same result as using probeSetSummary, but I don't understand why this
> result differs from the one obtained from biomaRt. Do you have any
> insight into this?
>
> Kind regards,
> Yuan
>
> On 24 Sep 2009, at 11:38, Heidi Dvinge wrote:
>
>> Hello Yuan,
>>
>> have you tried using the accessor functions for your test object
>> directly? For example:
>>
>> > geneIdsByCategory(hyp, catids="GO:0007498")
>>
>> Does this give you what you want?
>>
>> Cheers
>> \Heidi
>>
>>
>> On 24 Sep 2009, at 11:31, Yuan Hao wrote:
>>
>>> Dear list,
>>>
>>> I spent a long time trying to figure out this problem, but without
>>> progress. I would appreciate it very much if you could give me some
>>> help.
>>>
>>> I got a list of differentially expressed genes from a microarray
>>> analysis using limma. Then I did a GO enrichment analysis on these
>>> genes with the hyperGTest() method available in the GOstats package.
>>> Now I want to retrieve the Entrez Gene IDs in my gene list that
>>> correspond to each enriched GO term. I found two ways to get the
>>> Entrez Gene IDs: using probeSetSummary() from GOstats, or using
>>> getBM() from biomaRt. I tried both methods and both worked, but I got
>>> two different lists (lengths 13 vs 24) of Entrez Gene IDs for a
>>> single GO term, and most of the IDs do not overlap. I am not very
>>> familiar with the annotation and/or genome assembly, so I am not sure
>>> whether the difference arises because the two methods use different
>>> annotations or assemblies.
>>>
>>> # get geneIds for hyperGTest
>>> > topA <- topTable(fit2, coef=1, p.value=0.01, n=nrow(fit2))
>>> > prbs <- topA[,1]
>>> > hasGO <- sapply(mget(prbs, hgu133plus2GO), function(ids)
>>> +     if(!is.na(ids) && length(ids) > 1) TRUE else FALSE)
>>> > prbs <- prbs[hasGO]
>>> > prbs <- getEG(prbs, "hgu133plus2")
>>> > prbs <- prbs[!duplicated(prbs)]
>>>
>>> # get universeGeneIds for hyperGTest
>>> > univ <- featureNames(eset)
>>> > hasUnivGO <- sapply(mget(univ, hgu133plus2GO), function(ids)
>>> +     if (!is.na(ids) && length(ids) > 1) TRUE else FALSE)
>>> > univ <- univ[hasUnivGO]
>>> > univ <- unique(getEG(univ, "hgu133plus2"))
>>>
>>> # compose params and carry out hyperGTest
>>> > p <- new("GOHyperGParams", geneIds=prbs, universeGeneIds=univ,
>>> +     ontology="BP", annotation="hgu133plus2", conditional=TRUE)
>>> > if(interactive()){
>>> +     hyp <- hyperGTest(p)
>>> +     ps <- probeSetSummary(hyp)
>>> + }
>>>
>>> # retrieve Entrez IDs for one enriched GO term, GO:0007498
>>> > unique(ps$"GO:0007498"$EntrezID)
>>>  [1] "2131"  "2139"  "2296"  "3717"  "4088"  "4771"  "6398"  "655"   "695"
>>> [10] "8013"  "8320"  "83439" "9314"
>>>
>>> # using biomaRt package
>>> > ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>> > summary <- summary(hyp)
>>> > goID <- summary$GOBPID
>>> > E <- getBM(attributes=c("go_biological_process_id", "entrezgene"),
>>> +     filters="go", values=goID, mart=ensembl)
>>> > oneGO <- sapply(E$"go_biological_process_id", function(i)
>>> +     if (i=="GO:0007498") TRUE else FALSE)
>>> > EE <- E[oneGO,]
>>>
>>> # retrieve Entrez IDs for the same GO term, GO:0007498
>>> > unique(EE$entrezgene)
>>>  [1]  5515    NA    90  6398  2131  3717   660  4145 84667  3055  6911 10320
>>> [13] 10220 22806   695  5017 23184  9355  2303  7075  4232    92  6943  6862
>>>
>>> Thank you very much in advance!
>>>
>>> Kind regards,
>>> Yuan
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>