[BioC] fastest way to get a gene list having certain GO term

Duke duke.lists at gmx.com
Tue Mar 6 23:24:46 CET 2012


On 3/6/12 3:47 PM, Duke wrote:
> Hi folks,
>
> I need some statistics for a certain GO term (for example, "DNA 
> binding"), and I wonder what is the fastest way to retrieve the latest 
> list of genes having that specific GO term. There are now a number of 
> GO packages and I would like to hear/learn your experience regarding 
> various different packages.

To accomplish the above task, I split it into two steps:

* Get all GO IDs having specific term ("DNA binding")
* Then, get all the genes having the resulting GO IDs

I think I got the numbers now:

library("GO.db")
library("org.Hs.eg.db")

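# Return the GO IDs whose term name matches `term` (a regular expression).
GOTerm2GOID = function(term){
   GTL = eapply(GOTERM, function(x){grep(term, x@Term, value=TRUE)})
   GID = sapply(GTL, length)   # number of matches per GO ID (0 or 1)
   names(GTL[GID > 0])
}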

length(unlist(sapply(GOTerm2GOID("DNA binding"),
                     function(x) mget(x, revmap(org.Hs.egGO), ifnotfound=NA))))
4265
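
A caveat on that number: because the results for all matching GO IDs are simply concatenated, a gene annotated to more than one of them may be counted more than once, and any NA placeholders from ifnotfound=NA are counted too. The toy sketch below uses base R only, with made-up GO IDs and gene IDs (go_terms and go_to_genes are hypothetical stand-ins, not the real GO.db or org.Hs.eg.db maps), to show the same two steps and how the raw count can differ from the number of distinct genes.

```r
# Toy stand-ins (hypothetical data, not real annotations):
# GO ID -> term name, and GO ID -> gene IDs.
go_terms <- c("GO:0000001" = "DNA binding",
              "GO:0000002" = "sequence-specific DNA binding",
              "GO:0000003" = "RNA binding")
go_to_genes <- list("GO:0000001" = c("101", "102", NA),
                    "GO:0000002" = c("102", "103"),
                    "GO:0000003" = c("104"))

# Step 1: GO IDs whose term contains the phrase (fixed = TRUE: literal match).
hits <- names(go_terms)[grepl("DNA binding", go_terms, fixed = TRUE)]

# Step 2: pool the genes attached to those GO IDs.
genes <- unlist(go_to_genes[hits], use.names = FALSE)

raw_count    <- length(genes)                          # includes repeats and NA
unique_count <- length(unique(genes[!is.na(genes)]))   # distinct genes only

print(c(raw = raw_count, unique = unique_count))
```

If the real maps behave the same way, wrapping the pooled IDs in unique() after dropping NA values would give the number of distinct genes rather than the number of annotations.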

However, I am still stuck on how to get the gene symbols (Il22, Foxp3, 
for example) as well as the RefSeq IDs for the resulting gene list.

Does anybody have any suggestions?

Thanks,

D.


