[BioC] Annotation problem through org.Mm.eg.db

Marc Carlson mcarlson at fhcrc.org
Wed Mar 20 21:10:43 CET 2013


Hi Himanshu,

The org.Mm.eg.db package has records for 58021 entrez gene IDs.  How do 
I know that?  Well:

library(org.Mm.eg.db)  # load the mouse annotation package
k = keys(org.Mm.eg.db, keytype="ENTREZID")
length(k)

So how many of those have a gene symbol attached?  Well it looks like 
they basically all map to something:
res = select(org.Mm.eg.db, keys=k, columns="SYMBOL", keytype="ENTREZID")
dim(res)

These are still gene symbols, though, and as such they are not 
guaranteed to be unique.  So it's not surprising if some of them are 
shared by different genes...  :(
length(res[["SYMBOL"]])
length(unique(res[["SYMBOL"]]))
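
To see which symbols are shared rather than just how many, something 
like the following might help.  This is only a sketch on a made-up data 
frame (res_toy and its contents are hypothetical stand-ins for the real 
res), using base R only:

```r
# Toy stand-in for the data frame returned by select(); the IDs and
# symbols below are made up for illustration, not real annotations.
res_toy <- data.frame(
  ENTREZID = c("1", "2", "3", "4", "5"),
  SYMBOL   = c("Aaa", "Bbb", "Aaa", "Ccc", "Bbb"),
  stringsAsFactors = FALSE
)

# Number of rows whose symbol repeats an earlier one; this is the same
# quantity as length(x) - length(unique(x)).
n_extra <- sum(duplicated(res_toy$SYMBOL))

# All rows that share a symbol with at least one other row.
dup_symbols <- res_toy$SYMBOL[duplicated(res_toy$SYMBOL)]
shared <- res_toy[res_toy$SYMBOL %in% dup_symbols, ]
```
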

But not too many actually.  Only 284 in fact.  So this all raises 
another question.  Specifically: what is going on with your ids?  Why 
are so many of them not matching up with any sort of symbol?  My best 
guess is that some of them are not really mouse entrez gene ids.  So 
what happens if you take your list of ids and do this:

table(ids %in% k)

And are you sure that your ids are really supposed to be entrez gene IDs?
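
If that table shows a lot of FALSE values, it may be worth looking at 
the ids that fail to match.  Here is a rough sketch of how that could 
go, using made-up stand-ins (k_toy for k, ids_toy for your ids; the 
values are for illustration only) and base R only:

```r
# Hypothetical stand-ins: k_toy plays the role of k (the valid keys) and
# ids_toy the role of your ids; all values are for illustration only.
k_toy   <- c("11287", "11298", "11302", "11303")
ids_toy <- c("11287", " 11298", "ENSMUSG00000030359", "11303", "99999999")

# How many ids are found among the keys?
hits <- table(ids_toy %in% k_toy)

# Which ids fail to match?
unmatched <- ids_toy[!(ids_toy %in% k_toy)]

# Entrez gene IDs are plain integers, so anything that is not all digits
# after trimming whitespace is probably some other kind of identifier.
trimmed    <- trimws(unmatched)
not_entrez <- trimmed[!grepl("^[0-9]+$", trimmed)]
```

Ids that only match after trimming point to a formatting problem; ids 
that end up in not_entrez are likely from a different identifier scheme 
and would need a different keytype.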


   Marc




On 03/19/2013 08:18 AM, Himanshu Sharma wrote:
> Dear Mailing list,
> I have mouse gene entrez ids after RNAseq analysis from RSEM and edgeR. I have 13354 gene ids and I am trying to get the gene symbol for the same. I have been doing the following :
>
> symbol<- select(org.Mm.eg.db, keys=ids, keytype="ENTREZID", cols="SYMBOL")
>   where ids contains the list of 13354 gene ids
>
> But when I look at the result, I get symbols for only half, or fewer than half, of the gene ids.
> Is there a better way to map these ids to gene symbols?
>
> Thanks in advance,
> Himanshu


