[BioC] org.Hs.eg question
Fraser_Sim at URMC.Rochester.edu
Fri Mar 20 17:06:33 CET 2009
I tried the biomaRt package to look at the Ensembl data directly.
NCBI GeneID = 4340 - maps to ENSG00000204655
Ensembl Peptide = ENSP00000373017 - maps to ENSG00000206456 (no NCBI
GeneID)
So I think the mapping is working correctly as no NCBI GeneID is
associated with this Ensembl gene.
However, both have very similar annotations and appear to be the same
'MOG' gene but only one gets the NCBI GeneID.
ENSG00000204655 maps to Chromosome 6: 29,732,788-29,748,128
ENSG00000206456 maps to Chromosome c6_COX: 29,768,660-29,784,001
I was using the org.Hs.eg.db package with hom.Rn.inp.db to find
rat-human homologs. Actually, if I use biomaRt to look for the homolog of
rat GeneID 24558, it successfully finds ENSG00000204655 (i.e. GeneID
4340). I'll try that route and report my results.
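For readers who want to try the biomaRt route described above, here is a minimal sketch. It is not the code used in this thread. The helper name find_human_homolog is hypothetical, and the dataset, attribute and filter names are assumptions based on current Ensembl BioMart naming (the Entrez filter was called "entrezgene" in older releases). listAttributes() and listFilters() in biomaRt show what a given release actually offers.

```r
# Hypothetical helper, not part of biomaRt: look up the human Ensembl gene IDs
# that Ensembl lists as homologs of a rat Entrez GeneID.
find_human_homolog <- function(rat_entrez_id) {
  # Connect to the rat gene dataset on the Ensembl BioMart server
  # (requires the biomaRt package and network access).
  rat <- biomaRt::useMart("ensembl", dataset = "rnorvegicus_gene_ensembl")

  # Ask for the rat Ensembl gene ID together with its human homolog.
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "hsapiens_homolog_ensembl_gene"),
    filters    = "entrezgene_id",  # assumption: "entrezgene" in older releases
    values     = rat_entrez_id,
    mart       = rat
  )
}

# Usage (not run here because it needs a live connection):
# find_human_homolog("24558")
```

The function only wraps the query, so nothing contacts the server until it is called.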
From: Marc Carlson [mailto:mcarlson at fhcrc.org]
Sent: Thursday, March 19, 2009 3:16 PM
To: Sim, Fraser
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] org.Hs.eg question
I can explain this, and maybe you can even help me to improve things.
The mappings from Ensembl protein and transcript IDs to Ensembl gene IDs
come from Ensembl's web site. The mappings from Entrez Gene IDs to
Ensembl gene IDs presently come from NCBI. So an Ensembl protein ID can
only reach an Entrez Gene ID by way of its Ensembl gene ID.
However, the gene-to-gene mappings from NCBI do not seem to be as
complete as whatever Ensembl is using, and I do not have any explanation
from them about why that is. I also don't have a better source for this
information (yet), as I have been unable to locate this kind of
information on Ensembl's FTP sites. Something must exist somewhere at
Ensembl, though, because the Ensembl web site is presumably based on it.
But whatever they are using at Ensembl, they do not seem to be sharing
that mapping with the world (although it would be great to find out that
I had just missed it somehow). If you know where I can find a better
source for this kind of information than what I am currently using, I
would be more than happy to consider it. But it obviously has to come
from a trustworthy and documentable source (such as NCBI or Ensembl).
Otherwise there would not be much point in including it. ;)
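To illustrate why the lookup comes back as NA, here is a small base R sketch of the two-step mapping described above. It is an illustration only and not the code used to build org.Hs.eg.db. The two toy tables contain just the IDs reported in this thread, and the column names are made up for the example.

```r
# Protein -> gene, as published by Ensembl (IDs taken from this thread).
prot2gene <- data.frame(
  ensembl_prot = "ENSP00000373017",
  ensembl_gene = "ENSG00000206456",
  stringsAsFactors = FALSE
)

# Ensembl gene -> Entrez Gene, as published by NCBI
# (only the pairing reported in this thread).
gene2entrez <- data.frame(
  ensembl_gene = "ENSG00000204655",
  entrez_id    = "4340",
  stringsAsFactors = FALSE
)

# Left join: keep every protein, even when NCBI lists no Entrez Gene ID
# for its Ensembl gene. Unmatched rows get NA in entrez_id.
prot2entrez <- merge(prot2gene, gene2entrez, by = "ensembl_gene", all.x = TRUE)
print(prot2entrez)
```

Because ENSG00000206456 has no row in the NCBI-derived table, the protein ends up with a missing Entrez Gene ID, which matches the NA returned by the annotation package.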
Sim, Fraser wrote:
> I'm using the org.Hs.eg annotation package to convert Ensembl protein
> annotations to Entrez GeneIDs. I don't understand why the annotation
> package cannot find the mapping when I can find the correct annotation
> manually via the Ensembl website (EG = 4340).
> Here is the code:
> > HsENSP = "ENSP00000373017"
> > HsEG = as.character(unlist(mget(HsENSP, org.Hs.egENSEMBLPROT2EG,
> +                                 ifnotfound = NA)))
> > HsEG
> [1] NA
> Thanks for any input.
> R version 2.8.1 (2008-12-22)
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> attached base packages:
>  tools stats graphics grDevices utils datasets
>  base
> other attached packages:
>  gplots_2.6.0 gdata_2.4.2 gtools_2.5.0
>  bioDist_1.14.0 RColorBrewer_1.0-2 GEOquery_2.6.0
>  RCurl_0.94-0 rae230a.db_2.2.5 org.Rn.eg.db_2.2.6
>  hom.Rn.inp.db_2.2.5 org.Hs.eg.db_2.2.6 RSQLite_0.7-1
>  DBI_0.2-4 AnnotationDbi_1.4.2 Biobase_2.2.1
>  rcom_2.0-4 rscproxy_1.0-12