[BioC] org.Hs.eg question

Sim, Fraser Fraser_Sim at URMC.Rochester.edu
Fri Mar 20 20:31:18 CET 2009


Hi,

Here are the results:
 665 of 898 rat GeneIDs annotated with human GeneIDs using the hom.Rn.inp.db package
 752 of 898 using the biomaRt package (slow)
 784 of 898 using hom.Rn.inp.db first, then biomaRt (very slow)

Looks like the combined approach is best. It annotates 87% of the rat
GeneIDs, which is good enough for my purposes.
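A minimal sketch of the combined approach in base R, assuming each method's results have already been collected into a named character vector (rat GeneID as the name, human GeneID as the value, NA when no homolog was found) and that biomaRt is only used to fill the gaps left by hom.Rn.inp.db. The vectors and the `combine_mappings()` helper are hypothetical; only the 24558/4340 pair and the reported counts come from this thread.

```r
# Hypothetical toy results: names are rat Entrez GeneIDs, values are the
# human Entrez GeneIDs found by each method (NA = no homolog found).
# Only 24558 -> 4340 comes from this thread; the other IDs are made up.
from_inp     <- c("24558" = NA,     "1001" = "2001", "1002" = NA)
from_biomart <- c("24558" = "4340", "1001" = "2001", "1002" = NA)

# Hypothetical helper: keep the primary result where it exists and
# fill only the NA entries from the fallback, matching on rat GeneID.
combine_mappings <- function(primary, fallback) {
  out <- primary
  gaps <- is.na(out)
  # Look up by name so the two vectors need not share the same order.
  out[gaps] <- fallback[names(out)[gaps]]
  out
}

combined <- combine_mappings(from_inp, from_biomart)
coverage <- mean(!is.na(combined))  # fraction of rat GeneIDs now annotated

# Coverage reported above, as a percentage of the 898 rat GeneIDs.
reported <- c(inp = 665, biomart = 752, combined = 784)
round(100 * reported / 898)  # inp 74, biomart 84, combined 87
```

Filling only the gaps keeps the faster hom.Rn.inp.db answer wherever one exists, so the slow biomaRt step only has to cover the IDs the first method missed.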

Thanks,
Fraser

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Sim, Fraser
Sent: Friday, March 20, 2009 12:07 PM
To: Marc Carlson
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] org.Hs.eg question

Hi Marc,

I tried the biomaRt package to look at the Ensembl data directly.

NCBI GeneID 4340 maps to ENSG00000204655
Ensembl peptide ENSP00000373017 maps to ENSG00000206456 (no NCBI
GeneID associated)

So I think the mapping is working correctly, since no NCBI GeneID is
associated with this Ensembl gene.

However, both have very similar annotations and appear to be the same
'MOG' gene, but only one of them has an NCBI GeneID.

ENSG00000204655 maps to Chromosome 6: 29,732,788-29,748,128
ENSG00000206456 maps to Chromosome c6_COX: 29,768,660-29,784,001

I was using the org.Hs.eg.db package with hom.Rn.inp.db to find
rat-human homologs. Actually, if I use biomaRt to look for the homolog
of rat GeneID 24558, it successfully finds ENSG00000204655 (i.e. GeneID
4340). I'll try that route and report my results.

Cheers,
Fraser

-----Original Message-----
From: Marc Carlson [mailto:mcarlson at fhcrc.org] 
Sent: Thursday, March 19, 2009 3:16 PM
To: Sim, Fraser
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] org.Hs.eg question

Hi Sim,

I can explain this, and maybe you can even help me to improve things.
The mappings from Ensembl protein and transcript IDs to Ensembl gene
IDs come from the Ensembl web site, while the mappings from Entrez Gene
IDs to Ensembl gene IDs presently come from NCBI.

However, the gene-to-gene mappings from NCBI do not seem to be as
complete as whatever Ensembl is using, and I have no explanation from
them about why that is. I also don't have a better source for this
information (yet), as I have been unable to locate it on Ensembl's FTP
sites. Something must exist somewhere at Ensembl, though, because the
Ensembl web site is presumably based on it. But whatever Ensembl is
using, they do not seem to be sharing that mapping with the world
(although it would be great to find out that I had just missed it
somehow). If you know where I can find a better source for this kind of
information than what I am currently using, I would be more than happy
to consider it. But it has to come from a trustworthy and documentable
source (such as NCBI or Ensembl); otherwise there would not be much
point in including it.  ;)
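To illustrate why the lookup returns NA, here is a toy sketch in base R that chains two named vectors standing in for the two sources described above: an Ensembl-derived protein-to-gene table and an NCBI-derived gene-to-GeneID table. The IDs come from this thread; the vectors and the `lookup_chain()` helper are hypothetical stand-ins, not the actual org.Hs.eg.db tables or code.

```r
# Toy stand-in for the Ensembl-derived table: protein ID -> Ensembl gene ID.
ensp_to_ensg <- c(ENSP00000373017 = "ENSG00000206456")

# Toy stand-in for the NCBI-derived table: Ensembl gene ID -> Entrez GeneID.
# In this thread's example, NCBI links GeneID 4340 only to ENSG00000204655.
ensg_to_eg <- c(ENSG00000204655 = "4340")

# Hypothetical helper: compose the two lookups. A missing link at
# either step propagates as NA in the result.
lookup_chain <- function(ids, first, second) {
  genes <- unname(first[ids])   # step 1: protein -> Ensembl gene
  eg <- unname(second[genes])   # step 2: Ensembl gene -> Entrez GeneID
  setNames(eg, ids)
}

lookup_chain("ENSP00000373017", ensp_to_ensg, ensg_to_eg)
```

Here the first step succeeds (the protein maps to ENSG00000206456), but the second returns NA because that gene has no NCBI GeneID, which matches what the biomaRt query showed. A more complete gene-to-gene table would fix the second step without any change to the first.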


  Marc




Sim, Fraser wrote:
> Hi,
>
> I'm using the org.Hs.eg.db annotation package to convert Ensembl protein
> IDs to Entrez GeneIDs. I can find the correct annotation manually via
> the Ensembl website (EG = 4340), but I don't understand why the
> annotation package is unable to.
>
> Here is the code:
>
>> HsENSP
> [1] "ENSP00000373017"
>> require("org.Hs.eg.db")
>> HsEG = as.character(unlist(mget(HsENSP, org.Hs.egENSEMBLPROT2EG, ifnotfound = NA)))
>> HsEG
> [1] NA
>
> Thanks for any input.
>
> Regards,
> Fraser
>
>
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;
> LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] gplots_2.6.0        gdata_2.4.2         gtools_2.5.0       
>  [4] bioDist_1.14.0      RColorBrewer_1.0-2  GEOquery_2.6.0     
>  [7] RCurl_0.94-0        rae230a.db_2.2.5    org.Rn.eg.db_2.2.6 
> [10] hom.Rn.inp.db_2.2.5 org.Hs.eg.db_2.2.6  RSQLite_0.7-1      
> [13] DBI_0.2-4           AnnotationDbi_1.4.2 Biobase_2.2.1      
> [16] rcom_2.0-4          rscproxy_1.0-12    
>   
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>   
