[BioC] org.Hs.eg "gene_info" table seems to be wrong

Marc Carlson mcarlson at fhcrc.org
Tue Dec 8 01:44:05 CET 2009


Hi Vladimir,

The _id value in the gene_info table is NOT meant to be an Entrez Gene
ID.  It is an internal ID ONLY.  If a few genes happen to have the same
_id as their Entrez Gene ID, that is pure coincidence and has no meaning
whatsoever.  To connect the rows in the gene_info table to an actual
Entrez Gene ID, you need to join the gene_info table to the genes table
on the internal _id.  That is all _id is for: an integer for joining the
various internal tables to each other.  The schema for this package is
available by calling org.Hs.eg_dbschema().  It might also be a good idea
to read the AnnotationDbi vignette, which can be found here:

http://www.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html
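
For illustration, the join described above could look roughly like the
following.  This is only a sketch: it assumes the org.Hs.eg.db package is
installed and that the genes table exposes the columns _id and gene_id
(please confirm against the output of org.Hs.eg_dbschema()).  The variable
name res is just a placeholder.

```r
# Sketch only: assumes org.Hs.eg.db is installed and that the genes table
# has the columns _id and gene_id (check with org.Hs.eg_dbschema()).
library(org.Hs.eg.db)
library(DBI)

conn <- org.Hs.eg_dbconn()

# Select rows by Entrez Gene ID (genes.gene_id), not by the internal _id,
# and use _id only as the join key between the two tables.
res <- dbGetQuery(conn, "
  SELECT g.gene_id, i.symbol, i.gene_name
  FROM genes AS g
  INNER JOIN gene_info AS i ON g._id = i._id
  WHERE g.gene_id IN ('367', '11835');
")
res
```

Only Entrez Gene IDs that actually exist in the human database should come
back from a query like this, so an ID that belongs to another organism
would simply be absent from the result.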

Let me know if you have more questions,


  Marc




Vladimir Morozov wrote:
>  Hi,
>
>  org.Hs.eg "gene_info" table seems to be wrong
>
>
>   
>>  conn <- org.Hs.eg_dbconn()
>>     
>
> #the example seems correct
>   
>> dbGetQuery(conn, "SELECT * FROM gene_info LIMIT 3;")
>>     
>   _id                        gene_name symbol
> 1   1           alpha-1-B glycoprotein   A1BG
> 2   2            alpha-2-macroglobulin    A2M
> 3   3 alpha-2-macroglobulin pseudogene   A2MP
>
> #wrong for other genes...
>   
>> dbGetQuery(conn, "SELECT * FROM gene_info where _id=367 or _id=11835;")
>     _id                        gene_name symbol
> 1   367         ADP-ribosyltransferase 1   ART1
> 2 11835 phosphoserine aminotransferase 1  PSAT1
>   
>
>
>
> "11835" is the mouse AR gene
>
> Individual maps look Ok
>   
>> org.Hs.egSYMBOL[["367"]]
>>     
> [1] "AR"
>
>   
>> org.Hs.egSYMBOL[["11835"]]
>>     
> NULL
>
>   
>> org.Hs.egGENENAME[["367"]]
>>     
> [1] "androgen receptor"
>   
>
>
>   
>> org.Hs.eg()
> Quality control information for org.Hs.eg:
>
>
> This package has the following mappings:
>
> org.Hs.egACCNUM has 29687 mapped keys (of 40784 keys)
> org.Hs.egACCNUM2EG has 590454 mapped keys (of 590454 keys)
> org.Hs.egALIAS2EG has 102986 mapped keys (of 102986 keys)
> org.Hs.egCHR has 40539 mapped keys (of 40784 keys)
> org.Hs.egCHRLENGTHS has 25 mapped keys (of 25 keys)
> org.Hs.egCHRLOC has 20599 mapped keys (of 40784 keys)
> org.Hs.egCHRLOCEND has 20599 mapped keys (of 40784 keys)
> org.Hs.egENSEMBL has 20255 mapped keys (of 40784 keys)
> org.Hs.egENSEMBL2EG has 19903 mapped keys (of 19903 keys)
> org.Hs.egENSEMBLPROT has 19927 mapped keys (of 40784 keys)
> org.Hs.egENSEMBLPROT2EG has 44871 mapped keys (of 44871 keys)
> org.Hs.egENSEMBLTRANS has 19965 mapped keys (of 40784 keys)
> org.Hs.egENSEMBLTRANS2EG has 44931 mapped keys (of 44931 keys)
> org.Hs.egENZYME has 2015 mapped keys (of 40784 keys)
> org.Hs.egENZYME2EG has 870 mapped keys (of 870 keys)
> org.Hs.egGENENAME has 40784 mapped keys (of 40784 keys)
> org.Hs.egGO has 17482 mapped keys (of 40784 keys)
> org.Hs.egGO2ALLEGS has 10438 mapped keys (of 10438 keys)
> org.Hs.egGO2EG has 7659 mapped keys (of 7659 keys)
> org.Hs.egMAP has 36549 mapped keys (of 40784 keys)
> org.Hs.egMAP2EG has 2946 mapped keys (of 2946 keys)
> org.Hs.egOMIM has 14080 mapped keys (of 40784 keys)
> org.Hs.egOMIM2EG has 16415 mapped keys (of 16415 keys)
> org.Hs.egPATH has 4799 mapped keys (of 40784 keys)
> org.Hs.egPATH2EG has 205 mapped keys (of 205 keys)
> org.Hs.egPFAM has 24009 mapped keys (of 40784 keys)
> org.Hs.egPMID has 28206 mapped keys (of 40784 keys)
> org.Hs.egPMID2EG has 232955 mapped keys (of 232955 keys)
> org.Hs.egPROSITE has 24009 mapped keys (of 40784 keys)
> org.Hs.egREFSEQ has 28158 mapped keys (of 40784 keys)
> org.Hs.egREFSEQ2EG has 90796 mapped keys (of 90796 keys)
> org.Hs.egSYMBOL has 40784 mapped keys (of 40784 keys)
> org.Hs.egSYMBOL2EG has 40763 mapped keys (of 40763 keys)
> org.Hs.egUNIGENE has 24864 mapped keys (of 40784 keys)
> org.Hs.egUNIGENE2EG has 25562 mapped keys (of 25562 keys)
> org.Hs.egUNIPROT has 20652 mapped keys (of 40784 keys)
>
>
> Additional Information about this package:
>
> DB schema: HUMAN_DB
> DB schema version: 1.0
> Organism: Homo sapiens
> Date for NCBI data: 2009-Mar11
> Date for GO data: 200903
> Date for KEGG data: 2009-Mar10
> Date for Golden Path data: 2008-Sep3
> Date for IPI data: 2009-Mar03
> Date for Ensembl data: 2009-Mar6
>
>
>
>


