[BioC] known gene => gene symbol for UCSC

James W. MacDonald jmacdon at uw.edu
Tue Apr 2 17:01:58 CEST 2013


Hi Ido,

On 4/2/2013 10:20 AM, Ido Tamir wrote:
> Dear James,
>
> thank you very much. It did not work in either a fresh session (a) or
> my old session (b).
>
> Your packages seem newer, but my IDs are from the old package, so they
> should be consistent.

Fair enough. However, the new release is upon us, and this has evidently
been fixed in the new version. So if I were you, I would upgrade to
R-3.0.0 and BioC-2.12.
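
As a rough sketch of what the lookup might look like once you have upgraded
(assuming Mus.musculus has been reinstalled under the new release; the helper
name symbol_for_tx is made up here purely for illustration), the same
positional select() call quoted further down can be wrapped like this:

```r
# Sketch only: assumes the Mus.musculus package from the new release
# is installed. symbol_for_tx is a hypothetical helper, not part of
# any Bioconductor package.
symbol_for_tx <- function(tx_names) {
  # Attaching Mus.musculus also attaches OrganismDbi, which supplies
  # the select() method used below.
  suppressPackageStartupMessages(library(Mus.musculus))
  # Positional arguments, in order: keys, columns to return, keytype.
  select(Mus.musculus, tx_names, "SYMBOL", "TXNAME")
}

# Example call (needs the annotation packages, so it is left commented out):
# symbol_for_tx("uc009veu.1")
```

The arguments are passed by position rather than by name, so the call does
not depend on what the columns argument happens to be called in a given
release.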

Best,

Jim


> I just installed the Bioconductor packages Mus.musculus and TxDb.Mmusculus.UCSC.mm10.ensGene today.
>
> best,
> ido
>
> a) fresh session
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Mus.musculus_1.0.0
> [2] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
> [3] org.Mm.eg.db_2.8.0
> [4] GO.db_2.8.0
> [5] RSQLite_0.11.2
> [6] DBI_0.2-5
> [7] OrganismDbi_1.0.3
> [8] GenomicFeatures_1.10.2
> [9] GenomicRanges_1.10.7
> [10] IRanges_1.16.6
> [11] AnnotationDbi_1.20.6
> [12] Biobase_2.18.0
> [13] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.14.0     Biostrings_2.26.3  bitops_1.0-5       BSgenome_1.26.1
> [5] graph_1.36.2       parallel_2.15.1    RBGL_1.34.0        RCurl_1.95-4.1
> [9] Rsamtools_1.10.2   rtracklayer_1.18.2 stats4_2.15.1      tools_2.15.1
> [13] XML_3.95-0.2       zlibbioc_1.4.0
>
> b) my old session:
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] TxDb.Mmusculus.UCSC.mm10.knownGene_2.8.0
> [2] BiocInstaller_1.8.3
> [3] Mus.musculus_1.0.0
> [4] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
> [5] org.Mm.eg.db_2.8.0
> [6] GO.db_2.8.0
> [7] RSQLite_0.11.2
> [8] DBI_0.2-5
> [9] OrganismDbi_1.0.3
> [10] TxDb.Mmusculus.UCSC.mm9.knownGene_2.8.0
> [11] GenomicFeatures_1.10.2
> [12] AnnotationDbi_1.20.6
> [13] Biobase_2.18.0
> [14] Rsamtools_1.10.2
> [15] Biostrings_2.26.3
> [16] TransView_1.0.7
> [17] Repitools_1.4.2
> [18] GenomicRanges_1.10.7
> [19] IRanges_1.16.6
> [20] BiocGenerics_0.4.0
> [21] ggbio_1.6.6
> [22] ggplot2_0.9.3.1
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.14.0           biovizBase_1.6.2         bitops_1.0-5
> [4] BSgenome_1.26.1          cluster_1.14.3           colorspace_1.2-1
> [7] dichromat_2.0-0          digest_0.6.3             edgeR_3.0.8
> [10] gdata_2.12.0             gplots_2.11.0            graph_1.36.2
> [13] grid_2.15.1              gridExtra_0.9.1          gtable_0.1.2
> [16] gtools_2.7.0             Hmisc_3.10-1             labeling_0.1
> [19] lattice_0.20-13          limma_3.14.4             MASS_7.3-23
> [22] munsell_0.4              parallel_2.15.1          plyr_1.8
> [25] proto_0.3-10             RBGL_1.34.0              RColorBrewer_1.0-5
> [28] RCurl_1.95-4.1           reshape2_1.2.2           rtracklayer_1.18.2
> [31] scales_0.2.3             stats4_2.15.1            stringr_0.6.2
> [34] tools_2.15.1             VariantAnnotation_1.4.12 XML_3.95-0.2
> [37] zlibbioc_1.4.0
>
>
> On Apr 2, 2013, at 3:49 PM, James W. MacDonald wrote:
>
>> Hi Ido,
>>
>> You don't give sessionInfo() results, but this works for me:
>>
>>> select(Mus.musculus, "uc009veu.1", "SYMBOL","TXNAME")
>>      TXNAME SYMBOL
>> 1 uc009veu.1  Zglp1
>>
>>> sessionInfo()
>> R Under development (unstable) (2013-01-22 r61734)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C                 LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Mus.musculus_1.1.0
>> [2] TxDb.Mmusculus.UCSC.mm10.knownGene_2.9.0
>> [3] org.Mm.eg.db_2.9.0
>> [4] GO.db_2.9.0
>> [5] RSQLite_0.11.2
>> [6] DBI_0.2-5
>> [7] OrganismDbi_1.1.14
>> [8] GenomicFeatures_1.11.16
>> [9] GenomicRanges_1.11.44
>> [10] IRanges_1.17.42
>> [11] AnnotationDbi_1.21.16
>> [12] Biobase_2.19.3
>> [13] BiocGenerics_0.5.6
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.15.1      Biostrings_2.27.14  bitops_1.0-5
>> [4] BSgenome_1.27.1     graph_1.37.7        RBGL_1.35.0
>> [7] RCurl_1.95-4.1      Rsamtools_1.11.27   rtracklayer_1.19.11
>> [10] stats4_3.0.0        tools_3.0.0         XML_3.96-1.1
>> [13] zlibbioc_1.5.0
>>
>>
>>
>> On 4/2/2013 9:43 AM, Ido Tamir wrote:
>>> Hi,
>>> how is one supposed to go from UCSC known gene IDs to gene symbols?
>>>
>>>> cols(TxDb.Mmusculus.UCSC.mm9.knownGene)
>>> [1] "CDSID"      "CDSNAME"    "CDSCHROM"   "CDSSTRAND"  "CDSSTART"
>>> [6] "CDSEND"     "EXONID"     "EXONNAME"   "EXONCHROM"  "EXONSTRAND"
>>> [11] "EXONSTART"  "EXONEND"    "GENEID"     "TXID"       "EXONRANK"
>>> [16] "TXNAME"     "TXCHROM"    "TXSTRAND"   "TXSTART"    "TXEND"
>>>
>>> I don't see anything that would allow me to link this with e.g. Mus.musculus.
>>>
>>>> select(txdb, keys=c(100009600), cols=cols(txdb) ,keytype="GENEID")
>>>      GENEID  CDSID CDSNAME CDSCHROM CDSSTRAND CDSSTART   CDSEND EXONID EXONNAME
>>> 1 100009600 112799    <NA>     chr9         - 20871384 20871523 129355     <NA>
>>> 2 100009600 112798    <NA>     chr9         - 20870468 20870821 129354     <NA>
>>> 3 100009600 112797    <NA>     chr9         - 20867758 20867840 129353     <NA>
>>> 4 100009600 112796    <NA>     chr9         - 20867338 20867431 129352     <NA>
>>> 5 100009600 112795    <NA>     chr9         - 20867032 20867161 129351     <NA>
>>>   EXONCHROM EXONSTRAND EXONSTART  EXONEND  TXID EXONRANK     TXNAME TXCHROM
>>> 1      chr9          -  20871384 20872369 28943        1 uc009veu.1    chr9
>>> 2      chr9          -  20870468 20870821 28943        2 uc009veu.1    chr9
>>> 3      chr9          -  20867758 20867840 28943        3 uc009veu.1    chr9
>>> 4      chr9          -  20867338 20867431 28943        4 uc009veu.1    chr9
>>> 5      chr9          -  20866837 20867161 28943        5 uc009veu.1    chr9
>>>   TXSTRAND  TXSTART    TXEND
>>> 1        - 20866837 20872369
>>> 2        - 20866837 20872369
>>> 3        - 20866837 20872369
>>> 4        - 20866837 20872369
>>> 5        - 20866837 20872369
>>>
>>>> cols(Mus.musculus)
>>> [1] "GOID"         "TERM"         "ONTOLOGY"     "DEFINITION"   "ENTREZID"
>>> [6] "PFAM"         "IPI"          "PROSITE"      "ACCNUM"       "ALIAS"
>>> [11] "CHR"          "CHRLOC"       "CHRLOCEND"    "ENZYME"       "PATH"
>>> [16] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"      "ENSEMBL"
>>> [21] "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"     "UNIPROT"      "GO"
>>> [26] "EVIDENCE"     "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"  "MGI"
>>> [31] "CDSID"        "CDSNAME"      "CDSCHROM"     "CDSSTRAND"    "CDSSTART"
>>> [36] "CDSEND"       "EXONID"       "EXONNAME"     "EXONCHROM"    "EXONSTRAND"
>>> [41] "EXONSTART"    "EXONEND"      "GENEID"       "TXID"         "EXONRANK"
>>> [46] "TXNAME"       "TXCHROM"      "TXSTRAND"     "TXSTART"      "TXEND"
>>>
>>>
>>>> select(Mus.musculus,keys="uc009veu.1", cols=c("SYMBOL"), keytype="TXNAME")
>>> Error in .testIfKeysAreOfProposedKeytype(x, keys, keytype) :
>>>   None of the keys entered are valid keys for the keytype specified.
>>>
>>> thank you very much,
>>> ido
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
