[BioC] retrieving external Gene IDs from TranscriptDB Object

Ugo Borello ugo.borello at inserm.fr
Mon May 6 12:18:50 CEST 2013


Dear Stefanie,
Thanks to Marc Carlson, I just learned an easy way to do what you want.
It is nicely described in section 5 (and section 3) of this vignette:
http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf
I hope this helps.
Ugo


> From: Stefanie Tauber <stefanie.tauber at univie.ac.at>
> Date: Mon, 6 May 2013 11:25:09 +0200
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] retrieving external Gene IDs from TranscriptDB Object
> 
> Dear List,
> 
> I have created a TranscriptDB for yeast as follows:
> 
> library(GenomicFeatures)
> library(biomaRt)
> 
> ## create yeast DB
> myDB <- makeTranscriptDbFromBiomart(biomart = "ensembl",
>                                     dataset = "scerevisiae_gene_ensembl",
>                                     circ_seqs = c(DEFAULT_CIRC_SEQS, "Mito"))
> myDBx <- cdsBy(myDB, by = "tx", use.names = TRUE)
> 
> Now, I would like to retrieve the external gene ids.
> Is this the most generic way?
> 
> # select mart and dataset
> mymart = useMart("ENSEMBL_MART_ENSEMBL", dataset = "scerevisiae_gene_ensembl",
>                  host = "www.ensembl.org")
> 
> # just a selection of transcripts
> 
> sel = names(myDBx)[5:6]
> 
> getBM(attributes = c("ensembl_transcript_id", "external_gene_id"),
>       filters = "ensembl_transcript_id", values = sel, mart = mymart)
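
A note on the getBM() call above: the data frame it returns is not guaranteed to list transcripts in the order of `values`, and transcripts with no match are simply left out, so it is worth re-aligning the result to names(myDBx) before using it. A minimal base-R sketch, assuming `bm` holds the getBM() result with the two attributes requested above; align_to_tx() is a hypothetical helper and the IDs are made-up placeholders, not real Ensembl identifiers:

```r
## Hypothetical helper: re-align a getBM()-style data frame to a vector of
## transcript IDs. The IDs below are made-up placeholders, not Ensembl IDs.
align_to_tx <- function(tx_ids, bm,
                        key = "ensembl_transcript_id",
                        value = "external_gene_id") {
  ## match() gives the first matching row for each ID, or NA if there is none
  idx <- match(tx_ids, bm[[key]])
  setNames(bm[[value]][idx], tx_ids)
}

## Stand-in for a getBM() result: rows out of order and TX_B missing
demo_ids <- c("TX_A", "TX_B", "TX_C")
bm <- data.frame(
  ensembl_transcript_id = c("TX_C", "TX_A"),
  external_gene_id      = c("GENE3", "GENE1"),
  stringsAsFactors = FALSE
)

align_to_tx(demo_ids, bm)
```

Because match() takes only the first hit, a transcript that maps to several rows keeps just one of them; merge() is the better choice if every transcript/gene pair is needed.
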
> 
> 
> And, when creating a TranscriptDB From UCSC:
> 
> 
> myDB1 <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene")
> myDBx1 <- cdsBy(myDB1, by = "tx", use.names = TRUE)
> 
> What would be the most generic way here to retrieve the external gene IDs
> for each transcript ID?
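
For the UCSC knownGene TranscriptDb, the gene IDs it carries are Entrez Gene IDs rather than symbols, so getting a readable name is a two-step lookup: transcript to gene ID from the TranscriptDb, then gene ID to symbol from an organism package such as org.Hs.eg.db (the kind of select() query the AnnotationDbi vignette Ugo linked describes). The join itself can be sketched in base R; tx2gene, gene2sym and tx_to_symbol() are hypothetical names standing in for those two tables, and the IDs are placeholders, not real annotation:

```r
## Hypothetical stand-ins for a transcript -> gene table and a
## gene -> symbol table. All IDs are placeholders.
tx2gene <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3", "tx4"),
  GENEID = c("100", "100", "200", NA),   # tx4 has no gene assigned
  stringsAsFactors = FALSE
)
gene2sym <- data.frame(
  ENTREZID = c("100", "200"),
  SYMBOL   = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: follow both tables in turn, keeping the order of
## tx_ids and returning NA wherever either lookup fails.
tx_to_symbol <- function(tx_ids, tx2gene, gene2sym) {
  gene <- tx2gene$GENEID[match(tx_ids, tx2gene$TXNAME)]
  sym  <- gene2sym$SYMBOL[match(gene, gene2sym$ENTREZID)]
  data.frame(TXNAME = tx_ids, GENEID = gene, SYMBOL = sym,
             stringsAsFactors = FALSE)
}

tx_to_symbol(c("tx3", "tx1", "tx4", "tx9"), tx2gene, gene2sym)
```

Keeping the intermediate GENEID column makes it easy to tell whether an NA symbol came from a transcript without a gene or from a gene without a symbol.
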
> 
> Best,
> Stefanie
> 
> 
>> sessionInfo()
> R Under development (unstable) (2013-05-02 r62711)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] biomaRt_2.16.0         GenomicFeatures_1.12.1 AnnotationDbi_1.22.3
> [4] Biobase_2.20.0         GenomicRanges_1.12.2   IRanges_1.18.0
> [7] BiocGenerics_0.6.0
> 
> loaded via a namespace (and not attached):
>  [1] Biostrings_2.28.0  bitops_1.0-5       BSgenome_1.28.0    DBI_0.2-6
>  [5] RCurl_1.95-4.1     Rsamtools_1.12.2   RSQLite_0.11.3     rtracklayer_1.20.1
>  [9] stats4_3.1.0       tools_3.1.0        XML_3.96-1.1       zlibbioc_1.6.0
> 


