[BioC] deseq2: convert ensembl ID to gene symbol.

Vang Quy Le / Region Nordjylland vql at rn.dk
Fri Aug 8 10:00:30 CEST 2014


Hello Fabrice Tourre,
You can try this.

Regards
Vang

####R code for fetching gene names with biomaRt #####
library(biomaRt)
annot.table <- data.frame() # Your annotation table; it must not be NULL and needs an "ensembl_id" column for the merge below
# Collect the Ensembl IDs from annot.table before converting them to gene symbols.
ensembl_ids <- character() # Fill this in, e.g. from annot.table$ensembl_id
# Prepare the gene table with simple file caching, so repeated runs do not hit the Ensembl server again
genes.table <- NULL
if (!file.exists("cache.genes.table")) {
    message("Retrieving genes table from Ensembl...")
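    mart <- useMart("ensembl")
    # listDatasets(mart = mart)  # uncomment to see the available datasets
    # This is the human dataset; for mouse IDs (ENSMUSG...) use "mmusculus_gene_ensembl" instead.
    mart <- useDataset("hsapiens_gene_ensembl", mart = mart)
    # Note: newer Ensembl releases call the symbol attribute "external_gene_name"
    # instead of "external_gene_id"; check listAttributes(mart) if the query fails.
    genes.table <- getBM(filters    = "ensembl_gene_id",
                         attributes = c("ensembl_gene_id", "external_gene_id", "description"),
                         values     = ensembl_ids,
                         mart       = mart)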
    save(genes.table, file= "cache.genes.table")
} else {
    load("cache.genes.table")
    message("Reading gene annotation information from cache file: cache.genes.table\n",
            "Remove the file if you want to force retrieving data from Ensembl")
}

# Merge the two tables. The syntax is shown in full form; by.x and by.y can be
# replaced by a single "by" when both key columns have the same name.
annot.table <- merge(x = annot.table, y = genes.table,
                     by.x = "ensembl_id", by.y = "ensembl_gene_id",
                     all.x = TRUE, all.y = FALSE)
# Do something with the table, e.g. export it to TSV
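
The question quoted below points out that several Ensembl IDs can share one gene symbol. Here is a minimal sketch of one way to deal with that after the merge: append the Ensembl ID only to symbols that are shared, then write the table out as TSV. The toy tables, the "baseMean" values, the "label" column, the placeholder ID ENSMUSG00000000000 and the temporary output file are all assumptions made for illustration; only the three Muc2 IDs come from the question. Whether the report functions accept such labels is not covered here.

```r
# Toy annotation table. The three Muc2 IDs are from the question; the last ID
# is a made-up placeholder with no symbol, and baseMean values are invented.
annot.table <- data.frame(
    ensembl_id = c("ENSMUSG00000095400", "ENSMUSG00000094393",
                   "ENSMUSG00000025515", "ENSMUSG00000000000"),
    baseMean   = c(10.5, 3.2, 7.8, 1.1),
    stringsAsFactors = FALSE
)

# Toy stand-in for the table that getBM() would return.
genes.table <- data.frame(
    ensembl_gene_id  = c("ENSMUSG00000095400", "ENSMUSG00000094393",
                         "ENSMUSG00000025515"),
    external_gene_id = rep("Muc2", 3),
    stringsAsFactors = FALSE
)

# Left join: every row of annot.table is kept; unmatched symbols become NA.
merged <- merge(annot.table, genes.table,
                by.x = "ensembl_id", by.y = "ensembl_gene_id",
                all.x = TRUE)

# Fall back to the Ensembl ID where no symbol was found.
symbol <- ifelse(is.na(merged$external_gene_id) | merged$external_gene_id == "",
                 merged$ensembl_id, merged$external_gene_id)

# Append the Ensembl ID only where a symbol occurs more than once,
# so that every label is unique but still readable.
shared <- symbol %in% symbol[duplicated(symbol)]
merged$label <- ifelse(shared,
                       paste0(symbol, " (", merged$ensembl_id, ")"),
                       symbol)
stopifnot(!anyDuplicated(merged$label))

# Export to a tab-separated file (a temporary file in this sketch).
out.file <- tempfile(fileext = ".tsv")
write.table(merged, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Keeping the Ensembl ID inside the label means the original identifier can still be recovered from the report, which a plain make.unique() suffix would not allow.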

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Fabrice Tourre
Sent: Friday, August 08, 2014 9:18 AM
To: Bioconductor mailing list
Subject: [BioC] deseq2: convert ensembl ID to gene symbol.

Dear expert,

I am using DESeq2 and DEXSeq to analyze my RNA-seq data. I followed the vignettes of these packages closely.

In the final output of the function DEXSeqHTML or HTMLReport, I would prefer to use gene symbols rather than Ensembl IDs. For example, I prefer Muc2 to ENSMUSG00000095400, because it is more human readable. How can I output gene symbols instead of Ensembl IDs? It seems that I cannot simply change the GTF file, because several Ensembl IDs map to one gene symbol, as follows:

ENSMUSG00000095400  Muc2
ENSMUSG00000094393  Muc2
ENSMUSG00000025515  Muc2

Thank you very much in advance.

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


