[BioC] How to get gene symbol after deseq?

Fabrice Tourre fabrice.ciup at gmail.com
Fri Feb 7 23:41:13 CET 2014


Jim,

Thank you very much. It makes sense to me.

One small question: why is ENSMUSG00000082538 given NA, when it is
given the symbol Gm14704 on Ensembl?

http://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000082538;r=X:70403181-70404054;t=ENSMUST00000120300



On Fri, Feb 7, 2014 at 4:52 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Fabrice,
>
>
>
> On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
>>
>> Dear experts,
>>
>> After running deseq, I got a list of genes. The IDs look like the
>> following.
>>
>>> resSig[,1]
>>
>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>> [2] "ENSMUSG00000026727:016"
>> [3] "ENSMUSG00000026727:004"
>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>> [5] "ENSMUSG00000025730:010"
>> [6] "ENSMUSG00000005836:007"
>> [7] "ENSMUSG00000073139:001"
>>
>> How can I get back the gene symbol for each ID?
>
>
>> gns
>
> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
> [2] "ENSMUSG00000026727:016"
> [3] "ENSMUSG00000026727:004"
> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
> [5] "ENSMUSG00000025730:010"
> [6] "ENSMUSG00000005836:007"
> [7] "ENSMUSG00000073139:001"
>
>> gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
>> gns2
> [1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
> [4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
> [7] "ENSMUSG00000073139"
>> select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
>              ENSEMBL   SYMBOL
> 1 ENSMUSG00000022150     Dab2
> 2 ENSMUSG00000026727     Rsu1
> 5 ENSMUSG00000025730   Rab40c
> 6 ENSMUSG00000005836    Gata6
> 7 ENSMUSG00000073139 BC023829
>
> Note that I am discarding the second Ensembl Gene ID. You could do something
> more sophisticated to capture duplicated IDs, but I'll leave that for you to
> figure out.
>
> Best,
>
> Jim
>
>
>>
>> Thank you very much in advance.
>>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
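
Regarding Jim's note about discarding the second Ensembl gene ID: one
possible way to keep every ID is sketched below in base R. This is only a
sketch under assumptions, not Jim's method. It assumes each ID is one or
more Ensembl gene IDs joined by "+", followed by ":" and a numeric bin
suffix, as in the examples in this thread. The names gene_part, gene_list,
id_map and keys are made up for illustration.

```r
# IDs in the format shown in the thread: one or more Ensembl gene IDs
# joined by "+", then ":" and a numeric suffix.
gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016",
         "ENSMUSG00000005836:007")

# Drop the trailing ":NNN" suffix, then split what remains on "+".
gene_part <- sub(":[0-9]+$", "", gns)
gene_list <- strsplit(gene_part, "+", fixed = TRUE)

# Long-format table: one row per (original ID, Ensembl gene ID) pair,
# so aggregate IDs contribute one row per gene instead of losing genes.
id_map <- data.frame(
  feature = rep(gns, lengths(gene_list)),
  ENSEMBL = unlist(gene_list),
  stringsAsFactors = FALSE
)

# Unique Ensembl IDs, usable as the keys argument of the select() call
# shown in the thread.
keys <- unique(id_map$ENSEMBL)
```

The resulting symbol table could then be merged back onto id_map by the
ENSEMBL column, so that each original ID carries all of its symbols.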


