[BioC] How to get gene symbol after deseq?

Fabrice Tourre fabrice.ciup at gmail.com
Sat Feb 8 00:24:31 CET 2014


Jim,

I see. Thank you very much.

On Fri, Feb 7, 2014 at 5:50 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Fabrice,
>
> I don't know. It might have something to do with that gene being a predicted
> gene. Perhaps we don't annotate such things? Or it may have been added to
> Ensembl after we built the org.Mm.eg.db package.
>
> If you require the most recent data, you can always build your own package
> using makeOrgPackageFromNCBI() in the AnnotationForge package, or use
> biomaRt.
>
> Best,
>
> Jim
>
> On Friday, February 07, 2014 5:41:13 PM, Fabrice Tourre wrote:
>>
>> Jim,
>>
>> Thank you very much. It makes sense to me.
>>
>> One small question: why is ENSMUSG00000082538 given NA, when Ensembl
>> gives it the symbol Gm14704?
>>
>>
>> http://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000082538;r=X:70403181-70404054;t=ENSMUST00000120300
>>
>> On Fri, Feb 7, 2014 at 4:52 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>> Hi Fabrice,
>>>
>>> On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
>>>>
>>>>
>>>> Dear experts,
>>>>
>>>> After running DESeq, I got a list of genes. They look something like
>>>> the following:
>>>>
>>>>> resSig[,1]
>>>>
>>>>
>>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>>> [2] "ENSMUSG00000026727:016"
>>>> [3] "ENSMUSG00000026727:004"
>>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>>> [5] "ENSMUSG00000025730:010"
>>>> [6] "ENSMUSG00000005836:007"
>>>> [7] "ENSMUSG00000073139:001"
>>>>
>>>> How can I get back the gene symbol for each ID?
>>>
>>>> gns
>>>
>>>
>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>> [2] "ENSMUSG00000026727:016"
>>> [3] "ENSMUSG00000026727:004"
>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>> [5] "ENSMUSG00000025730:010"
>>> [6] "ENSMUSG00000005836:007"
>>> [7] "ENSMUSG00000073139:001"
>>>
>>>> gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
>>>> gns2
>>>
>>> [1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
>>> [4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
>>> [7] "ENSMUSG00000073139"
>>>> library(Mus.musculus)
>>>> select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
>>>
>>>               ENSEMBL   SYMBOL
>>> 1 ENSMUSG00000022150     Dab2
>>> 2 ENSMUSG00000026727     Rsu1
>>> 5 ENSMUSG00000025730   Rab40c
>>> 6 ENSMUSG00000005836    Gata6
>>> 7 ENSMUSG00000073139 BC023829
>>>
>>> Note that I am discarding the second Ensembl Gene ID. You could do something
>>> more sophisticated to capture duplicated IDs, but I'll leave that for you to
>>> figure out.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>
>>>> Thank you very much in advance.
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
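Jim's strsplit() step keeps only the first Ensembl ID from aggregated features such as "ENSMUSG00000022150+ENSMUSG00000079102:009". Below is a minimal sketch of the "more sophisticated" route he mentions, assuming every feature ID follows the "ID+ID:NNN" pattern shown in the thread and that a lookup data frame with ENSEMBL and SYMBOL columns is available (in a live session, something like select(Mus.musculus, ids, "SYMBOL", "ENSEMBL") run on all IDs). The helper names feature_gene_ids(), symbols_per_feature() and unmapped_ids() are hypothetical, not part of any Bioconductor package. The small lookup table is copied from the select() output in the thread; ENSMUSG00000079102 was never queried there, so it is deliberately absent and falls back to its Ensembl ID.

```r
# Hypothetical helpers, not from any Bioconductor package.
# Assumes feature IDs look like "ENSMUSG...+ENSMUSG...:NNN".

# Split each feature ID into all of its Ensembl gene IDs (a list, one entry per feature).
feature_gene_ids <- function(features) {
  # Drop the ":NNN" bin suffix, then split aggregated genes on "+"
  strsplit(sub(":[0-9]+$", "", features), "+", fixed = TRUE)
}

# One "+"-joined symbol string per feature, in input order.
# IDs with no symbol (absent from lookup, or SYMBOL is NA) keep their Ensembl ID.
# If lookup maps one ID to several symbols, match() uses the first matching row.
symbols_per_feature <- function(features, lookup) {
  vapply(feature_gene_ids(features), function(ids) {
    sym <- lookup$SYMBOL[match(ids, lookup$ENSEMBL)]
    miss <- is.na(sym)
    sym[miss] <- ids[miss]
    paste(sym, collapse = "+")
  }, character(1))
}

# Unique Ensembl IDs that did not get a symbol, e.g. to look up elsewhere.
unmapped_ids <- function(features, lookup) {
  ids <- unique(unlist(feature_gene_ids(features)))
  ids[!ids %in% lookup$ENSEMBL[!is.na(lookup$SYMBOL)]]
}

# Example data copied from the thread.
features <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
              "ENSMUSG00000026727:016",
              "ENSMUSG00000026727:004",
              "ENSMUSG00000022150+ENSMUSG00000079102:015",
              "ENSMUSG00000025730:010",
              "ENSMUSG00000005836:007",
              "ENSMUSG00000073139:001")
lookup <- data.frame(
  ENSEMBL = c("ENSMUSG00000022150", "ENSMUSG00000026727",
              "ENSMUSG00000025730", "ENSMUSG00000005836",
              "ENSMUSG00000073139"),
  SYMBOL = c("Dab2", "Rsu1", "Rab40c", "Gata6", "BC023829"),
  stringsAsFactors = FALSE
)

symbols_per_feature(features, lookup)
# returns "Dab2+ENSMUSG00000079102" "Rsu1" "Rsu1" "Dab2+ENSMUSG00000079102"
#         "Rab40c" "Gata6" "BC023829"
unmapped_ids(features, lookup)
# returns "ENSMUSG00000079102"
```

Falling back to the Ensembl ID rather than pasting "NA" keeps each label traceable. IDs reported by unmapped_ids(), like ENSMUSG00000082538 in the follow-up question, are the ones worth checking with biomaRt or a freshly built annotation package, as Jim suggests.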
