[BioC] How to get gene symbol after deseq?

Steve Lianoglou lianoglou.steve at gene.com
Sat Feb 8 00:32:38 CET 2014


Hi,

Don't forget that you can always query the Ensembl BioMart directly
from within R using the biomaRt package:
http://bioconductor.org/packages/release/bioc/html/biomaRt.html

The user's guide has many examples:
http://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.pdf
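
For the IDs in this thread, a minimal sketch might look like the
following. It is only an illustration under some assumptions: the helper
names split_gene_ids() and lookup_symbols() are made up for this example,
the dataset name "mmusculus_gene_ensembl" and the attribute "mgi_symbol"
are assumed to be available in the Ensembl release you connect to, and
the biomaRt call needs a network connection. Unlike the approach quoted
below, the helper keeps every gene ID from "+"-joined identifiers
instead of only the first one.

```r
# Hypothetical helper (not from the thread): keep every Ensembl gene ID in an
# identifier such as "ENSMUSG00000022150+ENSMUSG00000079102:009".
split_gene_ids <- function(ids) {
  # Drop the ":<number>" suffix, then split aggregated genes on "+".
  genes <- strsplit(sub(":.*$", "", ids), "+", fixed = TRUE)
  data.frame(
    id = rep(ids, lengths(genes)),      # original identifier, repeated per gene
    ensembl_gene_id = unlist(genes),    # one Ensembl gene ID per row
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper around biomaRt. The dataset and attribute names are
# assumptions about the Ensembl mart; this function needs internet access.
lookup_symbols <- function(gene_ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "mgi_symbol"),
    filters = "ensembl_gene_id",
    values = unique(gene_ids),
    mart = mart
  )
}

gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016")
map <- split_gene_ids(gns)

# Not run here because it queries Ensembl over the network:
# symbols <- lookup_symbols(map$ensembl_gene_id)
# merge(map, symbols, by = "ensembl_gene_id", all.x = TRUE)
```

The merge() with all.x = TRUE would keep rows for IDs that come back
without a symbol, so missing annotations stay visible as NA.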

HTH,
-steve


On Fri, Feb 7, 2014 at 3:24 PM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
> Jim,
>
> I see. Thank you very much.
>
> On Fri, Feb 7, 2014 at 5:50 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Fabrice,
>>
>> I don't know. It might have something to do with that gene being a predicted
>> gene. Perhaps we don't annotate such things? Or it may have been added to
>> Ensembl after we built the org.Mm.eg.db package.
>>
>> If you require the most recent data, you can always build your own package
>> using makeOrgPackageFromNCBI() in the AnnotationForge package, or use
>> biomaRt.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>> On Friday, February 07, 2014 5:41:13 PM, Fabrice Tourre wrote:
>>>
>>> Jim,
>>>
>>> Thank you very much. It makes sense to me.
>>>
>>> One small question: why is ENSMUSG00000082538 given NA when it has
>>> the symbol Gm14704 in Ensembl?
>>>
>>>
>>> http://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000082538;r=X:70403181-70404054;t=ENSMUST00000120300
>>>
>>>
>>>
>>> On Fri, Feb 7, 2014 at 4:52 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>>
>>>> Hi Fabrice,
>>>>
>>>>
>>>>
>>>> On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
>>>>>
>>>>>
>>>>> Dear experts,
>>>>>
>>>>> After running DESeq, I got a list of genes. They look like the
>>>>> following.
>>>>>
>>>>>> resSig[,1]
>>>>>
>>>>>
>>>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>>>> [2] "ENSMUSG00000026727:016"
>>>>> [3] "ENSMUSG00000026727:004"
>>>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>>>> [5] "ENSMUSG00000025730:010"
>>>>> [6] "ENSMUSG00000005836:007"
>>>>> [7] "ENSMUSG00000073139:001"
>>>>>
>>>>> How can I get back the gene symbol for each ID?
>>>>
>>>>
>>>>
>>>>> gns
>>>>
>>>>
>>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>>> [2] "ENSMUSG00000026727:016"
>>>> [3] "ENSMUSG00000026727:004"
>>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>>> [5] "ENSMUSG00000025730:010"
>>>> [6] "ENSMUSG00000005836:007"
>>>> [7] "ENSMUSG00000073139:001"
>>>>
>>>>> gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
>>>>> gns2
>>>>
>>>> [1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
>>>> [4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
>>>> [7] "ENSMUSG00000073139"
>>>>
>>>>> library(Mus.musculus)
>>>>> select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
>>>>
>>>>               ENSEMBL   SYMBOL
>>>> 1 ENSMUSG00000022150     Dab2
>>>> 2 ENSMUSG00000026727     Rsu1
>>>> 5 ENSMUSG00000025730   Rab40c
>>>> 6 ENSMUSG00000005836    Gata6
>>>> 7 ENSMUSG00000073139 BC023829
>>>>
>>>> Note that I am discarding the second Ensembl Gene ID. You could do
>>>> something more sophisticated to capture duplicated IDs, but I'll leave
>>>> that for you to figure out.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>>
>>>>> Thank you very much in advance.
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>>
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>
>>
>



-- 
Steve Lianoglou
Computational Biologist
Genentech


