[BioC] How to get gene symbol after deseq?

Fabrice Tourre fabrice.ciup at gmail.com
Sat Feb 8 00:45:57 CET 2014


Great. Thank you.

On Fri, Feb 7, 2014 at 6:32 PM, Steve Lianoglou
<lianoglou.steve at gene.com> wrote:
> Hi,
>
> Don't forget that you can always query the ensembl biomart directly
> from within R using the biomaRt package:
> http://bioconductor.org/packages/release/bioc/html/biomaRt.html
>
> The user's guide has many examples:
> http://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.pdf
>
> HTH,
> -steve
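A minimal biomaRt sketch of that lookup follows. It assumes the biomaRt package is installed and that the Ensembl BioMart service is reachable. The helper name `ensembl_to_symbol` is hypothetical, and `external_gene_name` is the symbol attribute in current Ensembl releases; attribute names have changed over time, so check `listAttributes(mart)` if the query fails.

```r
# Hypothetical helper: look up gene symbols for Ensembl gene IDs via BioMart.
# Assumes biomaRt is installed and the Ensembl service is reachable.
ensembl_to_symbol <- function(ids) {
  # Connect to the mouse gene dataset of the Ensembl mart
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  # One row per matching ID; IDs unknown to Ensembl are simply absent
  biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                 filters = "ensembl_gene_id",
                 values = unique(ids),
                 mart = mart)
}

# Usage (not run here; requires network access):
# ensembl_to_symbol(c("ENSMUSG00000022150", "ENSMUSG00000082538"))
```

Because this queries the live Ensembl database, it may return symbols that a packaged annotation snapshot does not yet contain.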
>
>
> On Fri, Feb 7, 2014 at 3:24 PM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
>> Jim,
>>
>> I see. Thank you very much.
>>
>> On Fri, Feb 7, 2014 at 5:50 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>> Hi Fabrice,
>>>
>>> I don't know. It might have something to do with that gene being a predicted
>>> gene. Perhaps we don't annotate such things? Or it may have been added to
>>> Ensembl after we built the org.Mm.eg.db package.
>>>
>>> If you require the most recent data, you can always build your own package
>>> using makeOrgPackageFromNCBI() in the AnnotationForge package, or use
>>> biomaRt.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> On Friday, February 07, 2014 5:41:13 PM, Fabrice Tourre wrote:
>>>>
>>>> Jim,
>>>>
>>>> Thank you very much. It makes sense to me.
>>>>
>>>> One small question: why is ENSMUSG00000082538 given NA when Ensembl
>>>> gives it the symbol Gm14704?
>>>>
>>>>
>>>> http://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000082538;r=X:70403181-70404054;t=ENSMUST00000120300
>>>>
>>>>
>>>>
>>>> On Fri, Feb 7, 2014 at 4:52 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>>>
>>>>> Hi Fabrice,
>>>>>
>>>>>
>>>>>
>>>>> On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
>>>>>>
>>>>>>
>>>>>> Dear experts,
>>>>>>
>>>>>> After running DESeq, I got a list of genes. They look something like
>>>>>> this:
>>>>>>
>>>>>> > resSig[,1]
>>>>>>
>>>>>>
>>>>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>>>>> [2] "ENSMUSG00000026727:016"
>>>>>> [3] "ENSMUSG00000026727:004"
>>>>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>>>>> [5] "ENSMUSG00000025730:010"
>>>>>> [6] "ENSMUSG00000005836:007"
>>>>>> [7] "ENSMUSG00000073139:001"
>>>>>>
>>>>>> How can I get back the gene symbol for each ID?
>>>>>
>>>>>
>>>>>
>>>>> > gns
>>>>>
>>>>>
>>>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>>>> [2] "ENSMUSG00000026727:016"
>>>>> [3] "ENSMUSG00000026727:004"
>>>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>>>> [5] "ENSMUSG00000025730:010"
>>>>> [6] "ENSMUSG00000005836:007"
>>>>> [7] "ENSMUSG00000073139:001"
>>>>>
>>>>> > gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
>>>>> > gns2
>>>>>
>>>>> [1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
>>>>> [4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
>>>>> [7] "ENSMUSG00000073139"
>>>>>
>>>>> > library(Mus.musculus)
>>>>> > select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
>>>>>
>>>>>               ENSEMBL   SYMBOL
>>>>> 1 ENSMUSG00000022150     Dab2
>>>>> 2 ENSMUSG00000026727     Rsu1
>>>>> 5 ENSMUSG00000025730   Rab40c
>>>>> 6 ENSMUSG00000005836    Gata6
>>>>> 7 ENSMUSG00000073139 BC023829
>>>>>
>>>>> Note that I am discarding the second Ensembl Gene ID. You could do something
>>>>> more sophisticated to capture duplicated IDs, but I'll leave that for you to
>>>>> figure out.
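One way to keep both Ensembl IDs from the aggregated entries, rather than discarding the second, is sketched below in base R. The names `gene_part`, `ids`, and `long` are illustrative and not from the thread. The optional symbol lookup assumes the org.Mm.eg.db package is installed and is skipped otherwise.

```r
# Two of the feature IDs from the thread, in the form "gene[+gene...]:bin"
gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016")

# Drop the ":NNN" suffix, then split aggregated gene IDs on the literal "+"
gene_part <- sub(":.*$", "", gns)
ids <- strsplit(gene_part, "+", fixed = TRUE)

# Long table: one row per (feature, Ensembl gene ID) pair
long <- data.frame(feature = rep(gns, lengths(ids)),
                   ENSEMBL = unlist(ids),
                   stringsAsFactors = FALSE)

# Optional symbol lookup; assumes org.Mm.eg.db is installed (skipped if not)
if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  sym <- AnnotationDbi::select(org.Mm.eg.db::org.Mm.eg.db,
                               keys = unique(long$ENSEMBL),
                               columns = "SYMBOL", keytype = "ENSEMBL")
  # all.x = TRUE keeps IDs without a symbol as NA instead of dropping them
  long <- merge(long, sym, by = "ENSEMBL", all.x = TRUE)
}
```

Keeping the original feature string in its own column makes it possible to join the symbols back onto the results table afterwards, and IDs that come back as NA can then be checked against Ensembl directly.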
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>>
>>>>>> Thank you very much in advance.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>>
>>>
>>
>
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Genentech


