[BioC] R: Found seven 3'UTR sequences attributed to the same ensembl_gene_id

David K Pritchard dpritch at u.washington.edu
Wed Jul 1 05:17:17 CEST 2009


Maura,
    Many genes produce multiple transcripts with different structures, which are called alternative splice forms. It is quite possible for a gene to have multiple transcripts carrying seven different 3'UTRs. A good place to check this is the Ensembl website, which shows the structure of each transcript encoded by the gene.
    I cannot speak to the accuracy of your code, but I took the ID from your post, ENSG00000144134, and searched for it at www.ensembl.org. I looked at the resulting gene in detail (try http://www.ensembl.org/Homo_sapiens/Location/View?g=ENSG00000144134) and saw that it does encode multiple transcripts with different 3'UTRs. I am not sure whether there were seven different forms, but there were several. As a check, you could take the seven 3'UTRs you found, BLAST them on the Ensembl website, and see whether they all map to your gene of interest.
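    One way to see which transcript each 3'UTR belongs to is to query per transcript rather than per gene. The following is only a minimal sketch, assuming the biomaRt package is installed and Ensembl is reachable; it reuses the same useMart/getBM/getSequence calls that appear in your post. The helper names fetch_utrs_by_transcript and summarise_utrs are made up for illustration, and the toy IDs and sequences at the end are invented so that the summary step can be run offline.

```r
# Hypothetical helper: look up the transcripts of one gene, then fetch the
# 3'UTR of each transcript. Requires biomaRt and network access, so it is
# only defined here, not called.
fetch_utrs_by_transcript <- function(gene_id) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  tx <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id"),
    filters    = "ensembl_gene_id",
    values     = gene_id,
    mart       = mart
  )
  # Expected to return a data frame with the columns "3utr" and
  # "ensembl_transcript_id" (one row per transcript).
  biomaRt::getSequence(
    id      = tx$ensembl_transcript_id,
    type    = "ensembl_transcript_id",
    seqType = "3utr",
    mart    = mart
  )
}

# Hypothetical helper: one row per transcript with the length of its 3'UTR.
summarise_utrs <- function(utrs) {
  data.frame(
    ensembl_transcript_id = utrs$ensembl_transcript_id,
    utr_length            = nchar(utrs[["3utr"]]),
    stringsAsFactors      = FALSE
  )
}

# Toy input (invented IDs and sequences) shaped like getSequence() output.
toy <- data.frame(
  "3utr"                = c("GGGGCTAA", "GGGGCT", "GGGGCTAA"),
  ensembl_transcript_id = c("TX_A", "TX_B", "TX_C"),
  check.names           = FALSE,
  stringsAsFactors      = FALSE
)

toy_summary <- summarise_utrs(toy)           # 3'UTR length per transcript
n_distinct  <- length(unique(toy[["3utr"]])) # number of distinct 3'UTRs
print(toy_summary)
print(n_distinct)
```

    With a working connection, summarise_utrs(fetch_utrs_by_transcript("ENSG00000144134")) should give one row per transcript of the gene, which makes it easier to match each sequence to a transcript shown on the Ensembl page.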

David Pritchard


On Wed, 1 Jul 2009 mauede at alice.it wrote:

> Unfortunately, no one in our group has a clear idea about these biology matters.
> We are missing our biology professor, who is still hospitalized.
> I asked someone in my group, "Is there a unique 3'UTR region in a gene?" and was answered "yes".
>
> I know there is plenty of material about these topics on the web.
> I really need some very basic reading just to get a grasp of the rudimentary concepts.
>
> Best regards,
> Maura
>
> -----Original Message-----
> From: Sean Davis [mailto:seandavi at gmail.com]
> Sent: Tue 30/06/2009 17:31
> To: mauede at alice.it
> Cc: Miichael Watson; Steve Lianoglou; Bioconductor List
> Subject: Re: Found seven 3'UTR sequences attributed to the same ensembl_gene_id
>
> On Tue, Jun 30, 2009 at 11:28 AM, <mauede at alice.it> wrote:
>
>>  I found seven 3'UTR sequences attributed to the same ensembl_gene_id.
>> Naively, I wonder whether this is possible or whether it is
>> the consequence of a logic bug in my code. Can the same gene have more than
>> one 3'UTR region?
>> The following is what I extracted by running just the first
>> iteration of a nested loop.
>> Is that *real*?
>>
> Hi, Maura.
>
> Genes do not have 3'UTR regions.  Only transcripts have 3'UTRs.  So, since a
> gene can have multiple transcripts, there will be multiple 3'UTRs associated
> with each gene.  So, I think your code is probably fine.
>
> Sean
>
>
>>
>> Thank you for your attention.
>> Maura
>>
>>
>> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
>>
>> > enst
>> [1] "ENST00000376439"
>>
>> > rec <- getBM(attributes=c('hgnc_symbol','ensembl_gene_id','ensembl_transcript_id','refseq_dna'),
>> +              filters='ensembl_transcript_id', values=enst, mart=hmart)
>> > rec
>>   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
>> 1      RABL2A ENSG00000144134       ENST00000376439         NA
>>
>> > rec[,"ensembl_gene_id"]
>> [1] "ENSG00000144134"
>>
>> > seq = getSequence(id=rec[,"ensembl_gene_id"],type="ensembl_gene_id",seqType="3utr",mart=hmart)
>>
>> > seq
>>
>>
>> 3utr
>> 1
>> GGGGCTGGGGCTAGGGGTGGGTGGAGCCCTTTTAAAATACCCTTCCCTTCAACAACTCTCCAGCTCTGAATGGAGAAACTCTCTAGGCCATCCCCTCTTCTACCTCCTGCAACCCACCCATCCTATTAGCCTCCCACATTCAAGGCCCGTGATACAGGGATGAGGTCAGCACCAGCAAACTCTGGACTGGTGGAAGAATTCCCCACCAGATCTCCTTGAAGCAGAATTAGGGATCAGCATCATTAACACCTTCCCCACCCCCTCCCCCCAGGCAGACAGTGAAGAGAATCAGAAAACATGATTATGTGTCACTTTAATACAGGAAATTTAGGTGTTTTTTGGTGTTTTTGTTTTTGTTTTCTTTCCAAAGCTCACCTCGGGGACAATTCCTTGGGCTTCTCCTGAGTCTCGCTCTGTCGCCAGGCTGGAGTGCAGTGGCGCAGTCTCGGCTCGCTGCAACCTCTGACTCCCTGGTTCAAACGATTCTCCTGCCTCAGCCTCCCGAGTGGCTGGCATCACCACGCCCAGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGCCCAGGATGGTCTCGATCTCCTCACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCGCGCCCGGCCCCAATCATCTGTTTTTAAACAATCGTTTTTGAGCAGATAGCTATTCATTCCAGATTTCCGTGTACCCACTCTGTTTCAGGAGCTCTTCTAGGTAAAGCTGAGATCACAGGAACAGCAGGTGACAGGCCTAGCTATAGTTAGGAATACACAAGCGGTAAAATCGAGTCCTTACAGCCATACCACAAGGTACGTCCATTTGGACTACAAGAAGAGCTTCCTTTAAAGTTCCTATTTCAGCATAAAGAGGCTGTCCTTTTTTTTTAGGAATAGTTTGGACCTTGTGCCTCCTGTGGGAGGCTGAGGACTGCAAGAGGAGAGCTAGCAGATATGCCTGTTCACCCCTCTCTGGTACTTGTGGCTTGCTAGTATGTTTTTATGATAATCTCGGGCATTGTTTGCATTGTGTTTATTAATAGGGTTTTGTTTTTATTGTTTCCTTTTTTACAGTAAAGGCTGAATGACATAAA
>> 2
>>
>> GGGGCTGGGGCTAGGGGTGGGTGGAGCCCTTTTAAAATACCCTTCCCTTCAACAACTCTCCAGCTCTGAATGGAGAAACTCTCTAGGCCATCCCCTCTTCTACCTCCTGCAACCCACCCATCCTATTAGCCTCCCACATTCAAGGCCCGTGATACAGG
>> 3
>> GGGGCTGGGGCTAGGGGTGGGTGGAGCCCTTTTAAAATACCCTTCCCTTCAACAACTCTCCAGCTCTGAATGGAGAAACTCTCTAGGCCATCCCCTCTTCTACCTCCTGCAACCCACCCATCCTATTAGCCTCCCACATTCAAGGCCCGTGATACAGGGATGAGGTCAGCACCAGCAAACTCTGGACTGGTGGAAGAATTCCCCACCAGATCTCCTTGAAGCAGAATTAGGGATCAGCATCATTAACACCTTCCCCACCCCCTCCCCCCAGGCAGACAGTGAAGAGAATCAGAAAACATGATTATGTGTCACTTTAATACAGGAAATTTAGGTGTTTTTTGGTGTTTTTGTTTTTGTTTTCTTTCCAAAGCTCACCTCGGGGACAATTCCTTGGGCTTCTCCTGAGGTAATGATTACCCCCCCACCCACAGCTGAGTCTGTGAGGCCCCATCCTTTCCCTACGTTTTCTCCCATCTTTTTTCCTCTTCAATCTCCCAGTCATCTGGTTTGTTTGTTTCTTTGTTCGTCCTGAGACGGAGTCTCGCTCTGTCGCCAGGCTGGAGTGCAGTGGCGCAGTCTCGGCTCGCTGCAACCTCTGACTCCCTGGTTCAAACGATTCTCCTGCCTCAGCCTCCCGAGTGGCTGGCATCACCACGCCCAGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGCCCAGGATGGTCTCGATCTCCTCACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCGCGCCCGGCCCCAATCATCTGTTTTTAAACAATCGTTTTTGAGCAGATAGCTATTCATTCCAGATTTCCGTGTACCCACTCTGTTTCAGGAGCTCTTCTAGGTAAAGCTGAGATCACAGGAACAGCAGGTGACAGGCCTAGCTATAGTTAGGAATACACAAGCGGTAAAATCGAGTCCTTACAGCCATACCACAAGGTACGTCCATTTGGACTACAAGAAGAGCTTCCTTTAAAGTTCCTATTTCAGCATAAAGAGGCTGTCCTTTTTTTTTAGGAATAGTTTGGACCTTGTGCCTCCTGTGGGAGGCTGAGGACTGCAAGAGGAGAGCTAGCAGATATGCCTGTTCACCCCTCTCTGGTACTTGTGGCTTGCTAGTATGTTTTTATGATAATCTCGGGCATTGTTTGCATTGTGTTTATTAATAGGGTTTTGTTTTTATTGTTTCCTTTTTTACAGTAAAGGCTGAATGACAT
>> 4
>> GGGGCTGGGGCTAGGGGTGGGTGGAGCCCTTTTAAAATACCCTTCCCTTCAACAACTCTCCAGCTCTGAATGGAGAAACTCTCTAGGCCATCCCCTCTTCTACCTCCTGCAACCCACCCATCCTATTAGCCTCCCACATTCAAGGCCCGTGATACAGGGATGAGGTCAGCACCAGCAAACTCTGGACTGGTGGAAGAATTCCCCACCAGATCTCCTTGAAGCAGAATTAGGGATCAGCATCATTAACACCTTCCCCACCCCCTCCCCCCAGGCAGACAGTGAAGAGAATCAGAAAACATGATTATGTGTCACTTTAATACAGGAAATTTAGGTGTTTTTTGGTGTTTTTGTTTTTGTTTTCTTTCCAAAGCTCACCTCGGGGACAATTCCTTGGGCTTCTCCTGAGGTAATGATTACCCCCCCACCCACAGCTGAGTCTGTGAGGCCCCATCCTTTCCCTACGTTTTCTCCCATCTTTTTTCCTCTTCAATCTCCCAGTCATCTGGTTTGTTTGTTTCTTTGTTCGTCCTGAGACGGAGTCTCGCTCTGTCGCCAGGCTGGAGTGCAGTGGCGCAGTCTCGGCTCGCTGCAACCTCTGACTCCCTGGTTCAAACGATTCTCCTGCCTCAGCCTCCCGAGTGGCTGGCATCACCACGCCCAGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGCCCAGGATGGTCTCGATCTCCTCACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCGCGCCCGGCCCCAATCATCTGTTTTTAAACAATCGTTTTTGAGCAGATAGCTATTCATTCCAGATTTCCGTGTACCCACTCTGTTTCAGGAGCTCTTCTAGGTAAAGCTGAGATCACAGGAACAGCAGGTGACAGGCCTAGCTATAGTTAGGAATACACAAGCGGTAAAATCGAGTCCTTACAGCCATACCACAAGGTACGTCCATTTGGACTACAAGAAGAGCTTCCTTTAAAGTTCCTATTTCAGCATAAAGAGGCTGTCCTTTTTTTTTAGGAATAGTTTGGACCTTGTGCCTCCTGTGGGAGGCTGAGGACTGCAAGAGGAGAGCTAGCAGATATGCCTGTTCACCCCTCTCTGGTACTTGTGGCTTGCTAGTATGTTTTTATGATAATCTCGGGCATTGTTTGCATTGTGTTTATTAATAGGGTTTTGTTTTTATTGTTTCCTTTTTTACAGTAAAGGCTGAATGACATAAA
>> 5
>> CTGCTTCCTGCATCTGCTGCATCTCCGTGGGCTCCCCTCAGACCCTCTTCTGAAGGCCTGGGGTGTCTCTCCTGCCACCATGCCTGTGTCTGCAGGTGCCTGCCACCAGCCCCAGTCTGCTGCACGGGCCCTGGCAAGTAGAAAGCACTTGCCTTCTGACCACACGGGGAGCTGAGGGTCAGAGACGGAACCAGGGCTCGACCCTCCACCTTGAAACCTTGAGATGGGGATGCTCCTCATTCTAGCCAGTTCCTCCTCAGCTCTCAAAACAAGGAACAGATGCTCAGGAAACCAGATCTGGACAAAAGTCATCTGAGCCTGGTGTGAGGCAGATTCCAGAAGTTTAGTTACAGACATCCTTTATAAGGAGACTTCATCGGGAATTCAAGACAACCTGGTGATTCATTGAAATTTGCCTGTGAAAGAGAATCTACATAGACTTCCTGCCACCTCTTGAGATGTGACAGTTGCTGACCCTCCCGCCACCACACAGGGCGAGCCCCTAGCCCTGAGCTTGAACCATGTTGCTTGCACAAATAGCTGGGTGATTTAGAAGTGAGGTCAGCTGTGCCAGCAGTTACAGGGTGGTGGTTGTCTGTAACTTTAATCCACTGACTGTTGTACTAGGGCAGTTTGGGCTAGACACTTTGGAGGAGCTCCTGTGAAGGGCATGAAGGCTCACTGTAGCAGCAGCTCAGTTGTCTTTCAGAGTTCTGCCCTTAGAGCTGGTTTGCAGTGCTCATCCTTCTTGCTGATATTTTAAAATAGGTAGAAACAGGCTGGGCGCGGTGACTCATGCCTTTAATCCCAGTACTTTGGGAGGCCTAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGACCAACATGTTGAAACCCCGTCTCTACTAGAAATACAAAAATTAGCCAGGCGTGGTGGCGCGCACCTGTAATCCAGCTACTCAGGAGGCTGAGACAGGAGAATCGTTTGAAGACAGGAGAATCGTTTGAACCCAGGAGGTGGAGGTTGCAGTGGCAGTGAGCCAAGATACCGCCACTGCACTCTAGCCTGGGCAACAGAGCAAGACTCCATCTCAAAATAAATAAATAAATAAAAATAAAATAGGTAAAAACAAATTATAAAGTAATACAATTATGAACTGCAAATAATAAAACATAAAAATTACTTTAAAAAAATTTAAAGAGGCCGGGCACAGTGGCTTATGCCTGTAATCCCAGAAATTTGGGAGGCCGAGGCAGGAGGATCACTTGAGCCCGGGAGTCCAAGACCAGCCTCGTTAATATAATGAGAGCTTATCATCTCTACAAAAAATAAACAAAATTAGCCAGGCATGGTGGCATGTGCCTGTAGTTCCAGCTACTCAGGAGGCTGAGGTAGGAGGATCACTGGAGCCCAGGGGGTGGAGGAGCAGTAAGCCAAGATTCTGCCACTGCACTCCAGCCTGGCTGACAGAGTAAGACCCTATCTCAAAAAACAAAAAGCAGAAAGAACAAAGAAGTAAACAAAAGCTTAAAAGTAAATCAGCCAGGTGCAGTAGCTCATGCCTGTAATCCCAGTACTTTGGGAGGCCTAGGCAGGCAGATTACTGCAGGTCAAGAGTTTGAGACCAGCCTGGCCAACATGATGAAACCCTGTCTCTACTAAAACTACAAAAATTAGCCAGGCATGGTGGTGCGCACCTGTAATCCCAGCTACTCCGGAGGCTGAGACAAGAGAATCGCTTGAACCTAGGAAGTGGAGGTTGCAGTGGCAGTGAGCCAAGATAGCGCCACTGCACTCCAGCCTGGGCAACAGAGCAAGACTCCATATATGGAGATCCCTTGAGATCAAGAGTTCGAGACCAGCCTGGCCAACACGGCAAAACCCTGTCTCTACTAAAAATAAAAAAA
>> 6
>>
>> GGGGCTGGGGCTAGGGGAATAGTTTGGACCTTGTGCCTCCTGTGGGAGGCTGAGGACTGCAAGAGGAGAGCTAGCAGATATGCCTGTTCACCCCTCTCTGGTACTTGTGGCTTGCTAGTATGTTTTTATGATAATCTCGGGCATTGTTTGCATTGTGTTTATTAATAGGGTTTTGTTTTTATTGTTTCCTTTTTTACAGTAAAGGCTGAATGACATAA
>> 7
>>
>> GGGGCTGGGGCTAGGGGAATAGTTTGGACCTTGTGCCTCCTGTGGGAGGCTGAGGACTGCAAGAGGAGAGCTAGCAGATATGCCTGTTCACCCCTCTCTGGTACTTGTGGCTTGCTAGTATGTTTTTATGATAATCTCGGGCATTGTTTGCATTGTGTTTATTAATAGGGTTTTGTTTTTATTGTTTCCTTTTTTACAGTAAAGGCTGAATGACAT
>>   ensembl_gene_id
>> 1 ENSG00000144134
>> 2 ENSG00000144134
>> 3 ENSG00000144134
>> 4 ENSG00000144134
>> 5 ENSG00000144134
>> 6 ENSG00000144134
>> 7 ENSG00000144134
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


