[BioC] DEXSeq - too many exons in gene

Alejandro Reyes alejandro.reyes at embl.de
Fri Feb 7 16:12:10 CET 2014


Hi Antonio,

As an extra comment, the binning that we do in DEXSeq is not mandatory 
for using DEXSeq to test for alternative exon usage. For example, 
DEXSeq can also be used with counts from exon-exon junction reads 
(as in tools like MISO) or with a "union" model like the one you mentioned. 
Another example of a very creative use is a paper by Steve 
(10.1101/gad.229328.113), where they adapted their counting bins to test 
specifically for alternative 3' UTR lengthening independent of changes 
in gene expression.

I guess all of these are alternatives to approaches that try to quantify 
and assemble transcript isoforms, which still have some limitations (e.g. 
10.1038/nmeth.2714).
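
For reference, the default binning discussed further down in this thread (what Devon calls a "disjoint gene model") can be illustrated with a minimal Python sketch. This is not the code of the annotation-preparation script that ships with DEXSeq; the function names `disjoint_bins` and `width` and the toy transcripts `tx_a` and `tx_b` are hypothetical and only meant to show the idea. Coordinates are assumed to be 1-based and inclusive, as in GTF files.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GTF convention)


def disjoint_bins(
    transcripts: Dict[str, List[Interval]]
) -> List[Tuple[int, int, Set[str]]]:
    """Cut overlapping exons into non-overlapping counting bins, each
    labelled with the set of transcripts that cover it."""
    # Every exon start, and every position just past an exon end, is a cut point.
    cuts = sorted({p for exons in transcripts.values()
                   for start, end in exons
                   for p in (start, end + 1)})
    bins = []
    for left, right in zip(cuts, cuts[1:]):
        seg_start, seg_end = left, right - 1
        covering = {name for name, exons in transcripts.items()
                    if any(s <= seg_start and seg_end <= e for s, e in exons)}
        if covering:  # segments covered by no exon are intronic; skip them
            bins.append((seg_start, seg_end, covering))
    return bins


def width(start: int, end: int) -> int:
    """Width of a 1-based, inclusive interval."""
    return end - start + 1


if __name__ == "__main__":
    # Hypothetical toy gene with two isoforms (made-up coordinates).
    toy_gene = {
        "tx_a": [(100, 200), (300, 400)],
        "tx_b": [(102, 250), (300, 350)],
    }
    for start, end, names in disjoint_bins(toy_gene):
        print(start, end, width(start, end), sorted(names))
```

In this toy gene a union model would give two exons (100-250 and 300-400), whereas the disjoint model gives five counting bins, the first of which is only 2 bp wide. With inclusive coordinates, the bin 102995728-102995729 quoted further down likewise has a width of 2.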

Best regards,
Alejandro




> Hi Devon,
>
> thank you for the clarification. I thought DEXSeq used a union model, 
> but under the "disjoint gene model" it all makes sense now.
>
> Best,
> António
>
> On 06/02/14 19:42, Devon Ryan wrote:
>> Hi Antonio,
>>
>> I counted 13 exonic bins by eye. What do you find to be amiss there? 
>> Remember that you're not using a flattened/union gene model with 
>> DEXseq, but rather pretty much the exact opposite (maybe it should be 
>> called a "disjoint gene model"?).
>>
>> BTW, that first bin is actually 2bp wide.
>>
>> Regards,
>> Devon
>>
>> ____________________________________________
>> Devon Ryan, Ph.D.
>> Email: dpryan at dpryan.com
>> Tel: +49 (0)178 298-6067
>> Molecular and Cellular Cognition Lab
>> German Centre for Neurodegenerative Diseases (DZNE)
>> Ludwig-Erhard-Allee 2
>> 53175 Bonn, Germany
>>
>> On Feb 6, 2014, at 7:12 PM, António domingues wrote:
>>
>>> Hi Steve,
>>>
>>> thanks for the comments. First of all, my apologies, I sent the 
>>> wrong screenshot. It should have been the one (attached) for Sike1. 
>>> Long day. Anyway, see my replies below to the points that are 
>>> still valid.
>>>
>>> On 02/06/2014 06:54 PM, Steve Lianoglou wrote:
>>>> Hi,
>>>>
>>>> A few comments in line:
>>>>
>>>> On Thu, Feb 6, 2014 at 9:01 AM, António domingues
>>>> <amjdomingues at gmail.com> wrote:
>>>>> Hi Bioconductors,
>>>>>
>>>>> I happened upon a funny thing in DEXseq: a gene which appears to have more
>>>>> exons in the final DEXseq output than the annotation suggests. The gene
>>>>> ENSMUSG00000027854 (screen-shot from UCSC in attachment) suggests 3
>>>>> exons in a flattened gene model. However, the DEXSeq results list 13 exons
>>>>> (here showing the output of htseq-count):
>>>> Not sure why you say the *gene* only has 3 exons ... you have
>>>> highlighted one isoform of the gene which has very few exons, but you
>>>> can see from both your picture and the exon definitions you pasted below
>>>> for ENSMUSG00000027854 (presumably that's Csde1 :-) that if you
>>>> consider all of the isoforms of the gene together, it has many more
>>>> than just three exons.
>>>>
>>>> Know what I mean?
>>> It is not Csde1 :s
>>>
>>>>> Exon 1 is only 1 base long (?) and exons 1 to 4 are contiguous. As far
>>>>> as I am aware, the DEXSeq model should have flattened all of these into one
>>>>> single "exon". Is this correct? Is the error coming from the gtf? (At the
>>>>> end of the message there is also the gene annotation in the gtf.)
>>>> I'm trying to parse the various exon annotations from your email, but
>>>> I don't see where the 1-width exon is.
>>> This one:
>>> chr3    mm10_ensGene.gtf    exonic_part    102995728    102995729    .    +    .    transcripts "ENSMUST00000029447"; exonic_part_number "001"; gene_id "ENSMUSG00000027854"
>>>
>>> Unless I calculated it incorrectly.
>>>
>>>> Figure 1 from their paper shows pretty clearly how the "break down" of
>>>> exons is calculated across isoforms to create *counting bins* -- just
>>>> keep in mind that these things are not necessarily "exons" anymore.
>>> Yes, I am aware of that, but I should have been clearer about the 
>>> distinction between "exon" and counting bin. I think that with the new 
>>> screenshot it will become more apparent what I mean.
>>>>> This is especially concerning for me because I am interested in selecting the
>>>>> first and last exon of genes, using the exon ranking from DEXSeq, to analyze
>>>>> further.
>>>> I'm not sure if what I posted was at all helpful, but if someone else
>>>> doesn't do a better job of providing you with the answer you were
>>>> looking for, you might try to draw a figure of a gene model (with a
>>>> few splicing isoforms) and point out what it is, exactly, that you
>>>> hope to extract from it.
>>>>
>>>> While it's clear what "First and last" exon of a *single transcript
>>>> isoform* of a gene might be, it might get hairy when you start
>>>> summarizing the "counting bins" across multiple isoforms of the same
>>>> gene.
>>> True. I am only using the DEXseq results as a quick and dirty 
>>> approach before I get data from other tools which handle this better. For 
>>> example, MISO has annotations for alternative polyadenylation and 
>>> Cufflinks provides some information on alternative promoter usage. 
>>> Regardless, if the gene model is incorrect, which I hope it is not and 
>>> this is only me being thick, then DEXseq results from some counting 
>>> bins may not be trustworthy.
>>>
>>>
>>>> Oh, and by the way:
>>>>
>>>>
>>>>> Hi Bioconductors,
>>>>>
>>>>> I happened upon a funny thing in DEXseq: a gene which appears to have more
>>>>> exons in the final DEXseq output than the annotation suggests. The gene
>>>>> ENSMUSG00000027854 (screen-shot from UCSC in attachment) suggests the 3
>>>>> exons in a flattened gene model.
>>>> I'd argue that the isoform of the gene that you highlighted in your
>>>> original screen shot only has *two* exons
>>>>
>>>> -steve
>>> ehehe, correct.
>>>
>>>> HTH,
>>>> -steve
>>>>
>>> Cheers,
>>> António
>>>
>>>
>>> <ENSMUSG00000027854.png>
>


