[BioC] DEXSeq - too many exons in gene

Fri Feb 7 17:39:43 CET 2014

Hi Alejandro,

thank you for the extra information, in particular the referral to 
Steve's paper. It had eluded me, but now I have some weekend reading.

Best,
António

-- 
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
Fetscherstraße 105
01307 Dresden

Phone: +49 (351) 458 82362
Email: antonio.domingues(at)biotec.tu-dresden.de
--
The Unbearable Lightness of Molecular Biology

On 02/07/2014 04:12 PM, Alejandro Reyes wrote:
> Hi Antonio,
>
> As an extra comment, the binning that we do in DEXSeq is not mandatory 
> for testing for alternative exon usage with DEXSeq. For example, 
> DEXSeq can also be used with counts from exon-exon junction 
> reads (as in tools like MISO) or with a "union" model like the one you 
> mentioned. Another example of a very creative use is the one from a 
> paper by Steve (10.1101/gad.229328.113), where they adapted their 
> counting bins to test specifically for alternative 3' UTR lengthening 
> independent of changes in gene expression.
>
> I guess all these are alternatives to trying to quantify and 
> assemble transcript isoforms, which still has some limitations (e.g. 
> 10.1038/nmeth.2714).
>
> Best regards,
> Alejandro
>
>
>
>
>> Hi Devon,
>>
>> thank you for the clarification. I thought DEXSeq used a union model, 
>> but under the "disjoint gene model" it all makes sense now.
>>
>> Best,
>> António
>>
>> On 06/02/14 19:42, Devon Ryan wrote:
>>> Hi Antonio,
>>>
>>> I counted 13 exonic bins by eye. What do you find to be amiss there? 
>>> Remember that you're not using a flattened/union gene model with 
>>> DEXSeq, but rather pretty much the exact opposite (maybe it should 
>>> be called a "disjoint gene model"?).
>>>
>>> BTW, that first bin is actually 2bp wide.
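
A minimal Python sketch of the distinction Devon describes, assuming a toy gene with two hypothetical isoforms (txA and txB, with made-up coordinates). It is not DEXSeq's own annotation script, only an illustration of how overlapping exons are cut at every exon boundary into non-overlapping counting bins rather than merged into a union:

```python
def disjoint_bins(transcripts):
    """Split the exons of several transcripts into non-overlapping bins.

    transcripts: dict mapping transcript id -> list of (start, end) exons,
    using 1-based inclusive coordinates as in GTF.
    Returns a list of (start, end, sorted transcript ids) tuples.
    """
    # Every exon start, and every position just past an exon end,
    # is a potential bin boundary.
    boundaries = sorted({b for exons in transcripts.values()
                         for start, end in exons
                         for b in (start, end + 1)})
    bins = []
    for left, right in zip(boundaries, boundaries[1:]):
        start, end = left, right - 1
        # Transcripts that have an exon fully covering this interval.
        covering = sorted(tx for tx, exons in transcripts.items()
                          if any(s <= start and end <= e for s, e in exons))
        if covering:  # skip purely intronic stretches
            bins.append((start, end, covering))
    return bins


# Hypothetical toy gene: two isoforms whose first exons partly overlap.
toy = {
    "txA": [(100, 200), (300, 400)],
    "txB": [(150, 250), (300, 400)],
}
for start, end, txs in disjoint_bins(toy):
    print(start, end, end - start + 1, "+".join(txs))
```

For this toy gene a union model would merge the exons into two regions (100-250 and 300-400), whereas the disjoint model yields four bins. That is how a gene can end up with more counting bins than any single isoform has exons.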
>>>
>>> Regards,
>>> Devon
>>>
>>> ____________________________________________
>>> Devon Ryan, Ph.D.
>>> Email: dpryan at dpryan.com
>>> Tel: +49 (0)178 298-6067
>>> Molecular and Cellular Cognition Lab
>>> German Centre for Neurodegenerative Diseases (DZNE)
>>> Ludwig-Erhard-Allee 2
>>> 53175 Bonn, Germany
>>>
>>> On Feb 6, 2014, at 7:12 PM, António domingues wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> thanks for the comments. First of all, my apologies, I sent the 
>>>> wrong screenshot. It should have been the one (attached) for Sike1. 
>>>> Long day. Anyway, see my replies below to the points that are 
>>>> still valid.
>>>>
>>>> On 02/06/2014 06:54 PM, Steve Lianoglou wrote:
>>>>> Hi,
>>>>>
>>>>> A few comments in line:
>>>>>
>>>>> On Thu, Feb 6, 2014 at 9:01 AM, António domingues
>>>>> <amjdomingues at gmail.com> wrote:
>>>>>> Hi Bioconductors,
>>>>>>
>>>>>> I happened upon a funny thing in DEXSeq: a gene which appears to
>>>>>> have more exons in the final DEXSeq output than the annotation
>>>>>> suggests. The gene ENSMUSG00000027854 (screenshot from UCSC in
>>>>>> attachment) has 3 exons in a flattened gene model. However, the
>>>>>> DEXSeq results list 13 exons (here showing the output of
>>>>>> htseq-count):
>>>>> Not sure why you say the *gene* only has 3 exons ... you have
>>>>> highlighted one isoform of the gene which has very few exons, but you
>>>>> can see from both your picture and the exon definitions you pasted below
>>>>> for ENSMUSG00000027854 (presumably that's Csde1 :-) that if you
>>>>> consider all of the isoforms of the gene together, it has many more
>>>>> than just three exons.
>>>>>
>>>>> Know what I mean?
>>>> It is not Csde1 :s
>>>>
>>>>>> Exon 1 is only 1 base long (?) and exons 1 to 4 are contiguous.
>>>>>> As far as I am aware, the DEXSeq model should have flattened all
>>>>>> of these into one single "exon". Is this correct? Is the error
>>>>>> coming from the GTF? (At the end of the message there is also the
>>>>>> gene annotation in the GTF.)
>>>>> I'm trying to parse the various exon annotations from your email, but
>>>>> I don't see where the 1-width exon is.
>>>> This one:
>>>> chr3    mm10_ensGene.gtf    exonic_part    102995728    102995729    .    +    .    transcripts "ENSMUST00000029447"; exonic_part_number "001"; gene_id "ENSMUSG00000027854"
>>>>
>>>> Unless I calculated it incorrectly.
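
GTF coordinates are 1-based and both ends are inclusive, so a feature's width is end - start + 1. A small Python sketch applied to the exonic_part line quoted above (the function name is only for illustration):

```python
def gtf_feature_width(line):
    """Width in bp of a GTF feature; start and end are both inclusive."""
    fields = line.split()
    # Columns 4 and 5 of a GTF line hold the start and end positions.
    start, end = int(fields[3]), int(fields[4])
    return end - start + 1


line = ('chr3 mm10_ensGene.gtf exonic_part 102995728 102995729 . + . '
        'transcripts "ENSMUST00000029447"; exonic_part_number "001"; '
        'gene_id "ENSMUSG00000027854"')
width = gtf_feature_width(line)
print(width)  # prints 2
```

Subtracting start from end without adding 1 gives 1, which is the likely source of the "1 base long" reading.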
>>>>
>>>>> Figure 1 from their paper shows pretty clearly how the "break down"
>>>>> of exons is calculated across isoforms to create *counting bins* --
>>>>> just keep in mind that these things are not necessarily "exons" anymore.
>>>> Yes, I am aware of that, but I should have been clearer about the 
>>>> distinction between "exon" and counting bin. I think that with the new 
>>>> screenshot it will become more apparent what I mean.
>>>>>> This is especially concerning for me because I am interested in
>>>>>> selecting the first and last exon of genes, using the exon ranking
>>>>>> from DEXSeq, to analyze further.
>>>>> I'm not sure if what I posted was at all helpful, but if someone else
>>>>> doesn't do a better job of providing you with the answer you were
>>>>> looking for, you might try to draw a figure of a gene model (with a
>>>>> few splicing isoforms) and point out what it is, exactly, that you
>>>>> hope to extract from it.
>>>>>
>>>>> While it's clear what "First and last" exon of a *single transcript
>>>>> isoform* of a gene might be, it might get hairy when you start
>>>>> summarizing the "counting bins" across multiple isoforms of the same
>>>>> gene.
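
To illustrate Steve's point, a hedged Python sketch with a hypothetical two-isoform gene (made-up ids and coordinates): picking the first and last exon is straightforward per transcript, but the isoforms need not agree, so there is no single answer at the gene level.

```python
def first_and_last_exon(exons, strand):
    """Return (first, last) exon of one transcript in transcription order.

    exons: list of (start, end) tuples; strand: "+" or "-".
    """
    ordered = sorted(exons)      # ascending genomic order
    if strand == "-":
        ordered.reverse()        # minus strand is transcribed right to left
    return ordered[0], ordered[-1]


# Hypothetical isoforms of one plus-strand gene.
toy = {
    "txA": [(100, 200), (300, 400)],
    "txB": [(150, 250), (300, 400), (500, 600)],
}
ends = {tx: first_and_last_exon(exons, "+") for tx, exons in toy.items()}
for tx, (first, last) in ends.items():
    print(tx, "first:", first, "last:", last)
```

Here txA and txB differ in both their first and their last exon, so any gene-level "first" or "last" bin requires choosing a rule, such as the most upstream bin or a per-isoform analysis.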
>>>> True. I am only using the DEXSeq results as a quick and dirty 
>>>> approach before I get data from other tools which handle this better. 
>>>> For example, MISO has annotations for alternative polyadenylation 
>>>> and Cufflinks provides some information on alternative promoter 
>>>> usage. Regardless, if the gene model is incorrect, which I hope it 
>>>> is not and this is only me being thick, then the DEXSeq results for 
>>>> some counting bins may not be trustworthy.
>>>>
>>>>
>>>>> Oh, and by the way:
>>>>>
>>>>>
>>>>>> Hi Bioconductors,
>>>>>>
>>>>>> I happened upon a funny thing in DEXSeq: a gene which appears to
>>>>>> have more exons in the final DEXSeq output than the annotation
>>>>>> suggests. The gene ENSMUSG00000027854 (screenshot from UCSC in
>>>>>> attachment) has 3 exons in a flattened gene model.
>>>>> I'd argue that the isoform of the gene that you highlighted in your
>>>>> original screenshot only has *two* exons
>>>>>
>>>>> -steve
>>>> ehehe, correct.
>>>>
>>>>> HTH,
>>>>> -steve
>>>>>
>>>> Cheers,
>>>> António
>>>>
>>>>
>>>> <ENSMUSG00000027854.png>
>>
>