[BioC] DEXSeq - too many exons in gene

António domingues amjdomingues at gmail.com
Thu Feb 6 19:12:40 CET 2014


Hi Steve,

thanks for the comments. First of all, my apologies, I sent the 
wrong screenshot. It should have been the one (attached) for Sike1. Long 
day. Anyway, see my replies below to the points that are still valid.

On 02/06/2014 06:54 PM, Steve Lianoglou wrote:
> Hi,
>
> A few comments in line:
>
> On Thu, Feb 6, 2014 at 9:01 AM, António domingues
> <amjdomingues at gmail.com> wrote:
>> Hi Bioconductors,
>>
>> I happened upon a funny thing in DEXseq: a gene which appears to have more
>> exons in the final DEXseq output than the annotation suggests. The gene
>> ENSMUSG00000027854 (screen-shot from UCSC in attachment) suggests the 3
>> exons in a flattened gene model. However, the DEXSeq results lists 13 exons
>> (here showing the output of htseq-count):
> Not sure why you say the *gene* only has 3 exons ... you have
> highlighted one isoform of the gene which has very few exons, but you
> can see from both your picture and the exon definitions you pasted below
> for ENSMUSG00000027854 (presumably that's Csde1 :-) that if you
> consider all of the isoforms of the gene together, it has many more
> than just three exons.
>
> Know what I mean?

It is not Csde1 :s

>
>> Exon 1 is only 1 base long (?) and exons 1 to 4 are contiguous. As far
>> as I am aware, DEXSeq model should have flattened all of these into one
>> single "exon". Is this correct? is the error coming from the gtf? (at the
>> end of the message there is also the gene annotation in the gtf).
> I'm trying to parse the various exon annotations from your email, but
> I don't see where the 1-width exon is.
This one:
chr3    mm10_ensGene.gtf    exonic_part    102995728    102995729    .    +    .    transcripts "ENSMUST00000029447"; exonic_part_number "001"; gene_id "ENSMUSG00000027854"

Unless I calculated it incorrectly.
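
As a quick sanity check on the width, here is a minimal sketch, assuming 
the usual GTF convention that start and end are 1-based and both 
inclusive. The function name is made up for illustration; the record is 
the one quoted above, rebuilt with tab separators.

```python
def gtf_feature_width(line: str) -> int:
    """Width in bases of one GTF record (1-based, inclusive coordinates)."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[3]), int(fields[4])  # GTF columns 4 and 5
    return end - start + 1

# The exonic_part record from this message, joined with tabs.
record = "\t".join([
    "chr3", "mm10_ensGene.gtf", "exonic_part",
    "102995728", "102995729", ".", "+", ".",
    'transcripts "ENSMUST00000029447"; exonic_part_number "001"; '
    'gene_id "ENSMUSG00000027854"',
])

width = gtf_feature_width(record)
print(width)
```

Under that convention the bin comes out 2 bases wide rather than 1, 
which is still surprisingly short for a separate counting bin.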

>
> Figure 1 from their paper shows pretty clearly how the "break down" of
> exons is calculated across isoforms to create *counting bins* -- just
> keep in mind that these things are not necessarily "exons" anymore.

Yes, I am aware of that, but I should have been clearer about the 
distinction between "exon" and counting bin. I think that with the new 
screenshot it will become more apparent what I mean.
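
To make the exon versus counting bin distinction concrete, here is a 
rough sketch of how overlapping exons from several isoforms could be 
cut into disjoint bins at every exon boundary. This is not the DEXSeq 
annotation script itself; the function name, transcript names and toy 
coordinates are all made up for illustration, and the coordinates are 
assumed to be 1-based and inclusive.

```python
def counting_bins(transcripts):
    """Split exons of all transcripts into disjoint bins at every boundary.

    transcripts: dict mapping transcript name -> list of (start, end) exons.
    Returns a list of (start, end, [transcripts covering the bin]).
    """
    # Every exon start, and every position just past an exon end, is a cut point.
    cuts = sorted({p for exons in transcripts.values()
                   for s, e in exons for p in (s, e + 1)})
    bins = []
    for left, right in zip(cuts, cuts[1:]):
        seg_start, seg_end = left, right - 1
        covering = sorted(
            name for name, exons in transcripts.items()
            if any(s <= seg_start and seg_end <= e for s, e in exons)
        )
        if covering:  # segments covered by no exon are intronic, so skipped
            bins.append((seg_start, seg_end, covering))
    return bins

# Two hypothetical isoforms with two exons each.
toy = {
    "tx1": [(100, 101), (200, 300)],
    "tx2": [(100, 150), (250, 300)],
}
bins = counting_bins(toy)
for b in bins:
    print(b)
```

In this toy case two isoforms with two exons each give four bins, and 
the first bin is only 2 bases wide and directly adjacent to the second. 
So contiguous bins are not merged when the set of transcripts covering 
them differs.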
>
>> This is especially concerning for me because I am interested in selecting the
>> first and last exon of genes, using the exon ranking from DEXSeq, to analyze
>> further.
> I'm not sure if what I posted was at all helpful, but if someone else
> doesn't do a better job of providing you with the answer you were
> looking for, you might try to draw a figure of a gene model (with a
> few splicing isoforms) and point out what it is, exactly, that you
> hope to extract from it.
>
> While it's clear what "First and last" exon of a *single transcript
> isoform* of a gene might be, it might get hairy when you start
> summarizing the "counting bins" across multiple isoforms of the same
> gene.

True. I am only using the DEXSeq results as a quick and dirty approach 
before I get data from other tools which handle this better. For example, 
MISO has annotations for alternative polyadenylation and Cufflinks 
provides some information on alternative promoter usage. Regardless, if 
the gene model is incorrect, which I hope it is not and this is only me 
being thick, then the DEXSeq results for some counting bins may not be 
trustworthy.


> Oh, and by the way:
>
>
>> Hi Bioconductors,
>>
>> I happened upon a funny thing in DEXseq: a gene which appears to have more
>> exons in the final DEXseq output than the annotation suggests. The gene
>> ENSMUSG00000027854 (screen-shot from UCSC in attachment) suggests the 3
>> exons in a flattened gene model.
> I'd argue that the isoform of the gene that you highlighted in your
> original screen shot only has *two* exons
>
> -steve
ehehe, correct.

>
> HTH,
> -steve
>
Cheers,
António


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ENSMUSG00000027854.png
Type: image/png
Size: 12348 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140206/2c8d57a1/attachment.png>

