[BioC] aggregate genes in DEXSeq

Alejandro Reyes alejandro.reyes at embl.de
Wed Feb 27 14:55:14 CET 2013


Dear Julien, dear Mar, and people interested in DEXSeq,

You recently reported some problems in DEXSeq that arise from the way
the HTSeq Python scripts handle exons that overlap more than one
gene ID.

The approach we had taken so far was to merge the gene IDs sharing an
exon into an "aggregate gene" ID. From the input of some users and our
own experience, we know that this was not the most appropriate solution:
when the merged genes were differentially expressed, DEXSeq falsely
called differential usage in other exons of the aggregate genes. We have
included a "-r" parameter in the script "prepare_annotation_dexseq.py" so
that the user can decide what to do with these exons: either ignore the
exons associated with more than one gene ID and treat each gene
separately, or merge the genes and take these exons into account.
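To make the two options concrete, here is a minimal Python sketch of the logic under stated assumptions. Each exonic part is represented as a part ID paired with the set of gene IDs it overlaps; the names `assign_parts` and `parts`, the example IDs, and the "+"-joined aggregate naming are hypothetical and chosen for illustration. This is not the code of "prepare_annotation_dexseq.py"; merging is done here with a union-find over genes that share an exonic part.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (part_id, IDs of the genes overlapping that part)
ExonicPart = Tuple[str, Set[str]]


def _find(parent: Dict[str, str], g: str) -> str:
    # Union-find root lookup with path halving.
    while parent[g] != g:
        parent[g] = parent[parent[g]]
        g = parent[g]
    return g


def assign_parts(parts: List[ExonicPart], aggregate: bool) -> Dict[str, List[str]]:
    """Map each (possibly aggregate) gene ID to its exonic part IDs.

    aggregate=True: genes sharing an exonic part are merged into one
    aggregate gene whose (hypothetical) name joins the member IDs with '+'.
    aggregate=False: parts overlapping more than one gene are ignored and
    every gene is kept separately.
    """
    parent: Dict[str, str] = {}
    for _, genes in parts:
        for g in genes:
            parent.setdefault(g, g)

    if aggregate:
        # Union every pair of genes that co-occur on a shared part.
        for _, genes in parts:
            if not genes:
                continue
            first, *rest = sorted(genes)
            for g in rest:
                parent[_find(parent, g)] = _find(parent, first)

    # Group genes by their root to build one name per (aggregate) gene.
    members: Dict[str, Set[str]] = {}
    for g in parent:
        members.setdefault(_find(parent, g), set()).add(g)
    name = {root: "+".join(sorted(m)) for root, m in members.items()}

    result: Dict[str, List[str]] = {}
    for part_id, genes in parts:
        if not genes or (not aggregate and len(genes) > 1):
            continue  # ambiguous part is dropped when not aggregating
        root = _find(parent, next(iter(genes)))
        result.setdefault(name[root], []).append(part_id)
    return result


# Hypothetical toy annotation: E002 is shared by geneA and geneB.
parts = [
    ("E001", {"geneA"}),
    ("E002", {"geneA", "geneB"}),
    ("E003", {"geneB"}),
    ("E004", {"geneC"}),
]
print(assign_parts(parts, aggregate=True))
# {'geneA+geneB': ['E001', 'E002', 'E003'], 'geneC': ['E004']}
print(assign_parts(parts, aggregate=False))
# {'geneA': ['E001'], 'geneB': ['E003'], 'geneC': ['E004']}
```

With `aggregate=True` the shared part E002 is kept, but geneA and geneB are modeled together as one gene; with `aggregate=False` each gene is modeled on its own, at the cost of discarding the shared part.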

Additionally, we have implemented R/Bioconductor functions equivalent
to the Python scripts. These functions use code contributed by
Mike Love.

All these changes are available in the latest svn version (1.5.9).

Best regards,
Alejandro Reyes


Hi Alejandro,
Just to let you know that adding the junctions to the DEXSeq test of
differential expression worked fine! The "hack" was actually
straightforward; I just had to modify the count files taken as input.

On a different note, I noticed that many false positives were generated
by "aggregate" gene models composed of different overlapping genes.
When these overlapping genes behave differently across conditions, this
is interpreted as differential usage of some exons, while it is actually
differential expression of the genes. See the attached picture; it might
make this easier to understand.
Have you noticed this behavior of DEXSeq, and do you have any comments
on it?
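The mechanism described above can be seen with a toy calculation. Assume, hypothetically, an aggregate gene built from one exonic part of geneA (E001) and one of geneB (E003), with made-up counts, and look at each part's share of the aggregate's total counts. This share is a simplification of the relative exon usage that DEXSeq tests, not its actual model; the function name `usage` is illustrative.

```python
from typing import Dict


def usage(counts: Dict[str, int]) -> Dict[str, float]:
    # Share of the aggregate gene's total counts falling in each exonic part.
    total = sum(counts.values())
    return {part: round(c / total, 2) for part, c in counts.items()}


# Hypothetical counts: E001 belongs to geneA, E003 to geneB.
control = {"E001": 100, "E003": 100}  # both genes expressed equally
treated = {"E001": 100, "E003": 400}  # only geneB goes up 4-fold

print(usage(control))  # {'E001': 0.5, 'E003': 0.5}
print(usage(treated))  # {'E001': 0.2, 'E003': 0.8}
```

geneA's expression is unchanged, yet its part's share of the aggregate drops from 0.5 to 0.2, which looks like differential exon usage even though only geneB changed.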

Thanks again for your work on DEXSeq.
Julien


