[BioC] DESeq versus CuffDiff2 for RNA-seq expression quantification in parasite-infected blood

Fri Apr 26 19:40:18 CEST 2013

Hi Kevin

On 26/04/13 19:32, Kevin Lee wrote:
> I appreciate your assistance.  A follow-up question: what is the
> appropriate method to handle a read that splits an exon junction and is
> therefore mapped to two exons when using a short read mapping software?
>   Counting it as being present in both exons seems to give undue weight
> to the read when using DESeq; conversely, it seems important to "double
> count" it when using DEXSeq.  Any advice?  And any software to readily
> generate these kinds of files, the matrix files required for DE(X)Seq?
>   I have just been using an overlapper script that I wrote using the BAM
> files and UCSC gene annotations.

I use Python scripts for counting. For DESeq, you can use the 
htseq-count script (available from 
http://www-huber.embl.de/users/anders/HTSeq/ ), and for DEXSeq, use the 
dexseq-count.py script that comes with the DEXSeq Bioconductor package.
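On the matrix question: htseq-count processes one alignment file at a time and prints a two-column table (feature ID, count) to standard output, so the per-sample tables still have to be joined into one matrix. A minimal Python sketch of that join follows. It assumes one such tab-separated table per sample. The sample names are hypothetical, and skipping IDs that start with "__" (summary counters such as "__no_feature", whose naming may differ between HTSeq versions) is an assumption about the output format, not documented behaviour.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_counts(handle: TextIO) -> Dict[str, int]:
    """Parse a two-column (feature ID, count) tab-separated table.

    Assumption: rows whose ID starts with "__" are summary counters
    (e.g. "__no_feature"), not features, so they are skipped.
    """
    counts: Dict[str, int] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def build_matrix(
    samples: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Combine per-sample counts into a features x samples matrix.

    A feature missing from one sample's table gets a count of 0 there.
    """
    sample_names = sorted(samples)
    feature_ids = sorted({f for counts in samples.values() for f in counts})
    matrix = [[samples[s].get(f, 0) for s in sample_names] for f in feature_ids]
    return feature_ids, sample_names, matrix


def write_matrix(out: TextIO, feature_ids: List[str],
                 sample_names: List[str], matrix: List[List[int]]) -> None:
    """Write the matrix as a tab-separated table with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature"] + sample_names)
    for fid, row in zip(feature_ids, matrix):
        writer.writerow([fid] + row)


if __name__ == "__main__":
    # Hypothetical per-sample outputs; real ones would be opened with open().
    infected = io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n")
    control = io.StringIO("geneA\t7\ngeneC\t3\n__ambiguous\t1\n")
    samples = {"infected_1": read_counts(infected),
               "control_1": read_counts(control)}
    out = io.StringIO()
    write_matrix(out, *build_matrix(samples))
    print(out.getvalue(), end="")
```

The resulting table (one row per feature, one column per sample) is the shape DESeq expects as its count matrix; the sample-to-condition assignment is supplied separately.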

The reason we offer two scripts, and suggest producing separate count
tables for DESeq and DEXSeq, is precisely the issue you point out with
reads that map to two exons.
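To make the difference concrete, here is a toy Python sketch of the two counting rules. It assumes exons are given as half-open intervals on a single chromosome and each read as its list of aligned blocks (a junction read has two). It is a simplification, not the actual logic of htseq-count or dexseq-count.py: the gene-level rule counts a read once if every exon it touches belongs to the same gene and discards it otherwise, while the exon-level rule gives one count to every exon the read touches, so a junction read is counted twice.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def exons_hit(blocks: List[Interval], exons: Dict[str, Interval]) -> List[str]:
    """IDs of all exons overlapped by any aligned block of the read."""
    return sorted(eid for eid, ex in exons.items()
                  if any(overlaps(b, ex) for b in blocks))


def count_reads(reads: List[List[Interval]], exons: Dict[str, Interval],
                exon_to_gene: Dict[str, str]) -> Tuple[Counter, Counter]:
    gene_counts: Counter = Counter()
    exon_counts: Counter = Counter()
    for blocks in reads:
        hit = exons_hit(blocks, exons)
        # Exon level (DEXSeq-style): one count for every exon the read touches.
        exon_counts.update(hit)
        # Gene level (DESeq-style): one count per read, only if one gene is hit.
        genes = {exon_to_gene[e] for e in hit}
        if len(genes) == 1:
            gene_counts[genes.pop()] += 1
    return gene_counts, exon_counts


if __name__ == "__main__":
    # Hypothetical gene g1 with two exons, one junction read, one exonic read.
    exons = {"e1": (100, 200), "e2": (300, 400)}
    exon_to_gene = {"e1": "g1", "e2": "g1"}
    reads = [[(150, 200), (300, 350)], [(110, 160)]]
    print(count_reads(reads, exons, exon_to_gene))
```

With these two reads the gene total for g1 is 2, while the exon counts sum to 3 (e1: 2, e2: 1). This is why one table cannot serve both packages: per-exon totals inflate gene-level counts, and gene-level counts lose the per-exon information DEXSeq needs.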

While this works well, I do admit that this state of things is not
terribly elegant.

BTW: If you use our scripts with UCSC annotation, make sure to fix the 
gene IDs. (The UCSC table browser puts transcript IDs where it should 
put gene IDs; you need to remove the ".nn" suffixes. You will see what I 
mean once you have a look at the GFF files.)
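A minimal Python sketch of that fix, assuming the annotation is a GTF file whose attributes are written as gene_id "..."; (the IDs in the example line are illustrative). It strips a trailing ".<digits>" suffix from the gene_id value only and leaves transcript_id untouched; to fix a whole file, apply it to each line while copying the file.

```python
import re

# Captures 'gene_id "<id>' and the closing quote, dropping a final ".<digits>".
GENE_ID_RE = re.compile(r'(gene_id\s+"[^"]*?)\.\d+(")')


def strip_gene_id_suffix(gtf_line: str) -> str:
    """Remove a trailing '.nn' suffix from the gene_id attribute of one line."""
    if gtf_line.startswith("#"):
        return gtf_line  # leave comment/header lines alone
    return GENE_ID_RE.sub(r"\1\2", gtf_line, count=1)


if __name__ == "__main__":
    # Illustrative line only; real input would be read from the GTF file.
    example = ('chr1\tknownGene\texon\t11874\t12227\t0\t+\t.\t'
               'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";')
    print(strip_gene_id_suffix(example))
```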

   Simon