[BioC] Applying DESeq on RSEM output

Simon Anders anders at embl.de
Thu Mar 21 14:19:43 CET 2013


Hi Dvir

On 20/03/13 14:15, dvir.tau at gmail.com wrote:
> I'm running DESeq and EdgeR on RNA-Seq data that was already processed with
> RSEM (downloaded from TCGA web site).
>
> Since these methods require the raw read counts I'm using the raw_count
> column of the RSEM output but I'm not sure this is the right thing to do (is
> it the actual raw count required ?)

The real issue is not that your counts are not integers, but that RSEM 
gives you counts per isoform rather than per gene. Now, if you have two 
very similar isoforms, RSEM will be unable to decide which isoform a 
read belongs to and will just spread such reads proportionally over 
both. Hence, even if only one of the two isoforms is differentially 
expressed, you will incorrectly see differential expression for both 
isoforms.
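
To make the argument above concrete, here is a toy sketch in Python. It 
is not RSEM's actual algorithm: it assumes the two isoforms are fully 
indistinguishable and that ambiguous reads are split by a fixed 50/50 
share. The function name split_ambiguous and all read counts are 
invented for illustration.

```python
# Toy model: two isoforms of one gene whose reads cannot be told apart.
# This is NOT RSEM's estimation procedure; a fixed 50/50 split stands in
# for "spread them proportionally over both" (an assumption).

def split_ambiguous(total_reads, share_a=0.5):
    """Assign indistinguishable reads to isoforms A and B by a fixed share."""
    return total_reads * share_a, total_reads * (1.0 - share_a)

# Hypothetical true read counts per condition (invented numbers).
true_counts = {
    "control":   {"A": 100, "B": 100},
    "treatment": {"A": 200, "B": 100},  # only isoform A really changes
}

estimated = {}
for condition, counts in true_counts.items():
    total = counts["A"] + counts["B"]       # all reads are ambiguous here
    est_a, est_b = split_ambiguous(total)
    estimated[condition] = {"A": est_a, "B": est_b}

# Apparent fold change (treatment / control) for each isoform.
fold_change = {
    iso: estimated["treatment"][iso] / estimated["control"][iso]
    for iso in ("A", "B")
}
print(fold_change)  # {'A': 1.5, 'B': 1.5}
```

In this toy setting isoform B appears to change by the same factor as 
isoform A, although its true count is identical in both conditions.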

This is why the output of isoform quantification methods such as RSEM or 
MMSeq is not suitable as input for differential expression tests.

At the very minimum, you also need information about the uncertainty 
of the assignments of reads to isoforms. In fact, RSEM provides this 
information if you run it in its Bayesian mode, but this seems to be 
hardly ever done in practice.

If you really need to perform differential expression analysis on a 
level finer than whole gene expression, you should either use a tool for 
differential exon usage testing, such as our DEXSeq package, or one that 
combines isoform abundance estimation and testing for differences in a 
unified framework, such as BitSeq. In both cases, you will need the SAM 
files.

If you are fine with staying on the gene level for your analysis, you 
need to get counts per gene, not per isoform. I am not familiar enough 
with RSEM, though, to tell you whether adding up the counts from all the 
isoforms per gene would be a good idea.
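
For reference, "adding up the counts from all the isoforms per gene" 
could look like the Python sketch below. Whether this is statistically 
sound for RSEM output remains the open question raised above; the sketch 
only shows the mechanics. The table layout, the identifiers, and the 
helper name sum_counts_per_gene are hypothetical, and the final rounding 
is an assumption made because count-based tests expect integers.

```python
from collections import defaultdict

# Hypothetical isoform-level table: (gene_id, isoform_id, estimated_count).
# Identifiers and values are invented for illustration only.
isoform_rows = [
    ("geneX", "geneX.1", 10.4),
    ("geneX", "geneX.2", 5.3),
    ("geneY", "geneY.1", 7.0),
]

def sum_counts_per_gene(rows):
    """Sum isoform-level counts for each gene and round to integers."""
    totals = defaultdict(float)
    for gene_id, _isoform_id, count in rows:
        totals[gene_id] += count
    # Rounding is an assumption: it only makes the sums integer-valued,
    # it does not recover the uncertainty of the read assignments.
    return {gene: int(round(total)) for gene, total in totals.items()}

gene_counts = sum_counts_per_gene(isoform_rows)
print(gene_counts)  # {'geneX': 16, 'geneY': 7}
```

Summing removes the ambiguity between isoforms of the same gene, since a 
read split over two isoforms still lands in one gene total, but it does 
not address reads that are ambiguous between different genes.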

   Simon


