[BioC] [Bioc] RNAseq less sensitive than microarrays? Is it a statistical issue?

Tue May 21 20:57:53 CEST 2013

Ryan and Wolfgang,

Agreed, it will not be suitable for certain RNA-Seq applications, such
as splice variant discovery, but it could be a good approximation for
problems related to read double counting across multiple range features.
In general, I am raising this question to understand whether I have
missed any fundamental reason not to use coverage values (whether
summed or averaged) instead of read counts. Using limma's voom function
for this situation makes sense, but surely someone has already tested
this and can report some results, or share a reference to a publication
addressing it?
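
To make the double-counting point concrete, here is a minimal Python
sketch on made-up data. Everything in it is an assumption for
illustration: the two adjacent features, the four 50 bp reads, and the
function names are hypothetical and not taken from any package. It only
shows that a read spanning two features is counted once per feature,
whereas its covered bases are split between them.

```python
# Toy illustration only: features and reads are hypothetical
# half-open [start, end) intervals on a single strand.

def overlap(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def count_reads(feature, reads):
    """Count every read that overlaps the feature by at least one base."""
    return sum(1 for r in reads if overlap(feature, r) > 0)

def total_coverage(feature, reads):
    """Sum of per-base coverage over the feature (bases covered by reads)."""
    return sum(overlap(feature, r) for r in reads)

def mean_coverage(feature, reads):
    """Total coverage divided by the feature length."""
    return total_coverage(feature, reads) / (feature[1] - feature[0])

features = {"A": (0, 100), "B": (100, 200)}
# Four reads of 50 bp each; the third one spans the A/B boundary.
reads = [(10, 60), (40, 90), (80, 130), (120, 170)]

for name, feat in features.items():
    print(name, count_reads(feat, reads),
          total_coverage(feat, reads), mean_coverage(feat, reads))
```

In this toy case the per-feature counts add up to 5 although there are
only 4 reads, while the total coverage adds up to exactly 4 x 50 = 200
bases. Whether that property helps in a real DEG analysis is the open
question.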

Thomas

On Tue, May 21, 2013 at 06:07:38PM +0000, Wolfgang Huber wrote:
> Dear Thomas
> 
> you raise a good point. Working on the actual counts and modelling the discreteness of the data matters a lot when the number of samples is small, and when there are genes with small counts: e.g. in an experiment on a cell line or model organism 'treated vs untreated'. For large studies, where dozens or hundreds of samples are compared between balanced groups, it seems to matter less, and the good results of VST/voom + limma in such benchmarks support that view.
> 
> However, it is not clear that the latter is really everything that people will want from RNA-Seq data. One may also want to detect what small groups of samples do among the big set, or what features smaller than genes (e.g. exons, as in DEXSeq) do, in which case one would like the explicit count modelling back. What do you think?
> 
> PS - whether some sort of average coverage per gene would really be less confusing for users to compute than total coverage I am not so sure; there'll just be different confusions.
> 
> 	Best wishes
> 	Wolfgang
> 
> On 21 May 2013, at 17:49, Thomas Girke <thomas.girke at ucr.edu> wrote:
> 
> > Hi Simon,
> > 
> > Because of these complications, I sometimes wonder whether, for many
> > RNA-Seq use cases, one couldn't use coverage values (e.g. mean
> > coverage) as the raw expression measure instead of read counts. Has anyone
> > systematically tested whether this would be a suitable approach for the
> > downstream DEG analysis? Right now everyone believes RNA-Seq analysis
> > requires read counting, but honestly I don't see why that is. Perhaps
> > the benefits of this are so minor that it is not worth dealing with a
> > different type of expression measure. 
> > 
> > Thomas
> > 
> > On Mon, May 20, 2013 at 11:15:04PM +0000, Simon Anders wrote:
> >> Dear Lucia and list
> >> 
> >> On second reading, I noticed that my previous post sounded quite 
> >> aggressive, which was not my intention. Sorry. I shouldn't write e-mails 
> >> that late at night.
> >> 
> >> Anyway: We had a lot of discussion on this list and others recently 
> >> about how to correctly obtain a count table for differential expression 
> >> analysis from aligned RNA-Seq reads. From these discussions, it has 
> >> become clear that this is a task with many more pitfalls than one might 
> >> expect at first. In microarray analysis, there is no need to do this, 
> >> and so read counting is a likely culprit when such discrepancies are 
> >> noted. This is why exact details on the procedure are so important.
> >> 
> >>   Simon
> >> 
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > 
>