[BioC] multi-mapped reads: Cufflinks + baySeq? edgeR?

Fri Mar 1 19:42:14 CET 2013

Hi Hilary,

If you want to include multi-mapped reads in your counts, I believe 
that the latest version of Cufflinks reports an estimate of the counts 
for each transcript/gene in addition to FPKM. See the Cufflinks papers 
& manual to see how it splits the counts for multi-mapping reads. 
However, if you decide to use count estimates based on multi-mapping 
reads in edgeR or baySeq, be aware that you are feeding them estimates 
with some degree of uncertainty, while they are expecting exact raw 
counts. As a result, the uncertainty in the count estimates will be 
ignored by edgeR & baySeq, leading to an underestimate of variability 
and an overestimate of significance. If most of your reads are uniquely 
mapped, then this effect is probably quite modest. You should compare 
the multi-mapped counts to the unique-only counts from htseq to see 
what fraction of your reads are counted as multi-mapped. The higher 
this fraction, the greater your risk of overstating the significance of 
differential expression calls, but in truth no one really knows how big 
an effect this might have, or whether it negates the benefit of 
including the additional data from multi-mapping reads.

Hope this helps,

-Ryan

On Fri 01 Mar 2013 06:51:33 AM PST, Hilary [guest] wrote:
>
> Hi,
> I am trying to analyze RNA-Seq data for (gene-level) differential expression between treatments, in a design incorporating multiple factors (effects of species * treatment & interaction, 4 replicates for each combination). I have reads that map to multiple locations (Single-End data) and while I'd first used Bowtie2/Tophat >> htseq (discarding multi-mapping reads = multihits in htseq), and then used the GLM and baySeq approaches, it was suggested I go back and include multi-mapping hits.
>
> I know Cufflinks allows incorporation of the multi-mapping reads (Mortazavi method I think), and I know it is not compatible with the GLM methods of edgeR/DEseq due to use of FPKM but does that incompatibility apply to baySeq as well?
>
> Using CuffDiff seems problematic as it only does pairwise tests -- and while I can do that, I think a full model testing for individual effects and their interaction (esp. as our real interest is the species*treatment interaction) is probably more statistically accurate. Thus I'm not sure how to proceed; any suggestions would be greatly appreciated if someone has time!
> Thank you,
> Hilary
>
>   -- output of sessionInfo():
>
> ()
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor