[BioC] perspective on differential expression counting - unmapped reads?
dpryan at dpryan.com
Thu Oct 31 18:55:56 CET 2013
You can just remove those lines (in fact, that's what DESeq2 does internally); otherwise they'll just needlessly increase the number of tests performed.
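For illustration, here is a minimal sketch of stripping those rows before the counts go into edgeR or DESeq2. It assumes the usual two-column, whitespace-separated htseq-count output; the function name `split_htseq_counts` and the gene IDs in the example are made up. Newer htseq-count versions prefix the special rows with two underscores (e.g. `__no_feature`), so the sketch accepts both spellings.

```python
# Special counters that htseq-count appends after the per-gene rows.
SPECIAL = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def split_htseq_counts(lines):
    """Split htseq-count output lines into (gene_counts, special_counts).

    Hypothetical helper: `lines` is any iterable of text lines of the
    form "<name><whitespace><count>".
    """
    genes = {}
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, count = line.rsplit(None, 1)
        # Newer htseq-count versions write "__no_feature" etc.
        key = name.lstrip("_")
        if key in SPECIAL:
            special[key] = int(count)
        else:
            genes[name] = int(count)
    return genes, special


if __name__ == "__main__":
    # geneA/geneB are invented; the special rows are from the question.
    example = [
        "geneA\t10",
        "geneB\t0",
        "no_feature\t152030",
        "ambiguous\t4876",
        "too_low_aQual\t0",
        "not_aligned\t0",
        "alignment_not_unique\t0",
    ]
    genes, special = split_htseq_counts(example)
    print(len(genes), "gene rows kept;", len(special), "special rows dropped")
```

Only the gene rows would then be passed on as the count matrix; the special counters can be kept aside for QC if you want to report them.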
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Tel: +49 (0)178 298-6067
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
53175 Bonn, Germany
On Oct 31, 2013, at 6:30 PM, Jon BR wrote:
> I'm interested in calculating differential expression from some paired
> RNAseq samples.
> I've used htseq-count after mapping; quite happy with how easy that was.
> My question is with regard to whether or not to trim the last five rows
> from htseq-count output.
> Those rows look like this:
> no_feature 152030
> ambiguous 4876
> too_low_aQual 0
> not_aligned 0
> alignment_not_unique 0
> I can dream of reasons supporting either side of this question. The numbers
> of unmapped or ambiguously-mapping reads do contribute to the total library
> size. However, I'm also interested in quantifying the difference between
> what's human in both samples, so intuition would tell me to remove those
> rows.
> Because the counts are big, this matters a great deal. I'm using edgeR
> (again, very happy with that software), and the manual cites htseq-count as
> a viable methodology, but doesn't comment on their preferred treatment of
> the unmapped reads.
> My first (somewhat careless) use of edgeR gave us results that
> appeared to make sense, but upon digging a little deeper, I noticed that
> this question affects the p-values quite a lot because the unmapped counts
> are so big.
> I would appreciate any comments/opinions!