[BioC] edgeR : Include or exclude structural/noncoding RNA reads in the analysis

Gordon K Smyth smyth at wehi.EDU.AU
Sun Apr 14 06:13:40 CEST 2013


Dear Gowthaman,

Personally, I would remove the rRNA genes, recompute the library sizes 
based on the remaining counts, then scale normalize with 
calcNormFactors().
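
A minimal sketch of those three steps, assuming a genes-by-samples count 
matrix `counts` and a logical vector `is.rRNA` flagging the rRNA rows. Both 
names are hypothetical and the data are simulated purely for illustration:

```r
library(edgeR)

# Simulated stand-in for real data: 1000 genes x 4 samples of raw counts.
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:4)))

# Hypothetical annotation: pretend the first 5 rows are rRNA genes
# and inflate them so they consume a large share of each library.
is.rRNA <- seq_len(nrow(counts)) <= 5
counts[is.rRNA, ] <- counts[is.rRNA, ] * 500

y <- DGEList(counts = counts, group = c("A", "A", "B", "B"))

# 1. Remove the rRNA genes.
y <- y[!is.rRNA, ]

# 2. Recompute the library sizes from the remaining counts.
y$samples$lib.size <- colSums(y$counts)

# 3. Scale normalize (TMM is the default method).
y <- calcNormFactors(y)

y$samples
```

With real data, `is.rRNA` would come from the gene annotation rather than 
from row position.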

However, provided that calcNormFactors() is used, edgeR will be fairly 
robust to whether or not the rRNA genes are kept in, and to how much of 
each library they consume.

Best wishes
Gordon

> Date: Fri, 12 Apr 2013 08:15:35 -0700
> From: gowtham <ragowthaman at gmail.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] edgeR : Include or exclude structural/noncoding RNA
> 	reads in	the analysis
>
> Hi Everyone,
> I have been using edgeR for quite some time. Most of our RNA-seq data comes
> from infectious organisms like Malaria and Tryps. Our libraries generally
> have 10 to 20% of the reads coming from rRNA genes (not sure if this is the
> typical value for other organisms/protocols). Until now, I have been
> ignoring them while doing the DE analysis using edgeR.
>
> I am NOT interested in differential expression of rRNA genes, but I worry
> that excluding them from edgeR might bias the library size calculations. On
> the other hand, including them might introduce a bunch of outliers (these
> rRNA genes have very high read counts). I could not intuitively decide on
> one over the other, so I am asking the experts for help.
>
> Does this change if libraries have varying amounts of rRNA contamination,
> say, one set of libraries has 20% rRNA and another has 40%?
>
> Thanks a bunch in advance,
> Gowthaman
>
>
> -- 
> Gowthaman
>
> Bioinformatics Systems Programmer.
> SBRI, 307 West lake Ave N Suite 500
> Seattle, WA. 98109-5219
> Phone : LAB 206-256-7188 (direct).
>
