[BioC] how edgeR control the outliers?

Fri Jan 27 08:55:09 CET 2012

hi,

On Thu, 2012-01-26 at 21:19 -0800, Yuan Tian wrote:
> Dear all,
> 
> I use edgeR for differential expression analysis on an RNA-seq dataset, but I found
> that edgeR is very sensitive to outlier samples. For example, suppose the overall
> expression pattern of a gene is similar between the control group and the experimental
> group, but one single sample behaves very differently from the others. Such a gene is
> very likely to be falsely detected as differentially expressed. Can anyone please tell
> me whether there is any option in the algorithm that can control the impact of outliers?
> 
> I'm thinking of using the median read count instead of the mean read count to fit the
> NB distribution and to estimate the dispersions. Is there an option for this
> available in edgeR? Or is there any other RNA-seq DE analysis package that is less
> sensitive to outliers?

i think what you're referring to is illustrated in figure 2 of the
tweeDEseq package vignette. the statistical model underlying that
package can address this kind of situation.
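
as a rough illustration of the sensitivity described in the question,
here is a minimal sketch in Python. the counts are made up, and the
helper moment_dispersion is a hypothetical name for a simple
method-of-moments estimate based on var = mu + phi * mu^2. it is not
the estimation procedure used by edgeR or tweeDEseq; it only shows how
a single extreme count pulls the mean and the dispersion while leaving
the median almost unchanged.

```python
from statistics import mean, median, variance


def moment_dispersion(counts):
    """Method-of-moments NB dispersion from var = mu + phi * mu^2.

    Hypothetical helper for illustration only; negative estimates
    (variance below the mean) are truncated to 0.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


# Made-up read counts for one gene, five samples per group.
control = [98, 105, 110, 95, 102]
treated = [101, 97, 108, 99, 940]  # last sample is the "outlier"

for name, counts in (("control", control), ("treated", treated)):
    print(
        name,
        "mean:", mean(counts),
        "median:", median(counts),
        "dispersion:", round(moment_dispersion(counts), 3),
    )
```

the medians of the two groups are nearly identical, while the mean and
the dispersion estimate of the second group are dominated by the one
extreme sample.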

> The outlier sample might be different when you look at different genes, so we can't take
> the whole sample out of the analysis.

there may be a number of reasons why "outlier" count values show up,
but a plausible one is simply biological variability (Hansen et al.,
Nat. Biotech., 29:572-573, 2011, doi:10.1038/nbt.1910). so not only
can you not take the sample out, but the count value in that sample
might reflect true biology. if your experimental conditions involve a
lot of biological variability, you may need to work with more
biological replicates.

cheers,

robert.