[BioC] Outlier detection in DEseq

Simon Anders anders at embl.de
Wed Oct 20 11:37:53 CEST 2010


Hi Laurie

On 10/20/2010 07:25 AM, Rui Luo wrote:
>      I have a question regarding to DEseq differential expression analysis.
>      In DEseq, is there any way to detect whether the library from one sample
> is totally screwed up?
>      Or for signal gene, the expression is abnormal in one sample (For this
> situation, do we just abandon this value or modify it)?

If you have enough replicates, you can detect an outlier sample from the 
fact that it is markedly different from the rest.

Possible ways to do so:

- Make a heatmap of the samples after performing a variance stabilizing 
transformation on the count data. This is described in the DESeq 
vignette. The heatmap shows you how "different" each sample is from each 
of the other samples, and if one sample is very different from its 
replicates, you may want to consider excluding it from the analysis.

- For each sample, make an MA plot comparing it to the "fictive 
reference" that I described in my reply to your other question, as 
follows:

   library(DESeq)

   # get an example count data set -- or use your data:
   cds <- makeExampleCountDataSet()

   # estimate the size factors:
   cds <- estimateSizeFactors( cds )

   # calculate the gene-wise geometric means
   geomeans <- exp( rowMeans( log( counts(cds) ) ) )

   # choose the sample we want to check
   j <- 1

   # plot the log fold change versus the reference against
   # the geometric mean
   plot( geomeans, counts(cds)[,j] / geomeans, pch='.', log="xy" )

   # Mark the size factor (0 log fold change):
   abline( h = sizeFactors(cds)[j] )

An odd sample should stick out by looking different. You could also take 
the geometric mean not over all samples but only over replicate samples, 
or you could simply plot two samples against each other.
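
As a rough sketch of the heatmap idea and of the replicate-only 
reference, here is a base R version that does not need DESeq. Everything 
in it is an assumption made for illustration: the counts are simulated, 
the sample names are made up, sample "A3" is scrambled on purpose, and 
log2(x + 1) is only a crude stand-in for the variance stabilizing 
transformation from the vignette.

```r
# Simulated counts stand in for counts(cds); all names are illustrative.
set.seed(1)
n_genes <- 1000
mu <- rexp(n_genes, rate = 1/200) + 5      # per-gene mean expression
counts_mat <- sapply(1:6, function(i) rnbinom(n_genes, mu = mu, size = 5))
colnames(counts_mat) <- c("A1", "A2", "A3", "B1", "B2", "B3")

# Turn A3 into an artificial outlier by shuffling its genes
counts_mat[, "A3"] <- sample(counts_mat[, "A3"])

# Size factors: median ratio to the gene-wise geometric mean,
# using only genes without zero counts
log_geomeans <- rowMeans(log(counts_mat))
usable <- is.finite(log_geomeans)
size_factors <- apply(counts_mat, 2, function(cnts)
   exp(median(log(cnts[usable]) - log_geomeans[usable])))

# Crude stand-in for the variance stabilizing transformation
log_counts <- log2(sweep(counts_mat, 2, size_factors, "/") + 1)

# Sample-to-sample Euclidean distances (dist works on rows, hence t())
sample_dists <- as.matrix(dist(t(log_counts)))
heatmap(sample_dists, symm = TRUE)

# Mean distance of each sample to all other samples
mean_dist <- rowSums(sample_dists) / (ncol(sample_dists) - 1)

# MA-style plot of one sample against a reference built from its
# own replicates only
j <- "A3"
replicates <- c("A1", "A2", "A3")
rep_geomeans <- exp(rowMeans(log(counts_mat[, replicates])))
ok <- rep_geomeans > 0                     # drop genes with a zero count
ratio <- counts_mat[ok, j] / rep_geomeans[ok]
plot(rep_geomeans[ok], ratio, pch = ".", log = "xy")

# Mark the median ratio to this reference (0 log fold change)
abline(h = median(ratio))
```

The sample with the largest entry in mean_dist is the first one to look 
at more closely. With real data, replace counts_mat by counts(cds) and 
take the replicate names from your own design.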


Remember that there are also what we call "variance outliers", i.e., 
single genes that vary much more across replicates than the variance fit 
would suggest. The vignette tells you how to recognize them.
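
To illustrate what such a gene looks like, here is a base R sketch. It 
is not the diagnostic from the vignette: the counts are simulated, one 
outlier gene is planted by hand, a lowess trend stands in for the 
variance fit, and the threshold of 20 is an arbitrary assumption.

```r
# Simulated replicate counts for one condition; all names illustrative.
set.seed(2)
n_genes <- 1000
n_reps <- 6
mu <- rexp(n_genes, rate = 1/300) + 20
reps <- sapply(seq_len(n_reps),
   function(i) rnbinom(n_genes, mu = mu, size = 20))

# Plant one artificial variance outlier: a huge count in one replicate
planted <- 123
reps[planted, ] <- c(100, 110, 95, 105, 98, 5000)

# Per-gene mean and variance across the replicates
gene_mean <- rowMeans(reps)
gene_var <- apply(reps, 1, var)
keep <- gene_mean > 0 & gene_var > 0

# Smooth trend of log variance against log mean (stand-in for the fit)
trend <- lowess(log(gene_mean[keep]), log(gene_var[keep]))
fitted_log_var <- approx(trend$x, trend$y, xout = log(gene_mean),
   rule = 2, ties = mean)$y

# Ratio of observed variance to the trend
var_ratio <- gene_var / exp(fitted_log_var)

# Plot variance against mean with the trend line
plot(gene_mean, gene_var, pch = ".", log = "xy")
lines(exp(trend$x), exp(trend$y), col = "red")

# Genes far above the trend (the threshold is an arbitrary choice)
candidates <- which(var_ratio > 20)
```

Genes listed in candidates lie far above the trend and deserve a look at 
their raw counts before you trust a call for them.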


   Simon


