# [BioC] Outlier detection in DEseq

Simon Anders anders at embl.de
Wed Oct 20 11:37:53 CEST 2010

```
Hi Laurie

On 10/20/2010 07:25 AM, Rui Luo wrote:
>      I have a question regarding to DEseq differential expression analysis.
>      In DEseq, is there any way to detect whether the library from one sample
> is totally screwed up?
>      Or for signal gene, the expression is abnormal in one sample (For this
> situation, do we just abandon this value or modify it)?

If you have enough replicates, you can detect an outlier sample from the
fact that it is markedly different from the rest.

Possible ways to do so:

- Make a heatmap of the samples after performing a variance stabilizing
transformation on the count data. This is described in the DESeq
vignette. The heatmap shows you how "different" each sample is from every
other sample, and if one sample is very different from its replicates,
you may want to consider excluding it from analysis.

- For each sample, make an MA plot comparing it to the "fictive
reference" that I described in my reply to your other question, as follows:

library(DESeq)

# get an example count data set -- or use your data:
cds <- makeExampleCountDataSet()

# estimate the size factors:
cds <- estimateSizeFactors( cds )

# calculate the gene-wise geometric means
geomeans <- exp( rowMeans( log( counts(cds) ) ) )

# choose the sample we want to check
j <- 1

# plot the log fold change versus the reference against
# the geometric mean
plot( geomeans, counts(cds)[,j] / geomeans, pch='.', log="xy" )

# Mark the size factor (0 log fold change):
abline( h = sizeFactors(cds)[j] )

An odd sample should stick out by looking different. You could also take
the geometric mean not over all samples but only over replicate samples,
or you could simply plot two samples against each other.

Remember that there are also what we call "variance outliers", i.e.,
single genes that vary much more across replicates than the variance fit
would suggest. The vignette tells you how to recognize them.

Simon

```
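
The visual check described in the message can be complemented by a rough numeric summary. The following is only a sketch in base R, not part of DESeq: it simulates a hypothetical count matrix in which the fourth replicate is deliberately perturbed, builds the "fictive reference" as the gene-wise geometric mean, and reports how widely each sample's log ratios scatter around their median. The simulated data, the variable names, and the use of `mad()` as a spread measure are all assumptions made for illustration.

```r
# Hypothetical data: 1000 genes, 4 replicates; sample 4 gets extra
# gene-wise noise so that it plays the role of the odd sample.
set.seed(1)
n_genes <- 1000
mu <- 2^runif(n_genes, 4, 12)   # assumed true expression levels
counts <- sapply(1:4, function(i) rnbinom(n_genes, mu = mu, size = 10))
counts[, 4] <- rnbinom(n_genes, mu = mu * 2^rnorm(n_genes, sd = 1), size = 10)
colnames(counts) <- paste0("rep", 1:4)

# Drop genes with a zero count in any sample, so that the
# geometric mean is positive and the logarithms are finite.
keep <- rowSums(counts == 0) == 0
counts <- counts[keep, ]

# Gene-wise geometric mean over the replicates (the "fictive reference")
geomeans <- exp(rowMeans(log(counts)))

# log2 ratio of every sample to the reference, gene by gene
log_ratios <- log2(counts / geomeans)

# Median-of-ratios size factor for each sample
size_factors <- 2^apply(log_ratios, 2, median)

# Spread of each sample's log ratios around its own median;
# a sample with a clearly larger spread deserves a closer look.
spread <- apply(log_ratios, 2, mad)
print(round(spread, 2))
```

A large spread is only a hint, not a test. Whether the flagged sample should be excluded is still best judged from the MA plot or the heatmap.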