[BioC] Normalization by DEseq

Wolfgang Huber whuber at embl.de
Tue Oct 19 15:10:36 CEST 2010

Dear Laurie

Normalisation: Briefly, the normalisation works as follows: if k_ij is 
the count of the i-th gene (or in your case, I guess, taxon) in the j-th 
sample, then we compute f_i as the geometric mean of these values across 
samples. The size factor s_j of sample j is the median, across genes, of 
the ratios k_ij / f_i, and the normalised count is k_ij / s_j.
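
For illustration, here is a minimal sketch of this scheme in plain Python. 
It is not the package's own R code. It follows the definition in the DESeq 
paper, where each sample gets a size factor (the median over genes of 
k_ij / f_i) and the counts of that sample are divided by it. The function 
names, the toy count table, and the choice to skip rows whose geometric 
mean is zero are assumptions made for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """counts[i][j] is the count of gene (or taxon) i in sample j.

    Returns one size factor per sample: the median, over genes, of
    k_ij / f_i, where f_i is the geometric mean of row i.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Assumption for this sketch: a row containing a zero has a
        # geometric mean of zero, so it is left out of the median.
        if any(k == 0 for k in row):
            continue
        f_i = math.exp(sum(math.log(k) for k in row) / n_samples)
        for j, k in enumerate(row):
            ratios[j].append(k / f_i)
    return [median(r) for r in ratios]


def normalise(counts):
    """Divide every count by the size factor of its sample."""
    s = size_factors(counts)
    return [[k / s_j for k, s_j in zip(row, s)] for row in counts]


# Toy data: sample 2 was sequenced twice as deeply as sample 1;
# the last row is present in sample 1 only.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [7, 0],
]
s = size_factors(counts)
norm = normalise(counts)
```

In this toy table the second size factor is twice the first, so the first 
three rows end up with equal normalised counts in both samples. The last 
row keeps its zero and does not enter the size factors.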

In more detail, it is described in the paper "Differential expression 
analysis for sequence count data"; a preprint is available at Nature 
Precedings, (4282), 2010, and the full publication will come out in Genome 

Zero counts: The statistical model of DESeq includes situations in which 
the counts are zero in one group and non-zero in others, so I would 
recommend leaving these taxa in the data, because you will benefit from 
getting proper statistical inference for these cases, too.
(Normalisation should, as far as I can see, not be significantly affected, 
unless there is some really odd asymmetry in your data.)

  Best wishes

On Oct/19/10 6:56 AM, Rui Luo wrote:
> Dear DESeq developers,
>          I have a few questions related to the normalization step in DESeq.
>          It is stated that it will normalize the raw counts by library size,
> but what is the mathematical idea? Would you mind giving a more detailed
> explanation?
>          Now I have two groups of metatranscriptome data; one group contains
> H. pylori, the other does not. For sure, I have some transcripts in the first
> group that are from H. pylori but are not in group two.
>          I am wondering, if I want to do differential expression analysis for
> these two groups, should I filter out the group-specific transcripts before
> putting the data into DESeq? Will this affect the normalization step?
> Thanks!
> best,
> Laurie
