[BioC] edgeR normalization factors

Mark Robinson mrobinson at wehi.EDU.AU
Tue Jun 29 10:21:01 CEST 2010

(Travelling so this is a rather quick response)

I disagree with Naomi.

First, for a differential expression analysis, we prefer to use the counts
as is, and use the normalization factors as offsets in the statistical
modeling.  So, these normalization factors actually DO NOT change the data
(this is unlike microarray data normalization).
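
To make the offset idea concrete, here is a minimal sketch in Python (not edgeR code). It assumes a log-linear model in which the expected count is exp(offset + log abundance), with the offset taken as log(library size * normalization factor). The library sizes, factors, counts and function names are hypothetical placeholders, not values from any real data set.

```python
import math

# Hypothetical totals per library and placeholder normalization factors
# (standing in for what calcNormFactors() would return).
lib_sizes = [1_000_000, 2_000_000]
norm_factors = [1.25, 0.8]

# Raw counts for one gene; these are never modified.
raw_counts = [125, 160]


def offsets(lib_sizes, norm_factors):
    """Log effective library size, one value per library."""
    return [math.log(n * f) for n, f in zip(lib_sizes, norm_factors)]


def expected_count(log_abundance, offset):
    """Model mean for a gene: the offset rescales the mean, not the data."""
    return math.exp(offset + log_abundance)


off = offsets(lib_sizes, norm_factors)

# A gene with the same relative abundance (1e-4) in both libraries.
mu = [expected_count(math.log(1e-4), o) for o in off]
```

The normalization enters only through the model mean; raw_counts is passed around untouched, which is the difference from microarray-style normalization.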

Second, for clustering, visualization etc., you may want to calculate a
normalized expression value.  Multiply the normalization factors from
calcNormFactors() by the library sizes to get an effective library size for
each library (see Section 6 of the manual), then DIVIDE the raw counts by
that number.  Maybe also multiply by 10M so you have counts per 10M?
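
As a rough illustration of that arithmetic, here is a small Python sketch (again, not edgeR code). The count table, the normalization factors and the function name counts_per_10m are made up for the example; in practice the factors would come from calcNormFactors().

```python
# Hypothetical toy data: rows are genes, columns are libraries.
counts = [
    [10, 20],
    [30, 60],
    [60, 120],
]

# Placeholder factors standing in for the output of calcNormFactors().
norm_factors = [1.25, 0.8]


def counts_per_10m(counts, norm_factors, scale=1e7):
    """Divide each raw count by its library's effective size, then rescale."""
    n_libs = len(norm_factors)
    # Library size = column total of the raw counts.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_libs)]
    # Effective library size = library size * normalization factor.
    eff_sizes = [lib_sizes[j] * norm_factors[j] for j in range(n_libs)]
    return [
        [row[j] * scale / eff_sizes[j] for j in range(n_libs)]
        for row in counts
    ]


normalized = counts_per_10m(counts, norm_factors)
```

These values are meant for clustering and plotting only; the DE analysis itself still uses the raw counts.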

I think what Naomi is talking about (highly expressed genes depressing the
expression of other genes) is covered in our paper:


> Multiply.
> And yes, you should use the normalized data for
> DE and clustering.  Otherwise, highly expressing
> genes in your sample will depress the expression
> of other genes relative to the size of the
> library, inducing spurious "differential"
> expression.  I have been simulating data to try to understand this better.
> --Naomi
> At 11:19 PM 6/27/2010, 王孆 wrote:
>>I have a question about using TMM normalization
>>factors. I want to modify the count for each
>>gene after normalization. Do I just need to
>>divide the count of each gene by the
>>normalization factor for its library? Then, I
>>may use the normalized data for DE
>>analysis and other further analysis (e.g. clustering).
>>Thanks a lot,
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
