[BioC] edgeR normalization factors

Wolfgang Huber whuber at embl.de
Tue Jun 29 17:29:16 CEST 2010


Zhe,

for clustering and similar endeavours, transforming the data to a 
"logarithm-like" variance-stabilised scale is useful. See e.g. chapter 7 
"Sample clustering" of the vignette of the DESeq package.
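
As a rough illustration, here is a minimal Python sketch of a log-like scale built from the recipe in Mark's message quoted below: divide each raw count by library size times normalization factor, scale to counts per 10M, then take log2. This is a plain log transform with a pseudocount, used as a simple stand-in; it is not the variance-stabilising transformation from the DESeq vignette. The function names, the pseudocount of 0.5 and the toy numbers are assumptions made for this example only.

```python
import math

def effective_library_sizes(lib_sizes, norm_factors):
    """Library size times normalization factor, one value per library."""
    return [n * f for n, f in zip(lib_sizes, norm_factors)]

def log_normalized(counts, lib_sizes, norm_factors, per=1e7, pseudo=0.5):
    """Return log2 normalized expression values.

    counts: list of genes, each a list of raw counts (one per library).
    per:    scaling constant (1e7 gives counts per 10M).
    pseudo: pseudocount added before the log (an assumption of this
            sketch, chosen only to avoid log(0)).
    """
    eff = effective_library_sizes(lib_sizes, norm_factors)
    return [
        [math.log2((c + pseudo) / e * per) for c, e in zip(row, eff)]
        for row in counts
    ]

# Toy data (hypothetical): 3 genes, 2 libraries.
counts = [[10, 20], [100, 200], [0, 5]]
lib_sizes = [1_000_000, 2_000_000]
norm_factors = [0.8, 1.25]  # made-up values standing in for calcNormFactors() output

log_expr = log_normalized(counts, lib_sizes, norm_factors)
```

The rows of log_expr could then be passed to whatever clustering or plotting routine you prefer; the raw counts themselves are left untouched.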

For differential expression, I agree with Mark that you want to use the 
counts as is, and use the normalization factors as parameters in the 
statistical modeling.
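
To make "parameters in the statistical modeling" concrete, here is a small Python sketch under a deliberately simplified assumption: a Poisson model in which the expected count for a gene in library j is exp(offset_j) * lambda, with offset_j = log(library size * normalization factor). edgeR itself works with a negative binomial model, so this only illustrates the role of the offset, not the package's method. All names and numbers below are hypothetical.

```python
import math

def offsets(lib_sizes, norm_factors):
    """Per-library offsets: log of (library size * normalization factor)."""
    return [math.log(n * f) for n, f in zip(lib_sizes, norm_factors)]

def poisson_abundance(counts, offs):
    """Maximum-likelihood relative abundance for one gene in one group,
    assuming counts_j ~ Poisson(exp(offs_j) * lam).
    The raw counts enter unchanged; the offsets only scale the mean."""
    return sum(counts) / sum(math.exp(o) for o in offs)

def expected_counts(lam, offs):
    """Model means on the count scale: exp(offset_j + log(lam))."""
    return [math.exp(o + math.log(lam)) for o in offs]

# Toy example (hypothetical numbers): one gene, two libraries in one group.
offs = offsets([1_000_000, 2_000_000], [1.0, 1.0])
lam = poisson_abundance([10, 20], offs)
fitted = expected_counts(lam, offs)
```

The point of the sketch is that the counts are never divided by anything: the normalization factors only shift the model's expected value for each library.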

       Wolfgang

On Jun/29/10 10:21 AM, Mark Robinson wrote:
>
> (Travelling so this is a rather quick response)
>
> I disagree with Naomi.
>
> First, for a differential expression analysis, we prefer to use the counts
> as is, and use the normalization factors as offsets in the statistical
> modeling.  So, these normalization factors actually DO NOT change the data
> (this is unlike microarray data normalization).
>
> Second, for clustering, visualization etc. you may want to calculate a
> normalized expression value.  Using the normalization factors that you
> calculate using calcNormFactors() multiplied by the library size (See
> Section 6 of the manual), you could DIVIDE your raw counts by this number
> for each library.  Maybe also multiply by 10M so you have counts per 10M?
>
> I think what Naomi is talking about (highly expressed genes depressing the
> expression of other genes) is covered in our paper:
> http://genomebiology.com/2010/11/3/R25
>
> Cheers,
> Mark
>
>> Multiply.
>>
>> And yes, you should use the normalized data for
>> DE and clustering.  Otherwise, highly expressing
>> genes in your sample will depress the expression
>> of other genes relative to the size of the
>> library, inducing spurious "differential"
>> expression.  I have been simulating data to try to understand this better.
>>
>> --Naomi
>>
>> At 11:19 PM 6/27/2010, 王孆 wrote:
>>> Hello,
>>>
>>> I have a question about using TMM normalization
>>> factors. I want to modify the count for each
>>> gene after normalization. Do I just need to
>>> divide the count of each gene by the
>>> normalization factor for its library? Then I
>>> could use the normalized data for DE
>>> analysis and further analysis (e.g. clustering).
>>>
>>> Thanks a lot,
>>> Zhe
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> Naomi S. Altman                                814-865-3791 (voice)
>> Associate Professor
>> Dept. of Statistics                              814-863-7114 (fax)
>> Penn State University                         814-865-1348 (Statistics)
>> University Park, PA 16802-2111
>>
>>
>
>
>


