[BioC] edgeR normalization factors

Naomi Altman naomi at stat.psu.edu
Tue Jun 29 17:20:44 CEST 2010


Of course Mark is correct for DE analysis.  What
I should have said is that the normalized library
size (the library size multiplied by the normalization
factor) should be used for DE.  This is certainly covered in the paper.
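
To make the distinction concrete, here is a minimal sketch in plain Python (not edgeR code). The library sizes and normalization factors are made-up numbers; in practice the factors would come from calcNormFactors(). It assumes, as Mark describes in the quoted message, that the counts are left untouched and the normalized library size enters the model only as a log offset.

```python
import math

# Hypothetical values for two libraries (assumed for illustration only).
lib_sizes = [1_000_000, 2_000_000]
norm_factors = [0.8, 1.25]


def effective_library_sizes(lib_sizes, norm_factors):
    """Return each library size multiplied by its normalization factor."""
    return [n * f for n, f in zip(lib_sizes, norm_factors)]


def log_offsets(lib_sizes, norm_factors):
    """Return natural-log offsets for a count model; counts are not altered."""
    return [math.log(e) for e in effective_library_sizes(lib_sizes, norm_factors)]


eff = effective_library_sizes(lib_sizes, norm_factors)
offsets = log_offsets(lib_sizes, norm_factors)
```

The raw counts never appear in this calculation, which is the sense in which the normalization factors do not change the data.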

For clustering, I think you probably will need to 
change the data - but it depends on what you are 
clustering and the distance measure.
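
For reference, the calculation Mark suggests in the quoted message (divide the raw counts by library size times normalization factor, then scale to counts per 10M) can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not edgeR code; the function name counts_per_10m and all numbers are hypothetical.

```python
def counts_per_10m(counts, lib_sizes, norm_factors, scale=10_000_000):
    """Scale raw counts to counts per 10M using normalized library sizes.

    counts: list of rows, one row per gene, one column per library.
    """
    # Normalized library size = library size * normalization factor.
    eff = [n * f for n, f in zip(lib_sizes, norm_factors)]
    return [[c / e * scale for c, e in zip(row, eff)] for row in counts]


# Hypothetical data: two genes (rows) measured in two libraries (columns).
counts = [[80, 250], [8, 25]]
lib_sizes = [1_000_000, 2_000_000]
norm_factors = [0.8, 1.25]

normalized = counts_per_10m(counts, lib_sizes, norm_factors)
```

Values like these are meant for clustering or visualization only; whether a further transformation (for example a log) is appropriate depends on the distance measure.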

--Naomi

At 04:21 AM 6/29/2010, Mark Robinson wrote:

>(Travelling so this is a rather quick response)
>
>I disagree with Naomi.
>
>First, for a differential expression analysis, we prefer to use the counts
>as is, and use the normalization factors as offsets in the statistical
>modeling.  So, these normalization factors actually DO NOT change the data
>(this is unlike microarray data normalization).
>
>Second, for clustering, visualization etc. you may want to calculate a
>normalized expression value.  Take the normalization factors that you
>calculate using calcNormFactors(), multiply each by its library size (see
>Section 6 of the manual), and DIVIDE your raw counts by this number
>for each library.  Maybe also multiply by 10M so you have counts per 10M?
>
>I think what Naomi is talking about (highly expressed genes depressing the
>expression of other genes) is covered in our paper:
>http://genomebiology.com/2010/11/3/R25
>
>Cheers,
>Mark
>
> > Multiply.
> >
> > And yes, you should use the normalized data for
> > DE and clustering.  Otherwise, highly expressed
> > genes in your sample will depress the expression
> > of other genes relative to the size of the
> > library, inducing spurious "differential"
> > expression.  I have been simulating data to try to understand this better.
> >
> > --Naomi
> >
> > At 11:19 PM 6/27/2010, 王孆 wrote:
> >>Hello,
> >>
> >>I have a question about using TMM normalization
> >>factors. I want to modify the count for each
> >>gene after normalization. Do I just need to
> >>divide the count of each gene by the
> >>normalization factor for its library? Then I
> >>could use the normalized data for DE
> >>analysis and further analysis (e.g. clustering).
> >>
> >>Thanks a lot,
> >>Zhe
> >>
> >>
> >>_______________________________________________
> >>Bioconductor mailing list
> >>Bioconductor at stat.math.ethz.ch
> >>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>Search the archives:
> >>http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > Naomi S. Altman                                814-865-3791 (voice)
> > Associate Professor
> > Dept. of Statistics                              814-863-7114 (fax)
> > Penn State University                         814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >



