[BioC] [EDGER] Normalization issue

James W. MacDonald jmacdon at uw.edu
Fri Jul 20 16:15:16 CEST 2012


Hi Francois,

On 7/20/2012 9:57 AM, François RICHARD wrote:
> Dear all,
>
> I am a master's student in France, working on RNA-seq data.
> I am trying to run a differential gene expression analysis
> using edgeR, starting with 2 conditions * 2 replicates = 4 runs
> (Illumina, mapped with bowtie to a known reference genome). I have a few
> questions about the normalization of the dataset.
>
> As I understand it, normalization is needed to correct for differences
> in library size between samples. It is done with the TMM method, by
> calling the calcNormFactors() function.

No, the calcNormFactors() function is used to account for 'RNA 
composition', not library size. See section 2.3.3 in the edgeR User's guide.
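
To make 'RNA composition' concrete, here is a toy sketch in Python. The counts are made up, and the median of log-ratios is only a simplified stand-in for TMM's weighted trimmed mean, so this is not what calcNormFactors() actually computes. Both samples have the same library size, yet one gene that is highly expressed only in sample B makes every other gene look two-fold down.

```python
import math
import statistics

# Made-up counts for 5 genes. g1..g4 are truly unchanged between samples;
# g5 is expressed only in sample B and takes half of B's reads.
sample_a = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "g5": 0}
sample_b = {"g1": 50, "g2": 100, "g3": 150, "g4": 200, "g5": 500}

total_a = sum(sample_a.values())  # library size of A
total_b = sum(sample_b.values())  # library size of B (same as A here)

# Only genes observed in both samples give a finite log-ratio.
shared = [g for g in sample_a if sample_a[g] > 0 and sample_b[g] > 0]

# Log2 fold changes (B vs A) after scaling by library size only.
log_ratios = {
    g: math.log2((sample_b[g] / total_b) / (sample_a[g] / total_a))
    for g in shared
}

# Simplified composition factor: the median log-ratio, back on the raw scale.
# (Assumption: median used here instead of a weighted trimmed mean.)
factor_b = 2 ** statistics.median(log_ratios.values())

# Rescale B's library size by the factor and recompute the fold changes.
effective_total_b = total_b * factor_b
adjusted = {
    g: math.log2((sample_b[g] / effective_total_b) / (sample_a[g] / total_a))
    for g in shared
}

print("library-size scaling only:", log_ratios)
print("composition factor for B:", factor_b)
print("after composition adjustment:", adjusted)
```

With library-size scaling alone, g1 to g4 all appear down-regulated in B even though they did not change; after the composition factor is applied, their log-ratios return to roughly zero.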

> This gives a normalization factor that corresponds to an offset in
> the model that tests for differentially expressed genes.
>
> The function estimateCommonDisp() gives the dispersion and
> exactTest() runs the differential analysis (performing a negative
> binomial test). But according to the edgeR manual, those two functions
> call the equalizeLibSizes() function in order to generate pseudo-counts
> (which correct for the library size as well).

Right. The library size is automatically corrected. You _may_ need to 
use calcNormFactors() to account for situations where technical effects 
can bias your results. Two examples are given in 2.3.3 of the edgeR 
user's guide.
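
As a rough sketch of how the two corrections fit together: the edgeR documentation describes the normalization factors as scaling the raw library sizes into "effective" library sizes (library size times factor). The Python below only illustrates that arithmetic; the library sizes, factors and the cpm() helper are hypothetical values and names chosen for the example, not edgeR output or edgeR code.

```python
import math

# Hypothetical library sizes (total mapped reads) for the 4 runs.
lib_sizes = [12_000_000, 15_000_000, 9_000_000, 11_000_000]

# Hypothetical normalization factors, one per run (invented numbers).
norm_factors = [1.05, 0.95, 1.10, 0.91]

# Effective library size = raw library size * normalization factor.
effective = [n * f for n, f in zip(lib_sizes, norm_factors)]

# In a log-linear count model, the log effective library size acts as the offset.
offsets = [math.log(e) for e in effective]

def cpm(count, eff_lib_size):
    """Counts per million relative to an effective library size (illustrative helper)."""
    return count / eff_lib_size * 1e6

for raw, eff, off in zip(lib_sizes, effective, offsets):
    print(f"raw={raw:>10d}  effective={eff:>14.1f}  offset={off:.4f}")
```

So the library size correction always happens; the normalization factor, when you compute one, simply adjusts the library size that the correction uses.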

Best,

Jim


>
> What I do not understand here is that the library size should
> already be corrected by the TMM method.
>
> My question is, finally:
> What is the difference between calcNormFactors() and
> equalizeLibSizes()? Do the pseudo-counts generated by
> equalizeLibSizes() take the normalization factor into account?
>
> I hope I have been clear enough, and that you will be able to help me,
>
> Thanks a lot,
>
> François

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


