[BioC] The difference between the three methods in calcNormFactors() in edgeR

Zhan Tianyu sewen67 at gmail.com
Fri Jul 4 15:32:07 CEST 2014


Hello all,

      I have a question about calcNormFactors() in edgeR. There are
three methods I can choose from: "TMM", "RLE", and "upperquartile". How
should I decide which one to use?

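For reference, the simplest of the three options, "upperquartile", can be sketched in base R as follows: each sample is scaled by the 75th percentile of its counts relative to its library size, and the factors are then rescaled so they multiply to one. This is a simplified sketch written from my understanding of the method, not edgeR's code, so details such as the handling of zero counts may differ; the function name uq_norm_factors is hypothetical.

```r
# Simplified upper-quartile normalization factors (a sketch, not edgeR's code).
# counts: genes x samples matrix of non-negative counts.
uq_norm_factors <- function(counts, p = 0.75) {
  lib_size <- colSums(counts)
  # Ignore genes with zero counts in every sample
  counts <- counts[rowSums(counts > 0) > 0, , drop = FALSE]
  # 75th percentile of each sample's counts, relative to its library size
  uq <- apply(counts, 2, quantile, probs = p, names = FALSE)
  f <- uq / lib_size
  # Rescale so that the factors multiply to one
  f / exp(mean(log(f)))
}

# Toy example: sample B is sample A sequenced twice as deeply
toy <- cbind(A = c(10, 20, 30, 40), B = c(20, 40, 60, 80))
uq_norm_factors(toy)  # both factors are 1 (up to rounding)
```

Because the factor only depends on one quantile per sample, it is cheap but ignores how the rest of the count distribution shifts between samples.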
      For example, consider a simple case: there are 10 genes in total and
4 individuals in each of two groups. The count data are therefore a 10*8
matrix, where each row is a gene, each column is an individual, columns 1-4
are the first group, and columns 5-8 are the second group. Among the 10
genes, 60% are differentially expressed: the counts of genes 3, 4, 5, 6, 8,
and 9 are doubled in the first group, while the other genes have the same
counts in both groups. Please see the attachment for the count data.
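Since the attachment is not reproduced here, the following base-R sketch builds a noise-free count matrix with the same design. The per-gene baseline counts in base_counts are made up for illustration (they do not match the library sizes reported below, and real data would also have sampling noise); base_counts and de_genes are hypothetical names.

```r
# Hypothetical baseline counts for genes 1-10 (made up; not the attached data)
base_counts <- c(500, 800, 1200, 300, 2000, 650, 900, 400, 1500, 700)
de_genes <- c(3, 4, 5, 6, 8, 9)  # genes whose counts are doubled in the first group

# 10 genes x 8 individuals; every column starts as a copy of base_counts
counts <- matrix(rep(base_counts, times = 8), nrow = 10, ncol = 8,
                 dimnames = list(paste0("Gene", 1:10), paste0("Sample", 1:8)))
counts[de_genes, 1:4] <- 2 * counts[de_genes, 1:4]  # columns 1-4 = first group

grp <- factor(rep(0:1, each = 4))  # same grouping as grp in this post
colSums(counts)  # library sizes: 15000 for the first group, 8950 for the second
```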

      Then I generated the "group" factor via this command:
      > grp <- as.factor(rep(0:1, each = 8/2))

      After that, I created the DGEList with:
      > d <- DGEList(counts = counts, group = grp)

       Then I calculated the normalization factors with edgeR:
      > n <- calcNormFactors(d)

       By default, this function uses the "TMM" method. The resulting
normalization factors look like this:

        group lib.size    norm.factors
Sample1     0  5062446 1.1195829383593
Sample2     0  5062340 0.8154739771400
Sample3     0  5062444 1.1195827474525
Sample4     0  5062466 1.1403164060313
Sample5     1  3000123 0.9624162935534
Sample6     1  2999992 0.9624163157255
Sample7     1  2999977 0.9624169648716
Sample8     1  3000156 0.9624160077253

        I think this is strange, because the normalization factors for
individuals 1 and 2 are quite different (1.11958 and 0.81547), even though
their counts are generally the same (please see the attachment for the
count data).
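To see what TMM is doing, below is a simplified base-R sketch of the calculation for one sample against a reference sample. It computes per-gene log-ratios (M) and average log-expression (A), trims genes from both ends of the ranked M and A values using edgeR's documented defaults (logratioTrim = 0.3, sumTrim = 0.05, applied to each end as I read the code), and takes a precision-weighted mean of the surviving M values. This is my reading of the method, not edgeR's exact code; tmm_factor and tmm_norm_factors are hypothetical names, and the reference-sample rule (scaled upper quartile closest to the mean) is an assumption based on edgeR's default. Two things it makes visible for this example: with only 10 genes, only the genes ranked 4th to 7th by M survive (four genes, barring ties), and TMM is built on the assumption that most genes are not differentially expressed, whereas 6 of the 10 genes here are.

```r
# Simplified TMM (trimmed mean of M values); a sketch, not edgeR's exact code.
tmm_factor <- function(obs, ref, logratioTrim = 0.3, sumTrim = 0.05) {
  nO <- sum(obs)
  nR <- sum(ref)
  logR <- log2((obs / nO) / (ref / nR))               # M: log-ratio per gene
  absE <- (log2(obs / nO) + log2(ref / nR)) / 2       # A: average log-expression
  v <- (nO - obs) / nO / obs + (nR - ref) / nR / ref  # approximate variance of M
  fin <- is.finite(logR) & is.finite(absE)            # drop genes with a zero count
  logR <- logR[fin]; absE <- absE[fin]; v <- v[fin]
  n <- length(logR)
  # Trim the same fraction from both ends of the ranked M and A values
  loL <- floor(n * logratioTrim) + 1; hiL <- n + 1 - loL
  loS <- floor(n * sumTrim) + 1;      hiS <- n + 1 - loS
  keep <- rank(logR) >= loL & rank(logR) <= hiL &
          rank(absE) >= loS & rank(absE) <= hiS
  # Precision-weighted mean of the surviving M values, back on the ratio scale
  f <- 2^(sum(logR[keep] / v[keep]) / sum(1 / v[keep]))
  attr(f, "genes_used") <- sum(keep)
  f
}

tmm_norm_factors <- function(counts) {
  # Assumed reference rule: sample whose scaled upper quartile is closest to the mean
  uq <- apply(counts, 2, quantile, probs = 0.75, names = FALSE) / colSums(counts)
  ref <- which.min(abs(uq - mean(uq)))
  f <- vapply(seq_len(ncol(counts)),
              function(j) as.numeric(tmm_factor(counts[, j], counts[, ref])),
              numeric(1))
  f / exp(mean(log(f)))  # rescale so the factors multiply to one
}

# One strongly up-regulated gene inflates obs's library size
obs <- c(1000, 10, 20, 30, 40, 50, 60, 70, 80, 90)
ref <- c(10, 10, 20, 30, 40, 50, 60, 70, 80, 90)
tmm_factor(obs, ref)  # about 0.317 (460 / 1450): gene 1 is trimmed away
```

The usage shows the intended behaviour when a single gene is an outlier: it is trimmed and the factor corrects the composition effect. When most genes are shifted, as in the design described above, the trimmed middle of the M distribution is no longer made up of unchanged genes.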

        Then I tried the RLE method:
        > n <- calcNormFactors(d, method = "RLE")

         The results are:

$samples
        group lib.size    norm.factors
Sample1     0  5062446 1.0886765699045
Sample2     0  5062340 1.0886508565338
Sample3     0  5062444 1.0886766741626
Sample4     0  5062466 1.0886750099086
Sample5     1  3000123 0.9185446848068
Sample6     1  2999992 0.9185578680804
Sample7     1  2999977 0.9185624609049
Sample8     1  3000156 0.9185437155777
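For comparison, here is a simplified base-R sketch of RLE (relative log expression) factors: a pseudo-reference sample is formed from the per-gene geometric mean across all samples, each sample's scale is the median ratio of its counts to that pseudo-reference, and the result is expressed relative to library size and rescaled to multiply to one. Unlike the TMM sketch, no single sample is used as the reference. This is my understanding of the method, not edgeR's exact code; rle_norm_factors is a hypothetical name.

```r
# Simplified RLE (relative log expression) factors; a sketch, not edgeR's code.
rle_norm_factors <- function(counts) {
  lib_size <- colSums(counts)
  # Pseudo-reference sample: geometric mean of each gene across all samples
  geo_mean <- exp(rowMeans(log(counts)))
  use <- geo_mean > 0  # genes with a zero count anywhere have geo_mean 0
  # Each sample's scale: median ratio of its counts to the pseudo-reference
  scale <- apply(counts, 2, function(u) median(u[use] / geo_mean[use]))
  f <- scale / lib_size   # express relative to library size
  f / exp(mean(log(f)))   # rescale so the factors multiply to one
}

# Toy example: B is A at twice the depth, C is a copy of A
toy <- cbind(A = c(10, 20, 30, 40), B = c(20, 40, 60, 80), C = c(10, 20, 30, 40))
rle_norm_factors(toy)  # all three factors are 1 (up to rounding)
```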

          I think the results are more reasonable this time. My questions
are: how do I decide which method to use, and why does TMM give a strange
result here?

         Thank you.


Best regards,

sewen67

