[BioC] edgeR: very large lib.size makes CPM values very small

Vang Quy Le / Region Nordjylland vql at rn.dk
Wed Aug 13 15:37:23 CEST 2014


Hello,
I am working with a count table that has very large library sizes (lib.size):
> dge@.Data[[2]]$lib.size
[1] 3.2e+08 4.2e+08 4.5e+08 3.8e+08 2.3e+08 2.1e+08 3.3e+08 2.8e+08


This makes the CPM values very small and, consequently, the logCPM values very negative. Here is the head of my cpm(counts):

            C1     C2     C3     C4     T1    T2    T3     T4
00000001 0.000 0.0000 0.0000 0.0026 0.0042 0.000 0.000 0.0035
00000002 0.012 0.0092 0.0086 0.0103 0.0042 0.014 0.006 0.0070
00000003 0.073 0.0554 0.0474 0.0620 0.0584 0.056 0.057 0.0525
00000004 0.073 0.0624 0.0496 0.0620 0.0626 0.056 0.060 0.0525
00000005 0.076 0.0624 0.0496 0.0594 0.0584 0.056 0.060 0.0490
00000006 0.067 0.0624 0.0474 0.0620 0.0584 0.046 0.066 0.0630


My concern is that the small number of decimal places, and the rounding of these numbers, may cause a loss of sensitivity. Can this affect the outcome of the analysis? If so, should I simply scale the counts up before putting the data through my workflow?
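One way to check the decimal-places concern is to compare what R prints with what it stores. The sketch below is only an illustration. The counts are hypothetical, and the library size is chosen to match the order of magnitude in the post. It assumes the table above was rounded for display. R keeps numeric values as double-precision numbers (roughly 15 to 16 significant digits), so rounding in printed output does not by itself reduce the precision of the stored CPM values.

```r
# Hypothetical counts for a single sample (illustration only)
counts <- c(1, 4, 23, 23, 24, 21)
# Library size of the same order as in the post (3.2e+08 reads)
lib.size <- 3.2e8
# CPM as in the quoted 'cpm' body: counts divided by library size in millions
cpm.values <- counts / (lib.size * 1e-6)

print(cpm.values, digits = 2)   # printed output is rounded to 2 significant digits
sprintf("%.15g", cpm.values)    # the stored doubles keep their full precision
```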


##### body of 'cpm' function/method #######
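{
    x <- as.matrix(x)
    # default library sizes are the column totals
    if (is.null(lib.size)) 
        lib.size <- colSums(x)
    if (log) {
        # prior count scaled in proportion to each library size;
        # each library size is enlarged by twice that amount
        prior.count.scaled <- lib.size/mean(lib.size) * prior.count
        lib.size <- lib.size + 2 * prior.count.scaled
    }
    # express library sizes in millions
    lib.size <- 1e-06 * lib.size
    if (log) 
        log2(t((t(x) + prior.count.scaled)/lib.size))
    else t(t(x)/lib.size)
}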
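To see what scaling the counts up would do to the CPM values themselves, the quoted body can be wrapped in a stand-alone function. The function below is a minimal sketch, not edgeR's own API. The name `cpm_sketch`, the default `prior.count = 0.25`, and the simulated counts are assumptions made for illustration. With `log = FALSE` and `lib.size` taken from the column sums, multiplying every count by a constant also multiplies the library sizes by that constant, so the CPM values do not change.

```r
# Hypothetical stand-alone copy of the quoted 'cpm' body (not edgeR's API).
# prior.count = 0.25 is an assumed default, used here for illustration only.
cpm_sketch <- function(x, lib.size = NULL, log = FALSE, prior.count = 0.25) {
  x <- as.matrix(x)
  if (is.null(lib.size))
    lib.size <- colSums(x)              # library sizes default to column totals
  if (log) {
    # prior count proportional to each library size relative to the mean
    prior.count.scaled <- lib.size / mean(lib.size) * prior.count
    lib.size <- lib.size + 2 * prior.count.scaled
  }
  lib.size <- 1e-06 * lib.size          # library sizes in millions
  if (log)
    log2(t((t(x) + prior.count.scaled) / lib.size))
  else
    t(t(x) / lib.size)
}

set.seed(1)
# Hypothetical counts: 1000 features x 2 samples
counts <- matrix(rpois(2000, lambda = 50), ncol = 2)

cpm1  <- cpm_sketch(counts)
cpm10 <- cpm_sketch(counts * 10)        # every count scaled up by 10
all.equal(cpm1, cpm10)                  # CPM is unchanged by the scaling
colSums(cpm1)                           # each column sums to one million
```

With `log = TRUE` the result is not exactly invariant in this sketch, because `prior.count` is added on the count scale and is not multiplied by the same constant as the counts.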


Kind regards,

Vang Quy Le
Bioinformatician, Molecular Biologist, PhD

+45 97 66 56 29
vql at rn.dk

AALBORG UNIVERSITY HOSPITAL
Section for Molecular Diagnostics,
Clinical Biochemistry
Reberbansgade
DK 9000 Aalborg
www.aalborguh.rn.dk


