[BioC] edgeR, very big lib.size makes CPM very small

Gordon K Smyth smyth at wehi.EDU.AU
Fri Aug 15 03:04:09 CEST 2014


> Date: Wed, 13 Aug 2014 13:37:23 +0000
> From: Vang Quy Le / Region Nordjylland <vql at rn.dk>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] edgeR, very big lib.size makes CPM very small
>
> Hello,
> I am working with a count table that has very large lib.size values:
>> dge@.Data[[2]]$lib.size
> [1] 3.2e+08 4.2e+08 4.5e+08 3.8e+08 2.3e+08 2.1e+08 3.3e+08 2.8e+08
>
>
> This makes the CPM values very small, and consequently the logCPM values very negative. This is the 'head' of my cpm(counts):
>
>            C1     C2     C3     C4     T1    T2    T3     T4
> 00000001 0.000 0.0000 0.0000 0.0026 0.0042 0.000 0.000 0.0035
> 00000002 0.012 0.0092 0.0086 0.0103 0.0042 0.014 0.006 0.0070
> 00000003 0.073 0.0554 0.0474 0.0620 0.0584 0.056 0.057 0.0525
> 00000004 0.073 0.0624 0.0496 0.0620 0.0626 0.056 0.060 0.0525
> 00000005 0.076 0.0624 0.0496 0.0594 0.0584 0.056 0.060 0.0490
> 00000006 0.067 0.0624 0.0474 0.0620 0.0584 0.046 0.066 0.0630
>
>
> What concerns me here is that, with so few decimal places, rounding of 
> such small numbers may lose sensitivity.

No, not unless you are planning to run R on a 1960's calculator without 
floating point arithmetic.

> Is this something that can affect the outcome of the analysis?

No.  Modern computers with floating point arithmetic have no trouble with 
trivial issues like this.

Floating point arithmetic means that numbers are not rounded to any fixed 
number of decimal places.  Rather, all numbers are stored to the same 
number of significant figures regardless of their absolute size.
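
As a rough illustration of the point about significant figures, here is a 
minimal base R sketch. The variable names and the factor of one million 
are chosen only for illustration. It compares values of the same order as 
the CPMs in the question with the same values on a much larger scale.

```r
# CPM values of the same order as those shown in the question
cpm_small <- c(0.0026, 0.0042, 0.0035)

# The same quantities expressed on a scale a million times larger
cpm_large <- cpm_small * 1e6

# Ratios between values agree to well beyond any precision that matters
rel_small <- cpm_small[2] / cpm_small[1]
rel_large <- cpm_large[2] / cpm_large[1]
same_ratio <- isTRUE(all.equal(rel_small, rel_large, tolerance = 1e-14))

# Differences on the log2 scale are likewise unaffected by absolute size
lfc_small <- log2(cpm_small[2]) - log2(cpm_small[1])
lfc_large <- log2(cpm_large[2]) - log2(cpm_large[1])
same_lfc <- isTRUE(all.equal(lfc_small, lfc_large, tolerance = 1e-12))

# Relative spacing of double precision numbers, the same at any magnitude
eps <- .Machine$double.eps

# Printing with more digits shows that the short default display is
# only a formatting choice, not the stored precision
print(cpm_small, digits = 15)
```

The few decimal places seen in the printed table come from how R formats 
output, not from how the numbers are stored.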

> If it does, should I just scale the counts up before putting the data 
> through my workflow?

No, you should not falsify the true nature of your data to edgeR.
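
One way to see why scaling is harmful, as a hedged sketch that is not 
part of edgeR: edgeR models the counts as counts, so the relationship 
between mean and variance matters. The example below uses simulated 
Poisson counts as a simplifying assumption (edgeR itself uses the 
negative binomial), and the scaling factor of 100 is arbitrary.

```r
set.seed(1)

# Simulated counts for one gene across many libraries (Poisson assumption)
counts <- rpois(1000, lambda = 20)

# "Scaled up" counts, as proposed in the question
scaled <- counts * 100

# For genuine Poisson counts the variance/mean ratio is close to 1
ratio_raw <- var(counts) / mean(counts)

# Scaling multiplies the mean by 100 but the variance by 100^2,
# so the scaled data look 100 times more variable than real counts
ratio_scaled <- var(scaled) / mean(scaled)
```

The scaled values no longer behave like counts of that size, which is the 
sense in which the data would be misrepresented.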

Gordon

> ##### body of 'cpm' function/method #######
> {
>    x <- as.matrix(x)
>    if (is.null(lib.size))
>        lib.size <- colSums(x)
>    if (log) {
>        prior.count.scaled <- lib.size/mean(lib.size) * prior.count
>        lib.size <- lib.size + 2 * prior.count.scaled
>    }
>    lib.size <- 1e-06 * lib.size
>    if (log)
>        log2(t((t(x) + prior.count.scaled)/lib.size))
>    else t(t(x)/lib.size)
> }
>
>
> Kind regards,
>
> Vang Quy Le
> Bioinformatician, Molecular Biologist, PhD
>
> +45 97 66 56 29
> vql at rn.dk
>
> AALBORG UNIVERSITY HOSPITAL
> Section for Molecular Diagnostics,
> Clinical Biochemistry
> Reberbansgade
> DK 9000 Aalborg
> www.aalborguh.rn.dk
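
The quoted body of 'cpm' can be tried out on toy data by wrapping it in a 
function. This is only a sketch: the name cpm_sketch, the argument 
defaults (including prior.count = 0.25) and the toy counts are 
assumptions made for illustration, not the edgeR function itself.

```r
# Hypothetical wrapper around the quoted function body
cpm_sketch <- function(x, lib.size = NULL, log = FALSE, prior.count = 0.25) {
  x <- as.matrix(x)
  if (is.null(lib.size))
    lib.size <- colSums(x)
  if (log) {
    # Prior count scaled in proportion to each library size
    prior.count.scaled <- lib.size / mean(lib.size) * prior.count
    lib.size <- lib.size + 2 * prior.count.scaled
  }
  # Express library sizes in millions
  lib.size <- 1e-06 * lib.size
  if (log)
    log2(t((t(x) + prior.count.scaled) / lib.size))
  else t(t(x) / lib.size)
}

# Toy counts: 2 genes (rows) by 2 samples (columns)
toy_counts <- matrix(c(1, 4, 0, 4), nrow = 2)

# Library sizes of the same order as in the question
toy_lib <- c(3.2e8, 4.2e8)

cpm_values <- cpm_sketch(toy_counts, lib.size = toy_lib)
logcpm_values <- cpm_sketch(toy_counts, lib.size = toy_lib, log = TRUE)
```

A count of 1 in a library of 3.2e8 reads gives 1/320 counts per million. 
The log values are strongly negative but finite, even for the zero count, 
because of the prior count.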
