[BioC] extracting CPM from a DGElist after normalization in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Fri Apr 4 01:57:36 CEST 2014


Dear Alessandro,

I see that Devon Ryan has answered your question, but the answer is also 
available directly from the help system.  If you type help("cpm") the 
first line of Details says:

"CPM or RPKM values are useful descriptive measures for the expression 
level of a gene or transcript. By default, the normalized library sizes 
are used in the computation for DGEList objects but simple column sums for 
matrices."

Best wishes
Gordon

> Date: Thu, 03 Apr 2014 11:58:57 +0200
> From: "alessandro.guffanti at genomnia.com"
> 	<alessandro.guffanti at genomnia.com>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] extracting CPM from a DGElist after normalization in
> 	edgeR
>
> Dear colleagues and dear Mark:
>
> We want to extract the CPM from a DGElist object after TMM
> normalization, common and tagwise dispersion evaluation.
>
> Here is a sample of the code, it's just a standard edgeR DEG workflow:
>
> samplesheet1 <- read.delim("sample_sheet.txt", stringsAsFactors = FALSE)
> comparison <- c("Tumor_1","Tumor_2")
> RG1 <- readDGE(samplesheet1,header=FALSE)
> colnames(RG1) <- gsub(".mature_cnt.txt","",RG1$sample$files,perl=TRUE)
> N<-length(colnames(RG1))/2
> K<-10
> keep <- rowSums(cpm(RG1) > K) >= N
> difflist1 <- RG1[keep,]
> difflist1$samples$lib.size <- colSums(difflist1$counts)
> difflist1 <- calcNormFactors(difflist1)
> difflist1 <- estimateCommonDisp(difflist1)
> difflist1 <- estimateTagwiseDisp(difflist1)
> currentDiff <- difflist1
> de.com <- exactTest(currentDiff,pair=comparison)
>
> ... then there is the creation of an Excel output dataframes, including
> the CPM worksheet:
>
> addDataFrame(*round(cpm(currentDiff)*[,currentDiff$samples$group==comparison[1]
> | currentDiff$samples$group==comparison[2]]), ucpm_sheet, startRow=2,
> startColumn=1)
>
> Now, the CPM content leaves us quite puzzled because it is very variable
> according to the syntax we use
> to extract it, and this particular syntax seems to give values which do
> not sum up to 1 million as expected.
> This is the syntax coherent with the
>
> Apparently the only CPM values which do sum up correctly to 1 million es
> expected are those extracted with the following
> instruction
>
> head(cpm(currentDiff$counts))
>
> But what are then these CPM values, which are different from the ones
> extracted with the command above ?
>
> head(cpm(currentDiff))
>
>
> These CPM values are different also from the values obtained with this
> command
>
>
> head(cpm(currentDiff$pseudo.counts))
>
>
> Many thanks in advance for any clarification !
>
> Alessandro, Moreno & Paolo
>
> ----
>
>
>>  head(cpm(currentDiff))
>
> LT1C     LT2C   LT3C   ST2C    ST4C   ST10C   ST12C ST5C    ST7C
> ST8C   ST11C   LT1P    LT2P
>
> hsa-miR-140-5p 306.550 209.4272 233.03 273.53 581.762 263.281 140.452
> 285.775 633.920 679.240 232.223 396.27 203.828
>
> hsa-miR-133a-3p 28.945  14.5648  51.61  44.04   4.116   5.495  33.441
> 55.981   4.393  18.394   2.609  29.45   2.474
>
> hsa-miR-665 11.695  21.7297  36.35  16.71   4.785  18.865  46.817 3.762
> 12.213  19.308  20.162   3.22  32.849
>
> hsa-miR-1250-5p 2.047   0.1175  13.03  10.21   3.344  13.462   3.541
> 6.396   2.357   4.166   3.202  28.35   4.123
>
> hsa-miR-381-3p 19.223  24.1963  84.53  44.74  16.104  10.898  33.834
> 182.992  43.926  37.092   1.186  25.37  10.308
>
> hsa-miR-374b-3p 39.543   7.6348  10.46  27.62  31.077  18.224  25.966
> 17.457  39.747  29.572   7.353  22.46  13.607
>
> LT3P   ST2P   ST10P  ST12P   ST7P    ST8P    ST4P   ST5P ST11P     SB
>
> hsa-miR-140-5p 401.843 309.98 408.091 231.87 469.49 664.349 213.087
> 551.31 870.579 326.37
>
> hsa-miR-133a-3p 2.974  16.17   5.822  14.44  43.60  48.011  27.482
> 27.06 13.031  41.21
>
> hsa-miR-665 15.179  46.63  18.831  15.42   4.69   3.359   7.923  11.81
> 3.425   4.57
>
> hsa-miR-1250-5p 10.904  12.05  18.831  19.21  27.76   1.976  21.870
> 48.22 67.411  12.49
>
> hsa-miR-381-3p 11.957  44.73  10.007  16.40  21.45  23.907  80.795
> 21.46 32.661  75.00
>
> hsa-miR-374b-3p 30.048  30.12  15.829  21.17  23.99  46.924  13.122
> 28.86 36.587  19.18
>
>
> > head(cpm(currentDiff$counts))
>
> LT1C     LT2C   LT3C   ST2C    ST4C   ST10C   ST12C ST5C    ST7C
> ST8C    ST11C    LT1P    LT2P
>
> hsa-miR-140-5p 215.684 196.8196 237.43 307.42 605.485 285.767 142.576
> 288.156 546.467 593.42 153.0286 458.283 167.040
>
> hsa-miR-133a-3p 20.365  13.6880  52.59  49.49   4.284   5.964  33.947
> 56.448   3.787  16.07   1.7194  34.058   2.027
>
> hsa-miR-665 8.228  20.4215  37.04  18.78   4.980  20.476  47.525 3.794
> 10.529  16.87  13.2864   3.724  26.920
>
> hsa-miR-1250-5p 1.440   0.1104  13.28  11.47   3.481  14.611   3.594
> 6.449   2.032   3.64   2.1102  32.786   3.379
>
> hsa-miR-381-3p 13.525  22.7397  86.13  50.28  16.761  11.828  34.346
> 184.517  37.866  32.41   0.7816  29.335   8.448
>
> hsa-miR-374b-3p 27.822   7.1751  10.66  31.05  32.344  19.780  26.359
> 17.602  34.264  25.84   4.8456  25.975  11.151
>
> LT3P   ST2P   ST10P  ST12P    ST7P    ST8P   ST4P   ST5P ST11P      SB
>
> hsa-miR-140-5p 418.080 330.59 487.258 231.94 532.639 720.513 260.36
> 599.79 929.610 342.404
>
> hsa-miR-133a-3p 3.094  17.25   6.952  14.44  49.461  52.070  33.58
> 29.44 13.915  43.239
>
> hsa-miR-665 15.792  49.73  22.484  15.42   5.321   3.643   9.68  12.85
> 3.657   4.795
>
> hsa-miR-1250-5p 11.345  12.85  22.484  19.22  31.491   2.143  26.72
> 52.46 71.982  13.100
>
> hsa-miR-381-3p 12.441  47.70  11.948  16.40  24.338  25.928  98.72
> 23.34 34.876  78.687
>
> hsa-miR-374b-3p 31.263  32.12  18.899  21.17  27.216  50.891  16.03
> 31.40 39.068  20.121
>
>
> > head(cpm(currentDiff$pseudo.counts))
>
> LT1C    LT2C   LT3C   ST2C    ST4C   ST10C  ST12C    ST5C ST7C
> ST8C    ST11C    LT1P    LT2P
>
> hsa-miR-140-5p 215.686 196.824 237.42 307.42 605.506 285.765 142.74
> 288.149 546.461 593.414 153.0351 458.284 167.054
>
> hsa-miR-133a-3p 20.362  13.698  52.59  49.49   4.256   5.961  33.80
> 56.483   3.793  16.071   1.7286  34.088   2.048
>
> hsa-miR-665 8.221  20.425  37.04  18.79   4.953  20.476  47.03   3.781
> 10.532  16.869  13.2858   3.712  26.891
>
> hsa-miR-1250-5p 1.432   0.121  13.29  11.47   3.461  14.617   3.71
> 6.449 2.037   3.642   2.1169  32.804   3.399
>
> hsa-miR-381-3p 13.517  22.749  86.14  50.28  16.735  11.826  34.44
> 184.584 37.867  32.407   0.7897  29.345   8.461
>
> hsa-miR-374b-3p 27.838   7.186  10.65  31.05  32.352  19.779  26.37
> 17.595 34.259  25.835   4.8545  25.973  11.165
>
> LT3P   ST2P   ST10P  ST12P    ST7P    ST8P    ST4P   ST5P ST11P      SB
>
> hsa-miR-140-5p 418.084 330.60 487.258 231.96 532.637 720.512 260.347
> 599.80 929.622 342.403
>
> hsa-miR-133a-3p 3.073  17.26   6.948  14.45  49.475  52.067  33.578
> 29.45 13.909  43.238
>
> hsa-miR-665 15.786  49.68  22.485  15.42   5.309   3.645   9.673 12.85
> 3.649   4.794
>
> hsa-miR-1250-5p 11.334  12.86  22.482  19.23  31.489   2.145  26.717
> 52.48 72.001  13.099
>
> hsa-miR-381-3p 12.430  47.70  11.945  16.41  24.330  25.929  98.743
> 23.34 34.875  78.686
>
> hsa-miR-374b-3p 31.272  32.12  18.897  21.18  27.212  50.889  16.025
> 31.40 39.072  20.120
>
>
> > colSums(cpm(currentDiff))
>
> LT1C    LT2C    LT3C    ST2C    ST4C   ST10C   ST12C ST5C    ST7C
> ST8C   ST11C    LT1P    LT2P    LT3P    ST2P
>
> 1421292 1064057  981465  889765  960819  921314  985099  991736 1160034
> 1144623 1517511  864691 1220229  961164  937648
>
>   ST10P ST12P    ST7P    ST8P    ST4P    ST5P   ST11P      SB
>
>  837525 999688  881438  922050  818447  919173  936499  953165
>
>
> > colSums(cpm(currentDiff$counts))
>
> LT1C  LT2C  LT3C  ST2C  ST4C ST10C ST12C  ST5C  ST7C  ST8C ST11C  LT1P
> LT2P  LT3P  ST2P ST10P ST12P  ST7P  ST8P  ST4P
>
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
>
>  ST5P ST11P    SB
>
> 1e+06 1e+06 1e+06
>
>
> > colSums(cpm(currentDiff$pseudo.counts))
>
> LT1C  LT2C  LT3C  ST2C  ST4C ST10C ST12C  ST5C  ST7C  ST8C ST11C  LT1P
> LT2P  LT3P  ST2P ST10P ST12P  ST7P  ST8P  ST4P
>
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
>
>  ST5P ST11P    SB
>
> 1e+06 1e+06 1e+06
>
>
>
>>  sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
>
> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
> LC_MONETARY=Italian_Italy.1252
>
> [4] LC_NUMERIC=C                   LC_TIME=Italian_Italy.1252
>
> attached base packages:
>
> [1] stats     graphics  grDevices utils     datasets  methods base
>
> other attached packages:
>
> [1] xlsx_0.5.5     xlsxjars_0.5.0 rJava_0.9-6    edgeR_3.4.2 limma_3.18.9
>
> loaded via a namespace (and not attached):
>
> [1] tools_3.0.2

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list