[BioC] TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq

Smith, Hilary A hilary.smith at gatech.edu
Thu Nov 17 16:07:31 CET 2011


Hello,
I'm working on a couple of analyses (currently pairwise) of 3'-DGE data. baySeq, edgeR, and DESeq are yielding different answers; specifically, DESeq and baySeq find different subsets of the genes found by edgeR. To isolate the discrepancy, I've been trying to make steps such as the normalization procedure the same across packages, to see whether that improves congruency or whether the differences merely stem from how the pairwise tests are run and the use of Bayesian vs. exact-type statistics.

I saw that baySeq's function "getLibsizes" can use the edgeR implementation of TMM, but when I try this I get an error message about a quantile argument not being used. The error appears whether or not I specify a quantile, and I'm further confused because edgeR itself does not require specifying a quantile for its TMM-based calcNormFactors. edgeR seems to run fine on its own, so I think the problem is in baySeq's implementation; or perhaps I'm misunderstanding or miscoding something? Any help is greatly appreciated; commands excerpted from an R session are below.


> library(baySeq)

Attaching package: 'baySeq'

The following object(s) are masked from 'package:base':

    rbind

> library(snow)
> cl = makeCluster(4, "SOCK")
> library(edgeR)
> simData = read.delim(file="2011.11.03counts.txt", header=TRUE)
> rownames(simData)=simData$CompID
> simData=simData[,-1]
> simData=as.matrix(simData)
> head(simData)
          X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F
comp0      1065  1159  1207  1572  1477  1817  1841   605  1915  1113  1645
comp1       544   534   341   675   333   739   690   236   502   451   571
comp10    30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575
comp100    1060  1065   999  1332   918  1620  1697   658  1117   861  1336
comp1000    130   157   229   266   141   247   263   135   182   188   168
comp10000    35    14    15    37    10    47    28    17    22    21    12
          X3P_R
comp0      1732
comp1       799
comp10    51243
comp100    1370
comp1000    244
comp10000    64
> replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R")
> groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2))
> cD = new("countData", data = simData, replicates = replicates, groups=groups)
> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
Calculating library sizes from column totals.
Error in calcNormFactors(d, quantile = quantile, ...) : 
  unused argument(s) (quantile = quantile)
> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM")
Error in match.arg(estimationType) : 
  'arg' should be one of "quantile", "total", "edgeR"
> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75)
Calculating library sizes from column totals.
Error in calcNormFactors(d, quantile = quantile, ...) : 
  unused argument(s) (quantile = quantile)
> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR")
Calculating library sizes from column totals.
Error in calcNormFactors(d, quantile = quantile, ...) : 
  unused argument(s) (quantile = quantile)
> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75))
Error in match.arg(estimationType) : 'arg' must be of length 1
> calcNormFactors(cD)
Error in calcNormFactors(cD) : 
  calcNormFactors() only operates on 'matrix' and 'DGEList' objects
> calcNormFactors(simData)
    X1E_F     X1E_R     X2E_F     X2E_R     X3E_F     X3E_R     X1P_F     X1P_R 
1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707 
    X2P_F     X2P_R     X3P_F     X3P_R 
1.0550414 0.8955669 1.0869486 1.1052472 
> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
Calculating library sizes from column totals.
Error in calcNormFactors(d, quantile = quantile, ...) : 
  unused argument(s) (quantile = quantile)
> cD@libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
Calculating library sizes from column totals.
Error in calcNormFactors(d, quantile = quantile, ...) : 
  unused argument(s) (quantile = quantile)
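
Since calcNormFactors() does work when called directly on the count matrix, one thing I could try is computing the TMM-scaled library sizes by hand and assigning them to the libsizes slot myself. The following is only a rough sketch of that idea, not something confirmed to match what getLibsizes(estimationType="edgeR") is meant to do. The toy matrix "counts" is hypothetical and stands in for simData, the product of column totals and TMM factors follows the usual edgeR convention for effective library size, and the final assignment to cD@libsizes is an assumption about how baySeq would accept the values.

```r
# Hypothetical toy counts standing in for simData (genes x samples).
set.seed(1)
counts <- matrix(rpois(60, lambda = 50), nrow = 10,
                 dimnames = list(paste0("comp", 0:9), paste0("S", 1:6)))

# Raw library sizes are the column totals.
lib_size <- colSums(counts)

# TMM scaling factors from edgeR when it is installed; otherwise fall back
# to neutral factors of 1 so the sketch still runs without the package.
if (requireNamespace("edgeR", quietly = TRUE)) {
  norm_factors <- edgeR::calcNormFactors(counts)
} else {
  norm_factors <- rep(1, ncol(counts))
}

# Effective library size = column total x TMM factor (edgeR convention).
eff_lib_size <- as.numeric(lib_size * norm_factors)
names(eff_lib_size) <- colnames(counts)

# Assumption: with a countData object cD built on the same matrix, these
# values could be assigned directly in place of the getLibsizes() call:
# cD@libsizes <- eff_lib_size
```

If this is a reasonable substitute, the same effective library sizes could then be compared against what edgeR uses internally, which would at least put the normalization in baySeq and edgeR on the same footing.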


