[BioC] EdgeR GOF non-linear

Thomas Frederick Willems twillems at mit.edu
Fri Oct 19 17:47:20 CEST 2012

Hi Gordon,

I've been trying to use edgeR to analyze some mRNA-seq data. I've begun looking at a data set where the cells are either unstimulated or stimulated with one single treatment and I have three replicates for each class. When I use the GLM framework with tagwise dispersion to fit a model for the effects of the single treatment and plot the GOF, it is distinctly non-linear. When I remove one replicate from each class by looking at the replicate most distant from its counterparts in the plotMDS graph, I obtain a plot that appears more linear but still deviates from the diagonal.
I've attached these gof plot as well as the results from plotBCV for the situation in which 3 replicates were used. I have a few questions regarding this plot:
1) Do the combination of gof plots suggest that there are gross batch differences between the replicates, or have you seen other RNA-seq datasets with a fit similar to the first gof plot?
2) Is the fit sufficiently poor that any conclusions drawn from the GLM coefficients are likely of little use?
3) In another dataset I'm analyzing, the line also lies below the y=x plot but is linear and has the same slope. What sort of issue might this suggest?

My sequencing depths for the samples, as given by samples.libsize, are:
26693672 61963337 64385230 65093734 79193403 57877925

My code is as follows:

# File containing file names where sample counts can be read
targets <- read.delim("/home/twillems/Desktop/IL6-paired-end/GLM/files.txt", stringsAsFactors = FALSE)

treatment <- c(0,0,0,1,1,1)
design <- model.matrix(~treatment)

y <- readDGE(targets)
colnames(y) <- y$samples[['description']]
min_counts_per_million <- 0.2
min_samples <- 3
y <- y[rowSums(cpm(y) > min_counts_per_million) >= nrow(y$samples), ]
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)

y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y,design)


fit1 <- glmFit(y, design)
g <- gof(fit1, plot=TRUE)

And the result of sessionInfo():
R version 2.14.1 (2011-12-22)
Platform: i686-pc-linux-gnu (32-bit)

 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] edgeR_2.6.12 limma_3.10.3 rkward_0.5.6

loaded via a namespace (and not attached):
[1] tools_2.14.1

Thanks for your help

Thomas Willems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gof plot_all_3_rep.PNG
Type: image/png
Size: 36903 bytes
Desc: gof plot_all_3_rep.PNG
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121019/9d926986/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCV plot_all_3_rep.PNG
Type: image/png
Size: 31803 bytes
Desc: BCV plot_all_3_rep.PNG
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121019/9d926986/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gof plot_2_rep.PNG
Type: image/png
Size: 35610 bytes
Desc: gof plot_2_rep.PNG
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121019/9d926986/attachment-0002.png>

More information about the Bioconductor mailing list