[BioC] edgeR multifactorial design: confusing BCV plot

Natasha Sahgal nsahgal at well.ox.ac.uk
Tue Sep 11 19:25:57 CEST 2012


Dear List,

This is a follow up from my previous post:
https://stat.ethz.ch/pipermail/bioconductor/2012-July/047173.html

Now that I finally have the count data, I have started the analysis.

However, I do not understand the output of the BCV plot produced after estimating the dispersions,
so I would appreciate any help, advice or suggestions.
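
For readers of the BCV plot: the biological coefficient of variation (BCV) is the square root of the negative binomial dispersion, so the plot shows sqrt(dispersion) per gene against average abundance. A minimal sketch in base R, reusing only the common dispersion values reported by estimateGLMCommonDisp() in the code of this post (the vector names are made up for illustration):

```r
# Common dispersions as printed in this post for the three filters.
# The names cpm1/cpm2/cpm3 are hypothetical labels, not edgeR objects.
disp <- c(cpm1 = 0.0276, cpm2 = 0.02711, cpm3 = 0.02665)

# BCV is the square root of the dispersion.
bcv <- sqrt(disp)
round(bcv, 4)
```

The results should agree with the printed BCV values up to rounding of the printed dispersions.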

I would also appreciate comments on the filtering step, as it appears that I still have some genes with 0 read counts (see the Normalisation section of the code below).
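
On the zero counts: a filter of the form rowSums(cpm > 1) >= 4 only requires a gene to pass the threshold in at least 4 of the 8 samples, so a kept gene can still have a count of 0 in the remaining samples. The sketch below illustrates this in base R with a made-up count matrix and an assumed library size of 2e7 for every sample; cpm is computed by hand here rather than with edgeR's cpm(), and none of the numbers come from the real data.

```r
# Toy counts (made-up numbers): 3 genes x 8 samples.
counts <- matrix(
  c(50, 60, 55, 65,  0,  0,  0,  0,   # geneA: expressed in 4 samples only
     0,  0,  0,  0,  0,  0,  1,  0,   # geneB: essentially unexpressed
    40, 45, 50, 55, 60, 65, 70, 75),  # geneC: expressed in all samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:8))
)

# Assumed library sizes (hypothetical): 20 million reads per sample.
lib.size <- rep(2e7, 8)

# Counts per million, computed by hand: divide each column by its library size.
cpm.manual <- sweep(counts, 2, lib.size, "/") * 1e6

# Same rule as in the post: cpm > 1 in at least 4 samples.
keep <- rowSums(cpm.manual > 1) >= 4
counts.filt <- counts[keep, ]

keep                 # geneA and geneC are kept, geneB is dropped
range(counts.filt)   # 0 75: geneA is kept but still has zeros
```

So a per-sample minimum of 0 after filtering is expected with this kind of rule and does not by itself indicate that the filter failed.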

------------------------------------
Code:
dim(gene.counts.2) # 33607     8
## Sample Descriptions
group = factor(gsub("\\_[[:digit:]]","",colnames(gene.counts.2)))

## Creating dge list
y = DGEList(counts=gene.counts.2, group=group)

## Filtering 
keep = rowSums(cpm(y)>1) >= 4
table(keep)
#FALSE  TRUE 
#17678 15929
keep2 = rowSums(cpm(y)>2) >= 4
table(keep2)
#FALSE  TRUE 
#19300 14307
keep3 = rowSums(cpm(y)>3) >= 4
table(keep3)
#FALSE  TRUE 
#20229 13378

y.filt = y[keep, ] 
dim(y.filt$counts)  # 15929     8

y.filt2 = y[keep2, ] 
dim(y.filt2$counts) # 14307     8

y.filt3 = y[keep3, ] 
dim(y.filt3$counts) # 13378     8

## Recalculate lib.size
y.filt$samples$lib.size = colSums(y.filt$counts)
y.filt2$samples$lib.size = colSums(y.filt2$counts)
y.filt3$samples$lib.size = colSums(y.filt3$counts)

## Normalisation
y.filt = calcNormFactors(y.filt)

range(y.filt$counts[,1]) #  0 159659
range(y.filt$counts[,2]) #  0 155390
range(y.filt$counts[,3]) #  0 122249
range(y.filt$counts[,4]) #  0 137046
range(y.filt$counts[,5]) #  0 206528
range(y.filt$counts[,6]) #  0 222176
range(y.filt$counts[,7]) #  0 192333
range(y.filt$counts[,8]) #  0 229413

y.filt2 = calcNormFactors(y.filt2)

range(y.filt2$counts[,1]) #  0 159659
range(y.filt2$counts[,2]) #  0 155390
range(y.filt2$counts[,3]) #  0 122249
range(y.filt2$counts[,4]) #  0 137046
range(y.filt2$counts[,5]) #  0 206528
range(y.filt2$counts[,6]) #  0 222176
range(y.filt2$counts[,7]) #  0 192333
range(y.filt2$counts[,8]) #  0 229413

y.filt3 = calcNormFactors(y.filt3)

range(y.filt3$counts[,1]) #  0 159659
range(y.filt3$counts[,2]) #  0 155390
range(y.filt3$counts[,3]) #  0 122249
range(y.filt3$counts[,4]) #  0 137046
range(y.filt3$counts[,5]) #  0 206528
range(y.filt3$counts[,6]) #  0 222176
range(y.filt3$counts[,7]) #  0 192333
range(y.filt3$counts[,8]) #  0 229413

## MDS plots
plotMDS(y.filt, main="cpm(y)>1")
plotMDS(y.filt2, main="cpm(y)>2")
plotMDS(y.filt3, main="cpm(y)>3")

## Design Matrix
design = model.matrix(~0+group)
colnames(design) = gsub("group","",colnames(design))
design
#  KO KO_stim WT WT_stim
#1  1       0  0       0
#2  1       0  0       0
#3  0       1  0       0
#4  0       1  0       0
#5  0       0  1       0
#6  0       0  1       0
#7  0       0  0       1
#8  0       0  0       1

## Estimating Dispersion
y.filt = estimateGLMCommonDisp(y.filt, design, verbose=TRUE)
#Disp = 0.0276 , BCV = 0.1661
y.filt2 = estimateGLMCommonDisp(y.filt2, design, verbose=TRUE)
#Disp = 0.02711 , BCV = 0.1646 
y.filt3 = estimateGLMCommonDisp(y.filt3, design, verbose=TRUE)
#Disp = 0.02665 , BCV = 0.1632


y.filt = estimateGLMTrendedDisp(y.filt,design)
y.filt2 = estimateGLMTrendedDisp(y.filt2,design)
y.filt3 = estimateGLMTrendedDisp(y.filt3,design)

y.filt = estimateGLMTagwiseDisp(y.filt,design)
y.filt2 = estimateGLMTagwiseDisp(y.filt2,design)
y.filt3 = estimateGLMTagwiseDisp(y.filt3,design)

jpeg("BCVplots.jpg",height=500,width=900)
par(mfrow=c(1,3))
plotBCV(y.filt, main="cpm(y)>1")
plotBCV(y.filt2, main="cpm(y)>2")
plotBCV(y.filt3, main="cpm(y)>3")
dev.off()

### This section has NOT been run fully
## Fit Model
fit = glmFit(y.filt, design)
lrt = glmLRT(y.filt, fit, contrast=(KO_stim - KO) - (WT_stim - WT)) ### this fails when run, so I think I have not defined the contrast argument correctly

------------------------------------
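
On the contrast in the glmLRT() call above: the expression (KO_stim - KO) - (WT_stim - WT) is evaluated by R as ordinary arithmetic on objects named KO_stim, KO, and so on, which presumably do not exist in the workspace. The contrast argument is expected to be a numeric vector with one entry per column of the design matrix. A minimal base R sketch of that vector, assuming the column order printed in the design matrix above (the object names my.contrast, coef.names and mu are made up for illustration):

```r
# Column names as printed for the design matrix in this post.
coef.names <- c("KO", "KO_stim", "WT", "WT_stim")

# (KO_stim - KO) - (WT_stim - WT) expands to -KO + KO_stim + WT - WT_stim.
my.contrast <- c(KO = -1, KO_stim = 1, WT = 1, WT_stim = -1)
my.contrast <- my.contrast[coef.names]   # enforce the design column order

# Sanity check on made-up group means (hypothetical log-scale values).
mu <- c(KO = 1, KO_stim = 3, WT = 2, WT_stim = 7)
interaction <- sum(my.contrast * mu)     # (3 - 1) - (7 - 2)
stopifnot(interaction == -3)

# Assumption, not run here: with limma attached, makeContrasts() should
# build the same coefficients from the expression, e.g.
#   cm  <- makeContrasts((KO_stim - KO) - (WT_stim - WT), levels = design)
#   lrt <- glmLRT(y.filt, fit, contrast = cm[, 1])
# The glmLRT() argument order follows the call in this post and may differ
# in other edgeR versions.
```

Writing the vector by hand and building it with makeContrasts() should be equivalent; the latter is less error-prone when the design has many columns.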

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] scatterplot3d_0.3-33 DESeq_1.8.3          locfit_1.5-7        
[4] Biobase_2.16.0       BiocGenerics_0.2.0   WriteXLS_2.1.0      
[7] edgeR_2.6.10         limma_3.12.0        

loaded via a namespace (and not attached):
 [1] annotate_1.34.0      AnnotationDbi_1.18.0 DBI_0.2-5           
 [4] genefilter_1.38.0    geneplotter_1.34.0   grid_2.15.0         
 [7] IRanges_1.14.2       lattice_0.20-6       RColorBrewer_1.0-5  
[10] RSQLite_0.11.1       stats4_2.15.0        survival_2.36-14    
[13] tools_2.15.0         xtable_1.7-0        
------------------------------------

Many Thanks,
Natasha


