[BioC] edgeR Warings

Mark Robinson mark.robinson at imls.uzh.ch
Thu Mar 14 17:15:52 CET 2013


Hi Tongwu,

Right, so that is an absurdly high Dispersion/BCV from estimateCommonDisp().

> So, For my data (just have two groups), can I use the estimateGLMCommonDisplay to estimate the dispersion and then use the exactTest to find the DEGs?

Short answer is: I probably wouldn't do that.  At least not yet.

Let's back up a minute and figure out what is going on -- I guess what is not clear from your code is what your grouping (you say two groups) actually gives.  Specifically, what does d$samples look like (i.e. after specifying group=target$Sensitivity[1:17]) ?

I notice that you've called estimateGLMCommonDisp() with no design matrix.  This means it would treat all 17 samples as replicates and calculate dispersion as if this were a single group.  This is probably not what you want.

In general, using estimateGLMCommonDisp() with no design matrix in a 2-group situation would tend to overestimate the dispersion, but here it is the opposite, at least relative to estimateCommonDisp() -- I wouldn't expect qCML and Cox-Reid dispersion estimates to be the same, but they should be similar.  So, that's strange.  I wonder if there are some super highly variable features that kills the qCML estimation?  I have not experienced this before, but have you done other checks like an MDS plot ?  Looking for highly variable features ?

Best regards, Mark


On 14.03.2013, at 16:06, Tongwu [guest] <guest at bioconductor.org> wrote:

> 
> Recently, I use edgeR in R to find DEG, When I loaded my data and estimated the dispersion with estimateCommonDisp, there are always some warnings for my data:
> estimateCommonDisp(y,verbose=TRUE)
> Disp = 99.99477 , BCV = 9.9997
> There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
> Warning messages:
> 1: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 2: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 3: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 4: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 5: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 6: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 7: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 8: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> 9: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma'
> ……..
> But when I change to estiamteGLMCommonDisp, everything looks good, and here is the results:
> estimateGLMCommonDisp(y,verbose=TRUE)
> Disp = 0.69044 , BCV = 0.8309
> So, For my data (just have two groups), can I use the estimateGLMCommonDisplay to estimate the dispersion and then use the exactTest to find the DEGs?
> Thanks and looking forward to your reply.
> Tongwu
> Here is my scripts in R:
> setwd("~/Work/Axeq_RNA_seq/DEG/edgeR/")
> x<-read.delim("../All_htseqCounts.txt2",head=T,check.names=F,row.names=1)
> head(x)
> target<-read.table("../design_type.list",head=T)
> library(edgeR)
> ## For R and S ###
> head(x[,1:17])
> y=DGEList(counts=x[,1:17], group=target$Sensitivity[1:17],)
> dim(y)
> ## Filtering
> keep<-rowSums(cpm(y)>1) >=5
> y<-y[keep,]
> dim(y)
> ## Re-compute the library sizes
> y$samples$lib.size <- colSums(y$counts)
> ## Normalizing
> y<-calcNormFactors(y)
> y$samples
> ## Data exploration
> plotMDS(y)
> ## Estimating the dispersion
> y<-estimateGLMCommonDisp(y,verbose=TRUE)
> ##Disp = 0.69044 , BCV = 0.8309
> y <- estimateGLMTrendedDisp(y)
> y <- estimateGLMTagwiseDisp(y)
> plotBCV(y)
> ## Differential expression
> et<-exactTest(y)
> top<- topTags(et,n=100)
> top
> cpm(y)[rownames(top),]
> ## total number of DE genes at 5% FDR is given by ##  S/R
> summary(de<-decideTestsDGE(et,adjust.method="BH",p.value=0.05))
> ##[,1]
> ##-1   310 (up-regulaiton in R)
> ##0  12881
> ##1    457
> detags<-rownames(y)[as.logical(de)]
> plotSmear(et,de.tags=detags)
> abline(h=c(-1,1),col="blue")
> 
> -- output of sessionInfo(): 
> 
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] edgeR_3.0.6  limma_3.14.3
> 
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://tiny.cc/mrobin



More information about the Bioconductor mailing list