[BioC] edgeR warnings and then error

Gordon K Smyth smyth at wehi.EDU.AU
Tue Nov 13 07:40:26 CET 2012


Dear John,

Thanks for supplying example data.  I have committed a fix an hour ago to 
edgeR 3.0.4.  Using the fixed package, the code I give below runs on my 
laptop using your data in a few seconds.

Two additional comments.  First, we warn in the edgeR User's Guide 
(Section 2.5.6) against using estimated counts from cufflinks or eXpress.

Secondly, your data seems to have some serious quality issues.  An MDS 
plot plotMDS(predFC(y)) shows that samples con3, con4 and case3 are 
massively different from the other samples.  The batch effect will not 
correct for this.  The data shows very large dispersion values.  So I 
don't think you will get good results from the code below, even though it 
runs now without errors.

Best wishes
Gordon

> library(edgeR)
> x <- read.delim("eXpress_Effective_Counts_Example.txt",row.names=1)
> group <- 
factor(c("c","c","c","c","case","case","case","case"),levels=c("c","case"))
> batch <- factor(c(1,2,3,3,1,2,3,3))
> design <- model.matrix(~batch+group)
> keep <- rowSums(cpm(x)>0.5) >= 3
> y <- DGEList(counts=x[keep,],group=group)
Calculating library sizes from column totals.
> y <- calcNormFactors(y)
> y <- estimateGLMCommonDisp(y)
> y <- estimateGLMTrendedDisp(y)
Loading required package: splines
> y <- estimateGLMTagwiseDisp(y)
>
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] edgeR_3.0.4  limma_4.15.5



On Mon, 12 Nov 2012, john herbert wrote:

> Dear Gordon,
> Thanks for your help. I tried the latest version of edgeR but
> unfortunately there is still an issue.
>
> First, I don't have any problems if I use simple count assignments
> extracted from BAM alignment files. However, I am coming across the
> problem when using the "Effective counts" values as provided by
> running the eXpress software (this uses statistics to assign
> transcript counts onto a de-novo assembled transcriptome).
>
> In the manual it recommends I use "Effective counts" from the output
> of eXpress as input to Bioconductor programs like edgeR and DESeq.
> Manual is here;
> http://bio.math.berkeley.edu/eXpress/tutorial.html#output
>
> I have tried edgeR with simple counts and this works fine but my
> problem is hanging in the
>
> y4 <- estimateGLMTrendedDisp(y4,design) step with eXpress counts (even
> with the upgrade).
>
> I attach the eXpress assigned effective counts in case you/anyone
> wants to replicate the error. Please download from;
>
> https://docs.google.com/file/d/0B9IUGsKecS4GTS01WW95WGhiWW8/edit?authkey=CNSv7PIB&authkey=CNSv7PIB
>
> In the file, the first 4 columns are control and the last four case.
>
> group <- factor(c("c","c","c","c","case","case","case","case"),
> levels=c("c","case"))
> batch <- factor(c(1,2,3,3,1,2,3,3))
> design <- model.matrix(~batch+group)
>
> Thank you for any advice to help solve this.
>
> Kind regards,
>
> John.
>
> ps. sorry if this appears twice as first time it said large files need
> moderator of approval for the mailing list.
>
>
>
> On Sat, Nov 10, 2012 at 6:59 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear John,
>>
>> I'm happy to work with you to solve any programs, but first you need to be
>> using the current Bioconductor release.  Bioconductor 2.11 was released on 3
>> October 2012, see
>>
>>   http://www.bioconductor.org/news/bioc_2_11_release/
>>
>> and the current version of edgeR is 3.0.2.  There are substantial
>> differences between the current version and the version you are using. The
>> type of divergence that is being reported in the warnings is now impossible.
>>
>> R has just changed from six monthly releases to annual releases, and
>> Bioconductor seems to have taken most users by surprise by releasing a new
>> version of Bioconductor while the version of R remains the same.
>>
>> Best wishes
>> Gordon
>>
>> --------------- original message -----------------
>> [BioC] edgeR warnings and then error
>> john herbert arraystruggles at gmail.com
>> Fri Nov 9 17:22:15 CET 2012
>>
>> Hello Bioconductors,
>>
>> For some data, the code below works great and I find differentially
>> expressed genes. However, I have added some biological replicates to
>> the data, which caused warnings and then an error.
>>
>> The warnings come from the estimateGLMTagwiseDisp function, then an
>> error appears in the glmLRT function.
>>
>> I tried the development version of R and edgeR but this now is still
>> running after several days.
>>
>> In summary,  glmLRT works fine if no warnings are produced by
>> estimateGLMTagwiseDisp as below but with warnings, it breaks down.
>>
>> There could be a work-around but not seen this yet.
>>
>> Thank you in advance of any tips.
>>
>> Kind regards,
>>
>> John..
>>
>>> library(edgeR)
>>>
>>> group <-
>>
>> factor(c("c","c","c","c","sp","sp","sp","sp","lp","lp","lp","lp","ss","ss","ss","ss","ls","ls","ls","ls","ag","ag","ag","ag"),
>> levels=c("c","sp","lp","ss","ls","ag"))
>>>
>>> batch <- factor(c(1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3))
>>> design <- model.matrix(~batch+group)
>>>
>>> counts <- read.table("Final_L1_L2_L3_mine_simple_all_reads")
>>>
>>> x4=counts
>>>
>>> y4 <- DGEList(counts=x4,group=group)
>>
>> Calculating library sizes from column totals.
>>>
>>>
>>> y4 <- estimateGLMCommonDisp(y4,design)
>>>
>>> y4 <- estimateGLMTrendedDisp(y4,design)
>>>
>>> y4 <- estimateGLMTagwiseDisp(y4,design)
>>
>> There were 24 warnings (use warnings() to see them)
>>>
>>>
>>> fit4 <- glmFit(y4, design)
>>>
>>> lrt4 <- glmLRT(fit4, coef=3)
>>
>> Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
>>   dims [product 0] do not match the length of object [13]
>>>
>>> warnings()
>>
>> Warning messages:
>> 1: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 2: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 3: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 4: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 5: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>> exceeded
>> 6: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>> exceeded
>> 7: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 8: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 9: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>> exceeded
>> 10: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 11: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 12: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 13: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 14: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>> iterations exceeded
>> 15: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 16: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 17: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 18: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>> iterations exceeded
>> 19: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 20: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 21: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 22: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>> iterations exceeded
>> 23: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>> 24: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>> iterations exceeded
>>
>>> sessionInfo()
>>
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>> Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United Kingdom.1252
>>
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets
>> methods   base
>>
>> other attached packages:
>>  [1] pasilla_0.2.13     DEXSeq_1.2.1       DESeq_1.8.3
>> locfit_1.5-8       edgeR_2.6.12       plyr_1.7.1
>> genefilter_1.38.0  Biobase_2.16.0
>>  [9] BiocGenerics_0.2.0 limma_3.12.3
>>
>> loaded via a namespace (and not attached):
>>  [1] annotate_1.34.0      AnnotationDbi_1.18.0 biomaRt_2.12.0
>> DBI_0.2-5            geneplotter_1.34.0   grid_2.15.2
>> hwriter_1.3
>>  [8] IRanges_1.14.2       lattice_0.20-10      RColorBrewer_1.0-5
>> RCurl_1.91-1.1       RSQLite_0.11.1       statmod_1.4.16
>> stats4_2.15.2
>> [15] stringr_0.6          survival_2.36-14     tools_2.15.2
>> XML_3.9-4.1          xtable_1.7-0

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list