[BioC] EdgeR - problems running estimateCRDisp

Gordon K Smyth smyth at wehi.EDU.AU
Sun Oct 24 01:31:57 CEST 2010


Dear Josquin,

estimateCDDisp() has worked on all the datasets we have tested it out on 
so far (although that's admitedly a very limited number), so we are not 
able to reproduce your problems.  The function is liable to convergence 
issues however because it needs to fit a very large number of generalized 
linear models, one or more of which could conceivably fail to converge. 
That might be the problem with your data.  I don't see any problems with 
the code example that you give, you seem to be using the functions 
correctly, so the issue must be with the count data itself.  Would you be 
willing to share your data with us offline so that we diagnose and perhaps 
solve the problem?

To see estimateCDDisp() working as intended, you can run the example:

   library(edgeR)
   example(glmFit)
   CR <- estimateCRDisp(d, design)
   fit <- glmFit(d, design, dispersion=CR$CR.common.dispersion)
   results <- glmLRT(d, fit, coef=2)
   topTags(results)

Best wishes
Gordon


----- ORIGINAL MESSAGE -------------
[BioC] EdgeR - problems running estimateCRDisp
josquin.tibbits at dpi.vic.gov.au josquin.tibbits at dpi.vic.gov.au
Fri Oct 22 07:44:00 CEST 2010

I am running the new EdgeR package 1.8 and have updated R to 1.12.0 and
updated all the packages I have installed.

I have an RNAseq experiment with 24 samples and am wanting to run a glm
analysis using the Cox-Reid common dispersion (and tagwise) paramaters.  I
have created a DGEList object using a targets file and this successfully
ran in the calcNormFactors() and in the plotMDS.dge() functions. I have
also created what appears to be a sensible and ok design matrix using
model.matrix().  These then are being input to estimateCRDisp() to
estimate the common dispersion.  The code I have run is as follows (in
red) with output (blue) which is pretty much exactly the same as the
worked example in the manual on pp 51-- and does not seem abnormal at
all......

targets <- read.delim("Lp_NUE_DGEbymSEQ_TARGETS_EdgeRIN.txt",
stringsAsFactors = FALSE, row.names = "label")
targets$Nitrogen        <- factor(targets$Nitrogen)
targets$Endophyte       <- factor(targets$Endophyte)
targets$Tissue <- factor(targets$Tissue)
targets

## CREATE THE DGEList Object

DGEList.object <- readDGE(targets, skip = 1, comment.char = '#')
colnames(DGEList.object) <- row.names(targets)



## CALCULATE THE NORMALISATION (f) FACTORS

DGEList.object <- calcNormFactors(DGEList.object)

## Exclude data where rowsums are < 30 YOU CAN CHANGE THIS (there being 24
samples in the experiment)

DGEList.object <- DGEList.object[rowSums(DGEList.object$counts) > 30, ]
DGEList.object

An object of class "DGEList"
$samples
                                        files Endophyte Tissue Nitrogen
lib.size norm.factors
Lp_NUE_LFFULL  Lp_NUE_LFFULL_ER_RAWREADS.txt      Free   Leaf     Full
3786614    0.9244799
Lp_NUE_LFNO3    Lp_NUE_LFNO3_ER_RAWREADS.txt      Free   Leaf  Partial
5311856    0.8120410
Lp_NUE_LFK        Lp_NUE_LFK_ER_RAWREADS.txt      Free   Leaf     Full
3482772    0.7224960
Lp_NUE_LFPO4    Lp_NUE_LFPO4_ER_RAWREADS.txt      Free   Leaf     Full
6208397    0.7367309
Lp_NUE_LFCa      Lp_NUE_LFCa_ER_RAWREADS.txt      Free   Leaf     Full
3597004    0.8147619
19 more rows ...

$counts
              Lp_NUE_LFFULL  Lp_NUE_LFNO3  Lp_NUE_LFK  Lp_NUE_LFPO4
Lp_NUE_LFCa Lp_NUE_LFNH4  Lp_NUE_RFFULL   Lp_NUE_RFNO3  Lp_NUE_RFK
Lp_NUE_RFPO4
>prg_cds_44             345           382         120           261  141
      215             888           717         698         1112
>prg_cds_141            290           451         126           422  235
      459             676           559         474          879
>prg_cds_2              233           277         131           333  239
      376             246           185         124          282
>prg_cds_16             433           608         184           614  346
      423             924           756         595          913
>prg_cds_84             287           353          90           241  248
      394             823           612         989         1061
              Lp_NUE_RFCa  Lp_NUE_RFNH4  Lp_NUE_LSFULL  Lp_NUE_LSNO3
Lp_NUE_LSK Lp_NUE_LSPO4 Lp_NUE_LSCa  Lp_NUE_LSNH4  Lp_NUE_RSFULL
Lp_NUE_RSNO3
>prg_cds_44          1107           439            954           257  205
        107          868           106             316         1147
>prg_cds_141          583          1052            773           325  230
        193          600           264             414         1046
>prg_cds_2            235           454            177           232  177
        203          138           239             334          228
>prg_cds_16           817          1040           2373           464  220
        276          895           291             714         2655
>prg_cds_84           890          1653            894           268  169
        123          714           209             311         1078
              Lp_NUE_RSK  Lp_NUE_RSPO4  Lp_NUE_RSCa  Lp_NUE_RSNH4
>prg_cds_44          622           903          114          474
>prg_cds_141         505           711          205         1166
>prg_cds_2            93           136          157          340
>prg_cds_16         1398          1660          311         1258
>prg_cds_84          893           771          154         1269
63905 more rows ...


## Produce an MDS plot of the data

#plotMDS.dge(DGEList.object, main = "MDS plot for Lp_NUE_DGEbymSEQ")


## DEFINE THE DESIGN MATRIX

design <- model.matrix(~ Endophyte + Tissue + Nitrogen , data = targets)
print("This is the design matrix")
design

                 (Intercept) EndophyteSTWT TissueRoot NitrogenPartial
Lp_NUE_LFFULL             1             0          0               0
Lp_NUE_LFNO3              1             0          0               1
Lp_NUE_LFK                1             0          0               0
Lp_NUE_LFPO4              1             0          0               0
Lp_NUE_LFCa               1             0          0               0
Lp_NUE_LFNH4              1             0          0               1
Lp_NUE_RFFULL             1             0          1               0
Lp_NUE_RFNO3              1             0          1               1
Lp_NUE_RFK                1             0          1               0
Lp_NUE_RFPO4              1             0          1               0
Lp_NUE_RFCa               1             0          1               0
Lp_NUE_RFNH4              1             0          1               1
Lp_NUE_LSFULL             1             1          0               0
Lp_NUE_LSNO3              1             1          0               1
Lp_NUE_LSK                1             1          0               0
Lp_NUE_LSPO4              1             1          0               0
Lp_NUE_LSCa               1             1          0               0
Lp_NUE_LSNH4              1             1          0               1
Lp_NUE_RSFULL             1             1          1               0
Lp_NUE_RSNO3              1             1          1               1
Lp_NUE_RSK                1             1          1               0
Lp_NUE_RSPO4              1             1          1               0
Lp_NUE_RSCa               1             1          1               0
Lp_NUE_RSNH4              1             1          1               1
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Endophyte
[1] "contr.treatment"

attr(,"contrasts")$Tissue
[1] "contr.treatment"

attr(,"contrasts")$Nitrogen
[1] "contr.treatment"


## ESTIMATE THE Cox-Reid common dispersion

DGEList.object <- estimateCRDisp(DGEList.object, design)
names(DGEList.object)


This step is giving  the following error:::


Loading required package: MASS
Error: NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: glm.fit: algorithm did not converge
2: step size truncated due to divergence
> names(DGEList.object)
[1] "samples" "counts"

Clearly the function is not working. I have tried two different computers
and a run it in 64 bit but always with the same result. I have also tried
using a balanced design and a simple design all with the same results. I
have used the DGEList object to analyse the data using the original
estimateCommonDisp() and exactTest() which worked fine.


  I was hoping someone might be able to help resolve this for me????


Thanks in advance....................... Josquin

The sessionInfo() and traceback() are below:


> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] MASS_7.3-8  edgeR_1.8.1

loaded via a namespace (and not attached):
[1] limma_3.6.0


> traceback()
4: .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,
        y = w * z, ny = 1L, tol = min(1e-07, control$epsilon/1000),
        coefficients = double(nvars), residuals = double(ngoodobs),
        effects = double(ngoodobs), rank = integer(1L), pivot = 1L:nvars,
        qraux = double(nvars), work = double(2 * nvars), PACKAGE = "base")
3: glm.fit(design, y[i, ], offset = offset[i, ], family = f)
2: adjustedProfileLik(spline.disp[i], y.select, design = design,
        offset = offset.mat.select + lib.size.mat.select)
1: estimateCRDisp(DGEList.object, design)
Notice:
This email and any attachments may contain information t...{{dropped:9}}



More information about the Bioconductor mailing list