[BioC] DEXSeq on two-exon genes: how to specify a formula without redundant terms

Alejandro Reyes alejandro.reyes at embl.de
Fri May 24 14:19:07 CEST 2013


Dear Manikandan Narayanan,

Sorry for the delayed reply!

Thanks for your e-mail! It has led us to spot two minor bugs and 
therefore modified the DEXSeq code in two aspects:

Firstly, redundant terms were removed in DEXSeq in a very naive way:  a 
lm.fit model was first fitted and the columns from the model matrix with 
NA coefficients were removed from the model matrix, now a proper 
function has been written for this purpose (with basically the code to 
what you also proposed). Secondly, our "if" statement that you mentioned 
was wrongly located within the function, which was causing your error 
message. It has been relocated and now it seems to work for its purpose 
( I tested with your model frame and formula).

You will find the changes in the last version of the svn (1.7.2). Let me 
know if it works for you or if you have additional questions.

Best regards,
Alejandro Reyes


> Hi,
>    Would someone be kind enough to clarify the two questions raised about DEXSeq in my earlier email below?
>
>    Would attaching any further information help? I am working with DEXSeq 1.6.0, and here are the model frame and model matrices if it makes it easier. Thanks!
>
>> mf
>          sample exon sizeFactor condition   batch dispersion count
> 1 Untr_biorep1 E001  1.0517803      Untr biorep1 0.13523549    25
> 2 Untr_biorep1 E002  1.0517803      Untr biorep1 0.01656906   708
> 3  LPS_biorep1 E001  1.0010474       LPS biorep1 0.13523549    20
> 4  LPS_biorep1 E002  1.0010474       LPS biorep1 0.01656906   442
> 5 Untr_biorep2 E001  0.8920483      Untr biorep2 0.13523549    26
> 6 Untr_biorep2 E002  0.8920483      Untr biorep2 0.01656906   947
> 7  LPS_biorep2 E001  1.1090401       LPS biorep2 0.13523549    25
> 8  LPS_biorep2 E002  1.1090401       LPS biorep2 0.01656906   884
>
>> mm
>    (Intercept) sampleLPS_biorep2 sampleUntr_biorep1 sampleUntr_biorep2
> 1           1                 0                  1                  0
> 2           1                 0                  1                  0
> 3           1                 0                  0                  0
> 4           1                 0                  0                  0
> 5           1                 0                  0                  1
> 6           1                 0                  0                  1
> 7           1                 1                  0                  0
> 8           1                 1                  0                  0
>    conditionUntr batchbiorep2 exonE002 conditionUntr:exonE002
> 1             1            0        0                      0
> 2             1            0        1                      1
> 3             0            0        0                      0
> 4             0            0        1                      0
> 5             1            1        0                      0
> 6             1            1        1                      1
> 7             0            1        0                      0
> 8             0            1        1                      0
>    batchbiorep2:exonE002
> 1                     0
> 2                     0
> 3                     0
> 4                     0
> 5                     0
> 6                     1
> 7                     0
> 8                     1
> attr(,"assign")
> [1] 0 1 1 1 2 3 4 5 6
> attr(,"contrasts")
> attr(,"contrasts")$sample
> [1] "contr.treatment"
>
> attr(,"contrasts")$condition
> [1] "contr.treatment"
>
> attr(,"contrasts")$batch
> [1] "contr.treatment"
>
> attr(,"contrasts")$exon
> [1] "contr.treatment"
>
>
>
>
>
>
>
> From: Narayanan, Manikandan (NIH/NIAID) [E]
> Sent: Thursday, May 16, 2013 11:14 AM
> To: bioconductor at r-project.org
> Subject: DEXSeq on two-exon genes: how to specify a formula without redundant terms
>
> Hi DEXSeq users/developers,
>    I have used DEXSeq successfuly for genes with many exons and really like the diagnostic/visualization plots that come with it. Recently though, for genes with two testable exons, I am getting the "Underdetermined model; cannot estimate dispersions." error.
>
>    I figure this is due to redundant terms in my formula as shown in PS below. So my questions are:
>
> 1) Is there a way to specify the formula  count ~ sample + (condition + batch) * exon so that redundant terms 'condition + batch' are removed?
>
> 2) If not, is it safe to change ncol(mm) to qr(mm)$rank (i.e., rank of model matrix to remove redundant terms) in this piece of code in estimateExonDispersionsForModelFrame:
>      if (nrow(mm) <= ncol(mm))
>          stop("Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.")
>
> Would changing the code this way violate any assumptions of the DEXSeq model?
>
>
> Thank you,
> Mani
>
>
> PS: # condition + batch terms are redundant as sample term is already present!
>> formulaDispersion
> count ~ sample + (condition + batch) * exon
>
>> design(ecs)
>                   condition       batch
> Untr_biorep1      Untr     biorep1
> LPS_biorep1        LPS     biorep1
> Untr_biorep2      Untr     biorep2
> LPS_biorep2        LPS     biorep2
>
>> colnames(model.matrix(formulaDispersion, mf))
> [1] "(Intercept)"            "sampleLPS_biorep2"      "sampleUntr_biorep1"
> [4] "sampleUntr_biorep2"     "conditionUntr"          "batchbiorep2"
> [7] "exonE002"               "conditionUntr:exonE002" "batchbiorep2:exonE002"
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list