[BioC] DEXSeq on two-exon genes: how to specify a formula without redundant terms

Narayanan, Manikandan (NIH/NIAID) [E] manikandan.narayanan at nih.gov
Thu May 16 17:14:26 CEST 2013


Hi DEXSeq users/developers,
  I have used DEXSeq successfully for genes with many exons and really like the diagnostic/visualization plots that come with it. Recently, though, for genes with only two testable exons, I am getting the error "Underdetermined model; cannot estimate dispersions."

  I figure this is due to redundant terms in my formula, as shown in the PS below. So my questions are:

1) Is there a way to specify the formula count ~ sample + (condition + batch) * exon so that the redundant main-effect terms 'condition + batch' are removed?

2) If not, is it safe to change ncol(mm) to qr(mm)$rank (i.e., use the rank of the model matrix, so that redundant columns are not counted) in this piece of code in estimateExonDispersionsForModelFrame:
    if (nrow(mm) <= ncol(mm))
        stop("Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.")

Would changing the code this way violate any assumptions of the DEXSeq model?


Thank you,
Mani


PS: # The condition + batch terms are redundant because the sample term is already present!
> formulaDispersion
count ~ sample + (condition + batch) * exon

> design(ecs)
                 condition       batch
Untr_biorep1      Untr     biorep1
LPS_biorep1        LPS     biorep1
Untr_biorep2      Untr     biorep2
LPS_biorep2        LPS     biorep2

> colnames(model.matrix(formulaDispersion, mf))
[1] "(Intercept)"            "sampleLPS_biorep2"      "sampleUntr_biorep1"
[4] "sampleUntr_biorep2"     "conditionUntr"          "batchbiorep2"
[7] "exonE002"               "conditionUntr:exonE002" "batchbiorep2:exonE002"
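
For anyone who wants to reproduce the column list above without DEXSeq, here is a minimal sketch in base R. It assumes a hand-built model frame (the name mf is reused from above, but its construction and the count values are made up for illustration) with the four samples crossed with two exons. It compares nrow(mm), ncol(mm) and qr(mm)$rank, and uses the QR pivot to list which columns are linearly dependent on the others.

```r
# Hypothetical design mirroring design(ecs): 4 samples, 2 conditions, 2 batches.
design <- data.frame(
  sample    = c("Untr_biorep1", "LPS_biorep1", "Untr_biorep2", "LPS_biorep2"),
  condition = c("Untr", "LPS", "Untr", "LPS"),
  batch     = c("biorep1", "biorep1", "biorep2", "biorep2"),
  stringsAsFactors = TRUE
)

# One row per sample/exon combination for a two-exon gene.
mf <- rbind(cbind(design, exon = "E001"), cbind(design, exon = "E002"))
mf$exon  <- factor(mf$exon)
mf$count <- c(10, 12, 11, 14, 20, 35, 22, 38)  # made-up counts, only needed so the formula evaluates

formulaDispersion <- count ~ sample + (condition + batch) * exon
mm <- model.matrix(formulaDispersion, mf)

# Rows available vs. columns requested vs. columns that are linearly independent.
q <- qr(mm)
c(rows = nrow(mm), cols = ncol(mm), rank = q$rank)

# Columns that the pivoted QR decomposition flags as dependent on earlier ones.
aliased <- colnames(mm)[q$pivot[-seq_len(q$rank)]]
aliased
```

With this toy frame the matrix has 8 rows and 9 columns but rank 7, and the two columns flagged as dependent are conditionUntr and batchbiorep2. So a check on ncol(mm) fails while a check on the rank would pass, leaving one residual degree of freedom. Whether dispersion estimates based on that are trustworthy is a separate question that this sketch does not answer.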
