[BioC] edgeR and DESeq2: model design and estimation of dispersion

Sun Jun 15 19:24:46 CEST 2014

hi Iddo,

I wouldn't recommend using one design for dispersion estimation and
a different one for the GLM. One way to think about it: differences
in counts that the individual effect would account for in the GLM
are instead absorbed as higher dispersion when that effect is left
out of the dispersion estimation, so in general you would end up
overly conservative with that approach. As you have many samples and
a large design matrix, you could try linear models, as in voom/limma,
which will be faster to fit.
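
To make the first point concrete, here is a minimal sketch on simulated
data. It is a Gaussian analogy on the log scale, not the negative
binomial estimation that edgeR or DESeq2 actually perform, and every
name and parameter value in it (simulate_paired, sd_individual,
sd_noise, effect) is made up for illustration. Residual variance plays
the role of dispersion: leaving the individual term out of the model
pushes the between-individual variability into the residual.

```python
import random
import statistics


def simulate_paired(n_individuals=100, sd_individual=1.0, sd_noise=0.3,
                    effect=0.5, seed=1):
    """Simulate log-scale expression of one gene, before and after intervention.

    All parameter values are illustrative assumptions, not taken from real data.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n_individuals):
        # Each individual has its own baseline (the blocking effect).
        baseline = rng.gauss(5.0, sd_individual)
        before.append(baseline + rng.gauss(0.0, sd_noise))
        after.append(baseline + effect + rng.gauss(0.0, sd_noise))
    return before, after


def residual_variance_without_blocking(before, after):
    # Model "~ intervention": residuals are deviations from each time
    # point's mean, so the individual baselines stay in the residual.
    return (statistics.variance(before) + statistics.variance(after)) / 2.0


def residual_variance_with_blocking(before, after):
    # Model "~ individual + intervention": with two paired observations
    # per individual, the residual variance equals var(after - before) / 2.
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.variance(diffs) / 2.0


if __name__ == "__main__":
    before, after = simulate_paired()
    print("residual variance, individual omitted: ",
          residual_variance_without_blocking(before, after))
    print("residual variance, individual included:",
          residual_variance_with_blocking(before, after))
```

Under these assumptions the first number estimates sd_individual^2 +
sd_noise^2 and the second estimates only sd_noise^2, which is the
inflation described above.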

Mike

On Thu, Jun 12, 2014 at 9:51 AM, Iddo Ben-dov <iddobe at ekmd.huji.ac.il> wrote:
> hi,
>
> in both edgeR and DESeq2, estimation of dispersion precedes negative binomial GLM fitting.
>
> my question is, can I use a design formula when estimating dispersion which is different from the formula used for GLM fitting? specifically, I would like to use a simplified design when estimating dispersion and a full design for GLM fitting.
>
> my motivation for doing so is that with the full design, estimation of dispersion is too demanding for my computer and takes too long.
>
> my dataset includes 400 mRNAseq profiles (~22,000 genes). there are 100 controls and 100 cases, and each was sampled twice - before and after intervention.
>
> thus, the full design is:
>  ~ group*intervention + individual:group (blocking factor)
>
> as I mentioned, estimation of dispersion with the above design is not practical, and I thus would like to simplify to:
> ~ group*intervention
>
> and introduce the 'individual' blocking factor only for NB GLM fitting.
>
> is this statistically valid?
>
> appreciate any help,
> iddo