[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq
rct at thompsonclan.org
Fri Aug 29 19:04:52 CEST 2014
Personally, I see only two possibilities. Either you have true
technical replicates with Poisson variance (zero dispersion on
technical replicates, as I described earlier) or you don't. In the
former case, you merge the technical replicates. In the latter case,
you then need to figure out whether the differences between replicates
are a consistent effect dependent on sequencing platform, or just
random (using PCA plots, etc.). For a consistent effect, you can adjust
for it by including a blocking factor in the model. Otherwise I would
probably just do a separate analysis for each sequencing platform. As
for the threshold between "zero" and "non-zero" dispersion, I'm not
really sure what a reasonable threshold is. You have to try it and see.
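
To make the "zero dispersion" check concrete, here is a rough sketch in plain Python. It is not how edgeR or DESeq2 estimate dispersion; it is only a crude per-gene method-of-moments estimate based on the negative binomial variance, Var(y) = mu + phi * mu^2, where phi = 0 is the Poisson case. The function names and the toy numbers in the usage note are made up for illustration.

```python
def moment_dispersion(counts, lib_sizes):
    """Crude method-of-moments NB dispersion for one gene.

    counts    : read counts for the gene in each technical replicate
    lib_sizes : total reads in each replicate (same order as counts)

    Assumes Var(y_i) = mu_i + phi * mu_i**2 with mu_i = lib_size_i * rate.
    This is an illustration only, not the edgeR or DESeq2 estimator.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two replicates")
    # Expected counts if the replicates differ only in sequencing depth.
    rate = sum(counts) / sum(lib_sizes)
    if rate == 0:
        return 0.0
    mus = [n * rate for n in lib_sizes]
    # Squared residuals, scaled by k / (k - 1) because the rate was estimated.
    resid = sum((y - mu) ** 2 for y, mu in zip(counts, mus)) * k / (k - 1)
    # Subtract the Poisson part of the variance; what is left is phi * mu^2.
    phi = (resid - sum(mus)) / sum(mu ** 2 for mu in mus)
    # Negative values mean "no more variable than Poisson"; truncate at zero.
    return max(phi, 0.0)


def merge_technical_replicates(counts_by_run):
    """Sum per-gene counts across runs of the same library.

    counts_by_run : one list of per-gene counts per run, genes in the same order
    """
    return [sum(gene_counts) for gene_counts in zip(*counts_by_run)]
```

For example, a gene with counts [10, 30] in runs of 1000 and 3000 total reads is exactly proportional to depth, so the estimate comes out as zero, while counts of [100, 400] at equal depth give a clearly nonzero value. If the estimates sit near zero across genes, summing the runs with merge_technical_replicates is the "merge" branch described above.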
On Fri Aug 29 02:29:15 2014, Nick N wrote:
> Thanks Ryan and Nicolas!
> I was wondering whether there is some sort of decision tree that can
> be formalised.
> Nicolas, you would consider 3 options - merging, ignoring or adding a
> factor. Could you recommend some sort of cut-offs for each choice or
> is it more of a qualitative decision by looking at plots and PCA? By
> the way, my data is RNA-Seq - I forgot to mention it.
> Ryan, I would basically ask you the same question.
> On Fri, Aug 29, 2014 at 9:42 AM, Ryan <rct at thompsonclan.org> wrote:
> Hi Nick,
> Thanks to the underlying theory behind dispersion estimation, you
> can easily test whether your "technical replicates" really do
> represent technical replicates. Specifically, read counts in
> technical replicates should follow a Poisson distribution, which
> is a special case of the negative binomial with zero dispersion.
> So, simply fit a model using edgeR or DESeq2 with a separate
> coefficient for each group of technical replicates. Thus all the
> experimental variation will be absorbed into the model
> coefficients and the only thing left will be the technical
> variability of the replicates. For true technical replicates,
> the dispersion should be zero for all genes. So if you estimate
> dispersions using this model, and plotBCV/plotDispEsts shows the
> dispersion very near to zero, then you can be confident that you
> really have technical replicates. If the dispersion is nonzero,
> then there is some additional source of unaccounted-for variation.
> I have used this method on a pilot dataset with several technical
> replicates for each condition. edgeR said the dispersion was
> something like 10^-3 or less for all genes except for the very
> low-expressed genes.
> On 8/28/14, 9:23 AM, Nick N wrote:
> I have a study where a fraction of the samples have been replicated on 2
> Illumina platforms (HiSeq and MiSeq). These are technical replicates: the
> library preparation is the same, using the same biological replicates; it's
> only the sequencing that is different.
> My hunch was that I should introduce the platform as a (blocking) factor
> in the analysis. Then I stumbled upon this post:
> It recommends pooling the replicates. The post seems to apply to a
> different case ("pure" technical replicates, i.e. no differences in the
> sequencing platform used), so I should probably ignore it. But I still feel
> a bit uncertain about the best way to treat the technical replicates. Can
> you please advise me on this?
> many thanks!