[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Ryan rct at thompsonclan.org
Fri Aug 29 19:04:52 CEST 2014


Personally, I see only two possibilities. Either you have true 
technical replicates with Poisson variance (zero dispersion on 
technical replicates, as I described earlier) or you don't. In the 
former case, you merge the technical replicates. In the latter case, 
you then need to figure out whether the differences between replicates 
are a consistent effect dependent on sequencing platform, or just 
random (using PCA plots, etc.). For a consistent effect, you can adjust 
for it by including a blocking factor in the model. Otherwise I would 
probably just do a separate analysis for each sequencing platform. As 
for the threshold between "zero" and "non-zero" dispersion, I'm not 
really sure what a reasonable threshold is. You have to try it and see.
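
To make the check from my earlier message (quoted below) a bit more
concrete, here is a rough sketch in plain Python rather than edgeR. It is
only an illustration under stated assumptions: the function names, the
example run names and the crude moment estimator are all made up for this
example, and this is not how edgeR estimates dispersions. It relies on the
negative binomial relation var = mu + phi * mu^2, where phi = 0 is the
Poisson case, and it allows the replicates to have different library sizes
(as HiSeq and MiSeq runs will). If the estimates come out near zero,
merging simply means summing the counts of the runs that belong to the
same library.

```python
def moment_dispersion(counts, lib_sizes):
    """Crude moment estimate of the NB dispersion for one gene within one
    group of technical replicates (hypothetical helper, not edgeR's method).

    Assumes mean_j = lib_size_j * rate and var_j = mean_j + phi * mean_j**2.
    No degrees-of-freedom correction is applied, so it is biased low.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    rate = total / sum(lib_sizes)            # pooled expression rate
    means = [n * rate for n in lib_sizes]    # expected count per replicate
    # Squared residuals beyond what Poisson noise alone would explain.
    excess = sum((y - m) ** 2 - m for y, m in zip(counts, means))
    return max(0.0, excess / sum(m ** 2 for m in means))


def merge_technical_replicates(counts_by_run, sample_of_run):
    """Sum per-gene counts over all runs that belong to the same sample."""
    merged = {}
    for run, counts in counts_by_run.items():
        sample = sample_of_run[run]
        if sample not in merged:
            merged[sample] = list(counts)
        else:
            merged[sample] = [a + b for a, b in zip(merged[sample], counts)]
    return merged


# Made-up example: sample s1 sequenced on both platforms, s2 on one only.
runs = {"s1_hiseq": [100, 20], "s1_miseq": [10, 2], "s2_hiseq": [50, 5]}
sample_of_run = {"s1_hiseq": "s1", "s1_miseq": "s1", "s2_hiseq": "s2"}
merged = merge_technical_replicates(runs, sample_of_run)
```

In a real analysis the dispersions would come from edgeR or DESeq2 and be
inspected with plotBCV/plotDispEsts; the sketch only shows what "zero
dispersion" means numerically.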

On Fri Aug 29 02:29:15 2014, Nick N wrote:
> Thanks Ryan and Nicolas!
>
> I was wondering whether there is some sort of decision tree that can
> be formalised.
>
> Nicolas, you would consider 3 options - merging, ignoring or adding a
> factor. Could you recommend some sort of cut-offs for each choice or
> is it more of a qualitative decision by looking at plots and PCA? By
> the way, my data is RNA-Seq - I forgot to mention it.
>
> Ryan, I would basically ask you the same question.
>
>
> On Fri, Aug 29, 2014 at 9:42 AM, Ryan <rct at thompsonclan.org> wrote:
>
>     Hi Nick,
>
>     Thanks to the underlying theory behind dispersion estimation, you
>     can easily test whether your "technical replicates" really do
>     represent technical replicates. Specifically, read counts in
>     technical replicates should follow a Poisson distribution, which
>     is a special case of the negative binomial with zero dispersion.
>     So, simply fit a model using edgeR or DESeq2 with a separate
>     coefficient for each group of technical replicates. Thus all the
>     experimental variation will be absorbed into the model
>     coefficients and the only thing left will be the technical
>     variability of the replicates. For true technical replicates,
>     the dispersion should be zero for all genes. So if you estimate
>     dispersions using this model, and plotBCV/plotDispEsts shows the
>     dispersion very near to zero, then you can be confident that you
>     really have technical replicates. If the dispersion is nonzero,
>     then there is some additional source of unaccounted-for variation.
>
>     I have used this method on a pilot dataset with several technical
>     replicates for each condition. edgeR said the dispersion was
>     something like 10^-3 or less for all genes except for the very
>     low-expressed genes.
>
>     -Ryan
>
>
>     On 8/28/14, 9:23 AM, Nick N wrote:
>
>         Hi,
>
>         I have a study where a fraction of the samples have been replicated
>         on 2 Illumina platforms (HiSeq and MiSeq). These are technical
>         replicates - the library preparation is the same, using the same
>         biological replicates - it's only the sequencing which is different.
>
>         My hunch was that I should introduce the platform as an additional
>         (blocking) factor in the analysis. Then I stumbled upon this post:
>
>         https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>
>         It recommends pooling the replicates. The post seems to apply to a
>         different case ("pure" technical replicates, i.e. no differences in
>         the sequencing platform used), so I should probably ignore it. But I
>         still feel a bit uncertain about the best way to treat the technical
>         replicates. Can you please advise me on this?
>
>         many thanks!
>         Nick
>


