[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Fri Aug 29 11:29:15 CEST 2014

Thanks Ryan and Nicolas!

I was wondering whether there is some sort of decision tree that can be
formalised.

Nicolas, you would consider three options: merging, ignoring, or adding a
factor. Could you recommend some sort of cut-off for each choice, or is it
more of a qualitative decision based on plots and PCA? By the way, my
data is RNA-Seq; I forgot to mention it.

Ryan, I would basically ask you the same question.

On Fri, Aug 29, 2014 at 9:42 AM, Ryan <rct at thompsonclan.org> wrote:

> Hi Nick,
>
> Thanks to the underlying theory behind dispersion estimation, you can
> easily test whether your "technical replicates" really do represent
> technical replicates. Specifically, read counts in technical replicates
> should follow a Poisson distribution, which is a special case of the
> negative binomial with zero dispersion. So, simply fit a model using edgeR
> or DESeq2 with a separate coefficient for each group of technical
> replicates. Thus all the experimental variation will be absorbed into the
> model coefficients and the only thing left will be the technical
> variability of the replicates. For true technical replicates, the
> dispersion should be zero for all genes. So if you estimate dispersions
> using this model, and plotBCV/plotDispEsts shows the dispersion very near
> to zero, then you can be confident that you really have technical
> replicates. If the dispersion is nonzero, then there is some additional
> source of unaccounted-for variation.
>
> I have used this method on a pilot dataset with several technical
> replicates for each condition. edgeR estimated the dispersion at around
> 10^-3 or less for all genes except the very lowly expressed ones.
>
> -Ryan
>
>
> On 8/28/14, 9:23 AM, Nick N wrote:
>
>> Hi,
>>
>> I have a study where a fraction of the samples have been replicated on 2
>> Illumina platforms (HiSeq and MiSeq). These are technical replicates: the
>> same libraries, prepared from the same biological replicates, were
>> sequenced on both platforms, so only the sequencing differs.
>>
>> My hunch was that I should introduce the platform as an additional
>> (blocking) factor in the analysis. Then I stumbled upon this post:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>>
>> It recommends pooling the replicates. The post seems to apply to a
>> different case ("pure" technical replicates, i.e. no difference in the
>> sequencing platform used), so I should probably ignore it. But I still
>> feel a bit uncertain about the best way to treat the technical
>> replicates. Could you please advise me on this?
>>
>> Many thanks!
>> Nick
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
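The dispersion check Ryan describes relies on the negative binomial variance function Var(Y) = mu + phi * mu^2, which collapses to the Poisson case Var(Y) = mu when phi = 0. The sketch below is only a rough, self-contained illustration of that reasoning and is not edgeR's or DESeq2's estimator. Those packages use likelihood-based estimates with moderation across genes; here a crude per-gene method-of-moments estimate, phi_hat = max(0, (s^2 - mean) / mean^2), is swapped in. All parameters are hypothetical and not from the thread: 500 genes, 6 replicates per group, per-gene means drawn uniformly between 50 and 300, equal library sizes (no normalization), and a "biological" dispersion of 0.1 for the comparison group.

```python
import random
import statistics

# Hypothetical simulation parameters (assumptions, not from the thread).
N_GENES = 500
N_REPS = 6
TRUE_PHI = 0.1  # dispersion for the group with extra, non-technical variation


def poisson(lam):
    """Draw one Poisson(lam) value by counting unit-rate exponential
    arrivals within [0, lam]; avoids computing exp(-lam) directly."""
    total, k = 0.0, 0
    while True:
        total += random.expovariate(1.0)
        if total > lam:
            return k
        k += 1


def neg_binomial(mu, phi):
    """Draw one negative binomial value as a gamma-Poisson mixture,
    so that Var = mu + phi * mu**2."""
    lam = random.gammavariate(1.0 / phi, mu * phi)  # shape, scale -> mean mu
    return poisson(lam)


def mom_dispersion(counts):
    """Crude method-of-moments dispersion: solve s^2 = m + phi * m^2 for
    phi and truncate at zero (a stand-in for a likelihood-based estimate)."""
    m = statistics.mean(counts)
    if m == 0:
        return 0.0
    s2 = statistics.variance(counts)  # sample variance, n - 1 denominator
    return max(0.0, (s2 - m) / m ** 2)


def simulate(seed=1):
    """Return median estimated dispersion for a Poisson-only group
    (pure technical replicates) and for an overdispersed group."""
    random.seed(seed)
    tech, extra = [], []
    for _ in range(N_GENES):
        mu = random.uniform(50, 300)  # assumed per-gene mean count
        # Technical replicates: Poisson noise only, so true phi = 0.
        tech.append(mom_dispersion([poisson(mu) for _ in range(N_REPS)]))
        # Replicates with extra variation: true phi = TRUE_PHI.
        extra.append(mom_dispersion(
            [neg_binomial(mu, TRUE_PHI) for _ in range(N_REPS)]))
    return statistics.median(tech), statistics.median(extra)


if __name__ == "__main__":
    tech_med, extra_med = simulate()
    print(f"median dispersion, technical replicates: {tech_med:.4f}")
    print(f"median dispersion, extra variation:      {extra_med:.4f}")
```

Under these assumptions the Poisson-only group should give a median dispersion at or very near zero, while the group simulated with phi = 0.1 should give a clearly positive median. With only a handful of replicates, per-gene moment estimates are noisy, especially at low counts, which is one reason edgeR and DESeq2 share information across genes rather than relying on estimates like this one.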
