[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Gordon K Smyth smyth at wehi.EDU.AU
Sun Aug 31 01:44:28 CEST 2014


Dear Nick,

If you go back to the 2010 post that you link to, you will see that I was 
giving, very briefly, the same advice about checking Poisson variability 
that Ryan has explained in greater detail.

You don't give any information about read lengths, sequencing depths or 
alignment methods.  I would be surprised if MiSeq and HiSeq generated 
perfect Poisson replicates of one another, especially if the read lengths 
from the two platforms are different or the alignment and counting software 
differs between them.  So you may well end up back at the blocking idea.
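
To make the Poisson check concrete, here is a minimal sketch for a single 
gene. It is not edgeR's or DESeq2's estimator; it is a crude 
method-of-moments illustration, and the function name poisson_check and the 
example counts are hypothetical. It assumes the expected count in each 
replicate is proportional to library size, computes a Pearson chi-square 
statistic per degree of freedom (close to 1 for Poisson replicates), and 
converts the excess into a rough negative binomial dispersion using 
var = mu + dispersion * mu^2.

```python
def poisson_check(counts, lib_sizes):
    """Crude Poisson check for one gene across technical replicates.

    Returns (chi2_per_df, dispersion). For true Poisson replicates
    chi2_per_df should be near 1 and dispersion near 0. This is an
    illustrative moment estimate, not the edgeR or DESeq2 method.
    """
    if len(counts) != len(lib_sizes) or len(counts) < 2:
        raise ValueError("need at least two replicates with matching library sizes")
    total = sum(counts)
    if total == 0:
        raise ValueError("gene has no reads in any replicate")

    # Expected counts under a common per-read rate, scaled by library size.
    rate = total / sum(lib_sizes)
    mu = [n * rate for n in lib_sizes]

    # Pearson statistic; approximately chi-square on (replicates - 1) df
    # under Poisson.
    chi2 = sum((y - m) ** 2 / m for y, m in zip(counts, mu))
    chi2_per_df = chi2 / (len(counts) - 1)

    # Under the negative binomial, E[(y - mu)^2 / mu] = 1 + dispersion * mu,
    # so the excess over 1 divided by the mean expected count gives a rough
    # dispersion. Truncate at zero.
    mean_mu = sum(mu) / len(mu)
    dispersion = max(0.0, (chi2_per_df - 1.0) / mean_mu)
    return chi2_per_df, dispersion


# Hypothetical gene sequenced on two runs, the second with twice the depth.
consistent = poisson_check([100, 200], [1_000_000, 2_000_000])
inconsistent = poisson_check([50, 250], [1_000_000, 2_000_000])
print(consistent)    # dispersion is 0.0: counts match the depth ratio
print(inconsistent)  # chi2_per_df about 37.5, dispersion about 0.24
```

With only two replicates per sample the per-gene estimate is very noisy, 
which is why in practice one would look at the trend over all genes (as 
plotBCV does) rather than at any single gene.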

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Sun, 31 Aug 2014, Ryan wrote:

> Thanks to the underlying theory behind dispersion estimation, you can 
> easily test whether your "technical replicates" really do represent 
> technical replicates. Specifically, read counts in technical replicates 
> should follow a Poisson distribution, which is a special case of the 
> negative binomial with zero dispersion. So, simply fit a model using edgeR 
> or DESeq2 with a separate coefficient for each group of technical 
> replicates. Thus all the experimental variation will be absorbed into the 
> model coefficients and the only thing left will be the technical 
> variability of the replicates. For true technical replicates, the 
> dispersion should be zero for all genes. So if you estimate dispersions 
> using this model, and plotBCV/plotDispEsts shows the dispersion very near 
> to zero, then you can be confident that you really have technical 
> replicates. If the dispersion is nonzero, then there is some additional 
> source of unaccounted-for variation.
> 
> I have used this method on a pilot dataset with several technical 
> replicates for each condition. edgeR said the dispersion was something like 
> 10^-3 or less for all genes except for the very low-expressed genes.
> 
> -Ryan
> 
> On 8/28/14, 9:23 AM, Nick N wrote:
>> Hi,
>> 
>> I have a study where a fraction of the samples have been replicated on 2 
>> Illumina platforms (HiSeq and MiSeq). These are technical replicates - the 
>> library preparation is the same using the same biological replicates - 
>> it's only the sequencing which is different.
>> 
>> My hunch was that I should introduce the platform as an additional 
>> (blocking) factor in the analysis. Then I stumbled upon this post:
>> 
>> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>> 
>> It recommends pooling the replicates. The post seems to apply to a 
>> different case ("pure" technical replicates, i.e. no differences in the 
>> sequencing platform used) so I should probably ignore it. But I still feel 
>> a bit uncertain of the best way to treat the technical replicates. Can 
>> you, please, advise me on this?
>> 
>> many thanks!
>> Nick
