[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Fri Aug 29 09:53:55 CEST 2014

Hej Nick!

Even if technical replicates on Illumina sequencers tend to be very similar, I would always run a number of checks before actually merging them. This is what I usually do to learn how similar or different my technical replicates are (honestly, I haven’t had many in the recent past, most have been biological reps, but the same applies):

1) do pair-wise scatterplots of the raw data to see how similar the replicates are (e.g. after binning reads into 1-10 kb windows)
2) do a PCA (sample-based, so on the transpose of the above matrix, i.e. prcomp(t(binnedDataMatrix))). In that PCA, I check whether the replicates cluster together, whether any dimension separates the tech replicates and, if so, how much the corresponding component contributes.
3) plot the density distributions and boxplots of all samples to see how similar they are between replicates.

4) normalise the data using a vst (variance stabilising transformation) approach, to normalise for library size and to correct for the var~mean relationship. I use a vst here whatever the sample size of the experiment because, in my opinion, it gives “clearer” plots, i.e. plots I can better interpret. That is independent of how I would conduct the actual analysis: there I would only use a vst approach if I had enough replication per condition (see Soneson and Delorenzi, 2013, BMC Bioinformatics for more).
5) redo all the plots above on the normalised data to complement the analyses
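
A minimal sketch of the five steps above, in base R only. Everything in it is an assumption made for illustration: the counts are simulated, the sample names and the checkReplicates() helper are hypothetical, and a plain log2 counts-per-million transform stands in for the vst. A real variance stabilising transformation (for example the one in DESeq2) would replace that step.

```r
set.seed(1)
# When run as a script, draw to a null device; use pdf("checks.pdf") to keep the plots.
if (!interactive()) pdf(NULL)

# Simulated stand-in for the binned count matrix: rows are windows (or genes),
# columns are samples. Two libraries, each sequenced on both platforms.
nBins <- 2000
libA <- rnbinom(nBins, mu = 50, size = 2)  # underlying signal of library A
libB <- rnbinom(nBins, mu = 50, size = 2)  # underlying signal of library B
binnedDataMatrix <- cbind(
  A_HiSeq = rpois(nBins, 4 * libA),  # deeper run
  A_MiSeq = rpois(nBins, libA),      # shallower run of the same library
  B_HiSeq = rpois(nBins, 4 * libB),
  B_MiSeq = rpois(nBins, libB)
)

# Hypothetical helper producing the plots of steps 1 to 3 for a matrix 'm'.
checkReplicates <- function(m, label) {
  # 1) pair-wise scatterplots
  pairs(m, pch = ".", main = paste(label, "- pair-wise scatterplots"))

  # 2) sample-based PCA, with the share of variance of each component
  pca <- prcomp(t(m))
  varExplained <- pca$sdev^2 / sum(pca$sdev^2)
  plot(pca$x[, 1], pca$x[, 2], pch = 19,
       xlab = sprintf("PC1 (%.0f%%)", 100 * varExplained[1]),
       ylab = sprintf("PC2 (%.0f%%)", 100 * varExplained[2]),
       main = paste(label, "- PCA"))
  text(pca$x[, 1], pca$x[, 2], labels = colnames(m), pos = 3)

  # 3) density distributions and boxplots of all samples
  dens <- lapply(seq_len(ncol(m)), function(j) density(m[, j]))
  xr <- range(sapply(dens, function(d) range(d$x)))
  ymax <- max(sapply(dens, function(d) max(d$y)))
  plot(dens[[1]], xlim = xr, ylim = c(0, ymax),
       main = paste(label, "- densities"))
  for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)
  legend("topright", legend = colnames(m), col = seq_along(dens), lty = 1)
  boxplot(m, main = paste(label, "- boxplots"))

  invisible(list(pca = pca, varExplained = varExplained))
}

# Steps 1 to 3 on the raw data (log scale only to make the plots readable)
rawChecks <- checkReplicates(log2(binnedDataMatrix + 1), "raw, log2(x + 1)")

# Step 4, stand-in only: log2 CPM corrects for library size, it is NOT a vst
libSize <- colSums(binnedDataMatrix)
logCpm <- log2(sweep(binnedDataMatrix, 2, libSize, "/") * 1e6 + 1)

# Step 5: redo the plots on the normalised data
normChecks <- checkReplicates(logCpm, "normalised (log2 CPM stand-in)")

# Contribution (in %) of each principal component after normalisation
round(100 * normChecks$varExplained, 1)
```

If one of the leading components separates the HiSeq from the MiSeq samples and carries a sizeable share of the variance, that would argue against simply merging the runs.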

After that, I usually have a good idea of the properties of my technical replicates and of whether I should consider 1) ignoring them, 2) merging them or 3) adding an additional factor in the analyses. Also remember that edgeR, DESeq and DESeq2 expect replicates to be biological replicates, not technical replicates. Technical replicates on Illumina usually show very little variation (hence the suggestion to merge them), so treating them as biological replicates could bias the dispersion estimation.
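
To make options 2) and 3) concrete, here is a small base R sketch. The count table, the sample names and the sampleInfo layout are all made up for illustration. Merging is done by summing the counts of the runs that come from the same library; the alternative keeps every run and adds the platform as a factor in a design matrix, which would then be handed to the model-fitting functions of edgeR or DESeq2.

```r
# Hypothetical sample sheet: four libraries, two of them sequenced on both platforms
sampleInfo <- data.frame(
  library   = c("ctrl1", "ctrl1", "ctrl2", "trt1", "trt1", "trt2"),
  condition = factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")),
  platform  = factor(c("HiSeq", "MiSeq", "HiSeq", "HiSeq", "MiSeq", "HiSeq"))
)

# Made-up count table: rows are genes, columns are sequencing runs
counts <- cbind(
  c(10, 200, 35, 0, 80),
  c( 3,  52,  9, 1, 21),
  c(12, 180, 40, 2, 75),
  c(30, 190, 10, 0, 90),
  c( 8,  47,  2, 0, 25),
  c(28, 210, 12, 1, 85)
)
rownames(counts) <- paste0("gene", 1:5)
colnames(counts) <- paste(sampleInfo$library, sampleInfo$platform, sep = "_")

# Option 2: merge the technical replicates by summing the counts per library.
# rowsum() sums rows by group, hence the transposes.
merged <- t(rowsum(t(counts), group = sampleInfo$library))
merged

# Option 3: keep all runs and add the platform as an additional factor
design <- model.matrix(~ platform + condition, data = sampleInfo)
design
```

Note that merging leaves one column per library, so the downstream dispersion estimate is based on the biological replicates only.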

You did not specify what data you have at hand (DNA-Seq, RNA-Seq?), so I described a more global approach (binning). For my RNA-Seq studies, I also do the same comparison after I’ve generated my count table(s).

HTH,

Nico

---------------------------------------------------------------
Nicolas Delhomme

The Street Lab
Department of Plant Physiology
Umeå Plant Science Center

Tel: +46 90 786 5478
Email: nicolas.delhomme at umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------

On 28 Aug 2014, at 18:23, Nick N <feralmedic at gmail.com> wrote:

> Hi,
> 
> I have a study where a fraction of the samples have been replicated on 2
> Illumina platforms (HiSeq and Miseq). These are technical replicates - the
> library preparation is the same using the same biological replicates - it's
> only the sequencing which is different.
> 
> My hunch was that I should introduce the platform as an additional
> (blocking) factor in the analysis. Then I stumbled upon this post:
> 
> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
> 
> It recommends pooling the replicates. The post seems to apply to a
> different case ("pure" technical replicates, i.e. no differences in the
> sequencing platform used) so I probably shall ignore it. But I still feel a
> bit uncertain of the best way to treat the technical replicates. Can you,
> please, advise me on this?
> 
> many thanks!
> Nick