[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Wed Sep 3 01:12:00 CEST 2014

Yes, that's all that is needed.

In general, one corrects for a batch effect by fitting

   ~ batch + otherterms

where "otherterms" is what the model would have been without the batch 
effect.
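
As a minimal sketch of what this looks like with two nuisance factors (platform and subject) in front of the treatment term, the following builds the design matrix in base R. The sample sheet here is invented for illustration: the subject labels S1 to S4, and the choice of which libraries were re-sequenced on MiSeq, are assumptions and not taken from the attached CSV.

```r
# Hypothetical sample sheet: 4 subjects, each receiving treatments A-D,
# all sequenced on HiSeq (the real layout in the attached CSV may differ).
targets <- expand.grid(Treatment = c("A", "B", "C", "D"),
                       Subject   = paste0("S", 1:4),
                       stringsAsFactors = FALSE)
targets$Platform <- "HiSeq"

# Assume the libraries of the first two subjects were re-sequenced on MiSeq.
reseq <- targets[1:8, ]
reseq$Platform <- "MiSeq"
targets <- rbind(targets, reseq)

Platform  <- factor(targets$Platform, levels = c("HiSeq", "MiSeq"))
Subject   <- factor(targets$Subject)
# Make the control treatment A the reference level explicitly.
Treatment <- relevel(factor(targets$Treatment), ref = "A")

# Nuisance terms first, the term of interest last: ~ batch + otherterms
design <- model.matrix(~ Platform + Subject + Treatment)

# Look the coefficient up by name instead of hard-coding its position.
coef_B <- which(colnames(design) == "TreatmentB")

# The design should be of full column rank, otherwise the fit is not estimable.
stopifnot(qr(design)$rank == ncol(design))
```

With a design like this, the B versus A test in edgeR would be glmLRT(fit, coef = coef_B), or equivalently coef = "TreatmentB", since glmLRT accepts column names as well as indices. Checking colnames(design) before choosing a coefficient number is safer than counting columns by hand.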

Best wishes
Gordon

On Mon, 1 Sep 2014, Nick N wrote:

> Dear Gordon, Ryan and Nicolas,
>
> Thank you all for the detailed advice.
>
> I have one more question regarding the blocking factor model. In my case I
> actually have two external factors to consider: one is the platform, the
> other is the subject.
>
> My sample matrix is the following (I've attached the CSV in case you can't
> view the image):
> [sample matrix image not preserved in the archive]
>
> I am only interested in comparing treatments B to D against A (the
> controls). So far I have never had a model with more than one external
> factor. I imagine it should be OK to have more. Is this correct? If yes,
> could you check whether I am setting up the model matrix correctly?
> (Apologies if this sounds too trivial.) I imagine it should be defined as:
>
> Platform  <- factor(targets$Platform)
> Subject   <- factor(targets$Subject)
> Treatment <- factor(targets$Treatment)
> design <- model.matrix(~Platform+Subject+Treatment)
>
> ..
> fit <- glmFit(y, design)
> lrt <- glmLRT(fit, coef=24) # for comparing Treatment B to Treatment A
>
>
> Is this correct?
>
>
>
> On Sun, Aug 31, 2014 at 12:44 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Nick,
>>
>> If you go back to the post from 2010 that you give the URL for, you will
>> see that I was giving very briefly the same advice about checking Poisson
>> variability that Ryan has explained in greater detail.
>>
>> You don't give any information about read lengths, sequence depths or
>> alignment methods.  I would be surprised if MiSeq and HiSeq would generate
>> perfect Poisson replicates of one another, especially if the read lengths
>> from the two platforms are different or the alignment and counting software
>> has been varied.  So you may well end up back at the blocking idea.
>>
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>> On Sun, 31 Aug 2014, Ryan wrote:
>>
>>> Thanks to the underlying theory behind dispersion estimation, you can
>>> easily test whether your "technical replicates" really do represent
>>> technical replicates. Specifically, read counts in technical replicates
>>> should follow a Poisson distribution, which is a special case of the
>>> negative binomial with zero dispersion. So, simply fit a model using edgeR
>>> or DESeq2 with a separate coefficient for each group of technical
>>> replicates. Thus all the experimental variation will be absorbed into the
>>> model coefficients and the only thing left will be the technical
>>> variability of the replicates. For true technical replicates, the
>>> dispersion should be zero for all genes. So if you estimate dispersions
>>> using this model, and plotBCV/plotDispEsts shows the dispersion very near
>>> to zero, then you can be confident that you really have technical
>>> replicates. If the dispersion is nonzero, then there is some additional
>>> source of unaccounted-for variation.
>>>
>>> I have used this method on a pilot dataset with several technical
>>> replicates for each condition. edgeR said the dispersion was something like
>>> 10^-3 or less for all genes except for the very low-expressed genes.
>>>
>>> -Ryan
>>>
>>> On 8/28/14, 9:23 AM, Nick N wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a study where a fraction of the samples have been replicated on 2
>>>> Illumina platforms (HiSeq and MiSeq). These are technical replicates - the
>>>> library preparation is the same using the same biological replicates - it's
>>>> only the sequencing which is different.
>>>>
>>>> My hunch was that I should introduce the platform as an additional
>>>> (blocking) factor in the analysis. Then I stumbled upon this post:
>>>>
>>>> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>>>>
>>>> It recommends pooling the replicates. The post seems to apply to a
>>>> different case ("pure" technical replicates, i.e. no differences in the
>>>> sequencing platform used) so I probably shall ignore it. But I still feel a
>>>> bit uncertain of the best way to treat the technical replicates. Can you,
>>>> please, advise me on this?
>>>>
>>>> many thanks!
>>>> Nick
