[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Fri Sep 5 23:31:14 CEST 2014

Thanks for all the advice so far. I tried that route but I am getting

> y <- estimateGLMCommonDisp(d, design)
>
> Error in glmFit.default(y, design = design, dispersion = dispersion,
>> offset = offset,  :
>
>   Design matrix not of full rank.  The following coefficients not
>> estimable:
>
>  TreatmentB TreatmentC TreatmentD TreatmentE TreatmentF
>
>
For the sake of completeness here is my code:

targets <- readTargets()
>
> d <- readDGE(targets$FileName)
>
> keep <- rowSums(cpm(d)>1) >=3
>
> d <- d[keep,]
>
> d$samples$lib.size <- colSums(d$counts)
>
> d <- calcNormFactors(d)
>
> all_cpm=cpm(d, normalized.lib.size=TRUE)
>
> all_counts <- cbind(rownames(all_cpm), all_cpm)
>
> colnames(all_counts)[1] <- "Ensembl.Gene.ID"
>
> rownames(all_counts) <- NULL
>
> Subject <- factor(targets$Subject)
>
> Platform <- factor(targets$Platform)
>
> Treatment <- factor(targets$Treatment)
>
> design <- model.matrix(~Platform+Subject+Treatment)
>
> y <- estimateGLMCommonDisp(d, design)
>
>
The design matrix looks OK to me:

> design
>
>    (Intercept) PlatformMiSeq Subjecta8 Subjectb6 Subjectb7 Subjectb8
>> Subjectc5 Subjectc6 Subjectc7 Subjectc8 Subjectd5 Subjectd6 Subjectd7
>> Subjectd8 Subjecte5 Subjecte6
>
> 1            1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 2            1             1         1         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 3            1             1         0         1         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 4            1             1         0         0         1         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 5            1             1         0         0         0         1
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 6            1             1         0         0         0         0
>>   1         0         0         0         0         0         0         0
>>       0         0
>
> 7            1             1         0         0         0         0
>>   0         1         0         0         0         0         0         0
>>       0         0
>
> 8            1             1         0         0         0         0
>>   0         0         1         0         0         0         0         0
>>       0         0
>
> 9            1             1         0         0         0         0
>>   0         0         0         1         0         0         0         0
>>       0         0
>
> 10           1             1         0         0         0         0
>>   0         0         0         0         1         0         0         0
>>       0         0
>
> 11           1             1         0         0         0         0
>>   0         0         0         0         0         1         0         0
>>       0         0
>
> 12           1             1         0         0         0         0
>>   0         0         0         0         0         0         1         0
>>       0         0
>
> 13           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         1
>>       0         0
>
> 14           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       1         0
>
> 15           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         1
>
> 16           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 17           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 18           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 19           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 20           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 21           1             1         0         0         0         0
>>   0         0         0         0         0         0         0         0
>>       0         0
>
> 22           1             0         0         0         0         0
>>   1         0         0         0         0         0         0         0
>>       0         0
>
> 23           1             0         0         0         0         0
>>   0         1         0         0         0         0         0         0
>>       0         0
>
> 24           1             0         0         0         0         0
>>   0         0         1         0         0         0         0         0
>>       0         0
>
> 25           1             0         0         0         0         0
>>   0         0         0         1         0         0         0         0
>>       0         0
>
> 26           1             0         0         0         0         0
>>   0         0         0         0         1         0         0         0
>>       0         0
>
> 27           1             0         0         0         0         0
>>   0         0         0         0         0         1         0         0
>>       0         0
>
> 28           1             0         0         0         0         0
>>   0         0         0         0         0         0         1         0
>>       0         0
>
> 29           1             0         0         0         0         0
>>   0         0         0         0         0         0         0         1
>>       0         0
>
>    Subjecte7 Subjecte8 Subjectf5 Subjectf6 Subjectf7 Subjectf8 TreatmentB
>> TreatmentC TreatmentD TreatmentE TreatmentF
>
> 1          0         0         0         0         0         0          0
>>          0          0          0          0
>
> 2          0         0         0         0         0         0          0
>>          0          0          0          0
>
> 3          0         0         0         0         0         0          1
>>          0          0          0          0
>
> 4          0         0         0         0         0         0          1
>>          0          0          0          0
>
> 5          0         0         0         0         0         0          1
>>          0          0          0          0
>
> 6          0         0         0         0         0         0          0
>>          1          0          0          0
>
> 7          0         0         0         0         0         0          0
>>          1          0          0          0
>
> 8          0         0         0         0         0         0          0
>>          1          0          0          0
>
> 9          0         0         0         0         0         0          0
>>          1          0          0          0
>
> 10         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 11         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 12         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 13         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 14         0         0         0         0         0         0          0
>>          0          0          1          0
>
> 15         0         0         0         0         0         0          0
>>          0          0          1          0
>
> 16         1         0         0         0         0         0          0
>>          0          0          1          0
>
> 17         0         1         0         0         0         0          0
>>          0          0          1          0
>
> 18         0         0         1         0         0         0          0
>>          0          0          0          1
>
> 19         0         0         0         1         0         0          0
>>          0          0          0          1
>
> 20         0         0         0         0         1         0          0
>>          0          0          0          1
>
> 21         0         0         0         0         0         1          0
>>          0          0          0          1
>
> 22         0         0         0         0         0         0          0
>>          1          0          0          0
>
> 23         0         0         0         0         0         0          0
>>          1          0          0          0
>
> 24         0         0         0         0         0         0          0
>>          1          0          0          0
>
> 25         0         0         0         0         0         0          0
>>          1          0          0          0
>
> 26         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 27         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 28         0         0         0         0         0         0          0
>>          0          1          0          0
>
> 29         0         0         0         0         0         0          0
>>          0          1          0          0
>
>

I'm also attaching the target file in case it adds any info.

Can you, perhaps, give any advice on this?

ps. I am not sure whether I should continue submitting to this thread or
open up a new one as this problem may not be related to the original
question I asked.

On Wed, Sep 3, 2014 at 12:12 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> Yes, that's all that is needed.
>
> In general, one corrects for a batch effect by fitting
>
>   ~ batch + otherterms
>
> where "otherterms" is what the model would have been without the batch
> effect.
>
> Best wishes
> Gordon
>
>
> On Mon, 1 Sep 2014, Nick N wrote:
>
>  Dea Gordon, Ryan and Nicolas,
>>
>> Than you all for the detailed advice.
>>
>> I have one more question regarding the blocking factor model. In my case I
>> have, actually, 2 external factors to consider - one is the platform, the
>> other one are the subjects.
>>
>> My sample matrix is the following (I've attached the CSV in case you can't
>> view the image):
>>
>>
>>
>>
>>
>> I am only interested in comparing treatments B:D to A (the latter are
>> controls). So far I've never had a model with more than one external
>> factor. I imagine it should be OK to have more - is this correct? If yes -
>> can you, perhaps, check whether I am setting the model matrix correctly?
>> (Apologies if this sounds too trivial) I imagine it shall be defined as:
>>
>> Platform <- factor(targets$Platform)
>>
>>> Subject  <- factor(targets$Subject)
>>> Treatment <- factor(targets$Treatment)
>>> design <- model.matrix(~Platform+Subject+Treatment)
>>>
>>
>> ..
>>
>>> fit <- glmFit(y, design)
>>> lrt <- glmLRT(fit, coef=24) # for comparing Treatment B to Treatment A
>>>
>>
>>
>> Is this correct?
>>
>>
>>
>>
>> On Sun, Aug 31, 2014 at 12:44 AM, Gordon K Smyth <smyth at wehi.edu.au>
>> wrote:
>>
>>  Dear Nick,
>>>
>>> If you go back to the post from 2010 that you give the URL for, you will
>>> see that I was giving very briefly the same advice about checking Poisson
>>> variability that Ryan has explained at greater detail.
>>>
>>> You don't give any information about read lengths, sequence depths or
>>> alignment methods.  I would be surprised if MiSeq and HiSeq would
>>> generate
>>> perfect Poisson replicates of one another, especially if the read lengths
>>> from the two platform are different or the alignment and counting
>>> software
>>> has been varied.  So you may well end up back at the blocking idea.
>>>
>>>
>>> Best wishes
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>> On Sun, 31 Aug 2014, Ryan wrote:
>>>
>>>  Thanks to the underlying theory behind dispersion estimation, you can
>>>
>>>> easily test whether your "technical replicates" really do represent
>>>> technical replicates. Specifically, read counts in technical replicates
>>>> should follow a Poisson distribution, which is a special case of the
>>>> negative binomial with zero dispersion. So, simply fit a model using
>>>> edgeR
>>>> or DESeq2 with a separate coefficient for each group of technical
>>>> replicates. Thus all the experimental variation will be absorbed into
>>>> the
>>>> model coefficients and the only thing left will be the technical
>>>> variability of of the replicates. For true technical replicates, the
>>>> dispersion should be zero for all genes. So if you estimate dispersions
>>>> using this model, and plotBCV/plotDispEsts shows the dispersion very
>>>> near
>>>> to zero, then you can be confident that you really have technical
>>>> replicates. If the dispersion is nonzero, then there is some additional
>>>> source of unaccounted-for variation.
>>>>
>>>> I have used this method on a pilot dataset with several technical
>>>> replicates for each condition. edgeR said the dispersion was something
>>>> like
>>>> 10^-3 or less for all genes except for the very low-expressed genes.
>>>>
>>>> -Ryan
>>>>
>>>> On 8/28/14, 9:23 AM, Nick N wrote:
>>>>
>>>>  Hi,
>>>>>
>>>>> I have a study where a fraction of the samples have been replicated on
>>>>> 2
>>>>> Illumina platforms (HiSeq and Miseq). These are technical replicates -
>>>>> the
>>>>> library preparation is the same using the same biological replicates -
>>>>> it's
>>>>> only the sequencing which is different.
>>>>>
>>>>> My hunch was that I shall introduce the platform as as an additional
>>>>> (blocking) factor in the analysis. Than I stumbled upon this post:
>>>>>
>>>>> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>>>>>
>>>>> It recommends pooling the replicates. The post seems to apply to a
>>>>> different case ("pure" technical replicates, i.e. no differences in the
>>>>> sequencing platform used) so I probably shall ignore it. But I still
>>>>> feel a
>>>>> bit uncertain of the best way to treat the technical replicates. Can
>>>>> you,
>>>>> please, advise me on this?
>>>>>
>>>>> many thanks!
>>>>> Nick
>>>>>
>>>>
> ______________________________________________________________________
> The information in this email is confidential and intended solely for the
> addressee.
> You must not disclose, forward, print or use it without the permission of
> the sender.
> ______________________________________________________________________
>
-------------- next part --------------
FileName	Platform	Subject	Treatment
../A7/accepted_hits_clean.count	MiSeq	a7	A
../A8/accepted_hits_clean.count	MiSeq	a8	A
../B6/accepted_hits_clean.count	MiSeq	b6	B
../B7/accepted_hits_clean.count	MiSeq	b7	B
../B8/accepted_hits_clean.count	MiSeq	b8	B
../C5/accepted_hits_clean.count	MiSeq	c5	C
../C6/accepted_hits_clean.count	MiSeq	c6	C
../C7/accepted_hits_clean.count	MiSeq	c7	C
../C8/accepted_hits_clean.count	MiSeq	c8	C
../D5/accepted_hits_clean.count	MiSeq	d5	D
../D6/accepted_hits_clean.count	MiSeq	d6	D
../D7/accepted_hits_clean.count	MiSeq	d7	D
../D8/accepted_hits_clean.count	MiSeq	d8	D
../E5/accepted_hits_clean.count	MiSeq	e5	E
../E6/accepted_hits_clean.count	MiSeq	e6	E
../E7/accepted_hits_clean.count	MiSeq	e7	E
../E8/accepted_hits_clean.count	MiSeq	e8	E
../F5/accepted_hits_clean.count	MiSeq	f5	F
../F6/accepted_hits_clean.count	MiSeq	f6	F
../F7/accepted_hits_clean.count	MiSeq	f7	F
../F8/accepted_hits_clean.count	MiSeq	f8	F
../SC5/accepted_hits_clean.count	HiSeq	c5	C
../SC6/accepted_hits_clean.count	HiSeq	c6	C
../SC7/accepted_hits_clean.count	HiSeq	c7	C
../SC8/accepted_hits_clean.count	HiSeq	c8	C
../SD5/accepted_hits_clean.count	HiSeq	d5	D
../SD6/accepted_hits_clean.count	HiSeq	d6	D
../SD7/accepted_hits_clean.count	HiSeq	d7	D
../SD8/accepted_hits_clean.count	HiSeq	d8	D