[BioC] ComBat: Could it utilize technical replicates?

Johnson, William Evan wej at bu.edu
Sun Aug 11 01:57:04 CEST 2013

Hi Essi, 

Yes, ComBat can use this information. Replace your current 'Covariate 1' with a covariate that contains only the sample letter (e.g. A, B, C, C, D, D, E, ...). This is sufficient because your 'Covariate 1' is nested within sample letter: each sample letter belongs to exactly one risk group. Under this setup, ComBat will preserve all variation due to sample type (and, as a result, risk level) and will effectively use only the repeated samples to adjust for batch.
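
As a rough illustration of the covariate described above (a sketch only, not part of ComBat or the sva package), the sample letter can be derived from the array names in the example design, and the nesting of risk group within sample letter can be checked directly. The helper name sample_letter and the assumption that replicates carry a numeric suffix such as "_2" are hypothetical and taken from the example table, not from any real data set.

```python
import re
from collections import defaultdict

# Array name, batch and risk group, copied from the example design in the question.
design = [
    ("sample_A", 1, "High_risk"),
    ("sample_B", 1, "High_risk"),
    ("sample_C", 1, "High_risk"),
    ("sample_C_2", 3, "High_risk"),
    ("sample_D", 1, "High_risk"),
    ("sample_D_2", 3, "High_risk"),
    ("sample_E", 1, "Medium_risk"),
    ("sample_F", 1, "Medium_risk"),
    ("sample_G", 1, "Medium_risk"),
    ("sample_G_2", 3, "Medium_risk"),
    ("sample_H", 2, "Low_risk"),
    ("sample_I", 2, "Low_risk"),
    ("sample_J", 2, "Low_risk"),
    ("sample_J_2", 3, "Low_risk"),
    ("sample_K", 2, "Low_risk"),
    ("sample_K_2", 3, "Low_risk"),
]


def sample_letter(array_name):
    """Hypothetical helper: drop the 'sample_' prefix and any numeric replicate suffix."""
    match = re.fullmatch(r"sample_([A-Z])(?:_\d+)?", array_name)
    if match is None:
        raise ValueError(f"unexpected array name: {array_name}")
    return match.group(1)


# The new covariate: one letter per array, repeated for re-hybridized samples.
letters = [sample_letter(name) for name, _, _ in design]

# Nesting check: every sample letter must map to exactly one risk group.
risk_by_letter = defaultdict(set)
batches_by_letter = defaultdict(set)
for (_, batch, risk), letter in zip(design, letters):
    risk_by_letter[letter].add(risk)
    batches_by_letter[letter].add(batch)
nested = all(len(risks) == 1 for risks in risk_by_letter.values())

# Letters seen in more than one batch are the replicates that link the batches.
replicated = sorted(l for l, b in batches_by_letter.items() if len(b) > 1)

print(letters)
print("risk nested within sample letter:", nested)
print("replicated samples:", replicated)
```

The resulting list of letters is what would take the place of 'Covariate 1' when building the covariate for ComBat; the replicated letters (C, D, G, J and K in this example) are the samples that carry the batch information.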

Hope this helps. Thanks!


On Aug 9, 2013, at 8:23 AM, Essi Laajala wrote:

> Hi,
> I'm dealing with quite an unusual study design. Originally (due to unfortunate and unavoidable circumstances) we had all "high_risk" and "medium_risk" samples in batch 1 and all "low_risk" samples in batch 2. We then discussed the batch effect and decided to re-hybridize some randomly selected samples from each risk group in batch 3. The resulting study design looks a bit like this (in reality we have 30 - 45 samples in each group and 16 samples re-hybridized, but you'll get the idea):
> Array name    Batch    Covariate 1
> sample_A      1        High_risk
> sample_B      1        High_risk
> sample_C      1        High_risk
> sample_C_2    3        High_risk
> sample_D      1        High_risk
> sample_D_2    3        High_risk
> sample_E      1        Medium_risk
> sample_F      1        Medium_risk
> sample_G      1        Medium_risk
> sample_G_2    3        Medium_risk
> sample_H      2        Low_risk
> sample_I      2        Low_risk
> sample_J      2        Low_risk
> sample_J_2    3        Low_risk
> sample_K      2        Low_risk
> sample_K_2    3        Low_risk
> For example, sample_C and sample_C_2 are the same RNA sample and the only difference between them is the batch (the same applies to D and D_2, etc.). Such array pairs should be valuable for estimating batch effects. The question is: can ComBat utilize this information? Or can you recommend some other batch correction method that could? For now, I've applied ComBat after removing the replicated samples from batches 1 and 2 (C, D, G, J and K in the above example), but this is certainly not an optimal solution.
> Best regards,
> Essi Laajala
> PhD student in bioinformatics
> Turku, Finland
