[BioC] Different number of differentially expressed genes after using ComBat in 'sva' for batch correction

Naomi Altman naomi at stat.psu.edu
Sat Jun 15 09:29:07 CEST 2013

There are several possible explanations for this, but one is
power.  Limma (like all ANOVA routines) uses the MSE computed from all
the groups to determine differences among groups.  Since Batch 2 is 
very small, you did not have a good measure of MSE in the analysis 
that included only Batch 2.  When you combine samples, you have a 
much better measure and many more d.f. for error and so much more 
power.  If it also happens that Batch 2 was a bit more variable than 
Batch 1, you will also have a smaller MSE after combining.  Finally, 
you now have more measurements for Group2, which means that any
comparison involving Group2 will be much more powerful.
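
To make the d.f. argument concrete, here is a rough back-of-the-envelope
sketch in Python (plain arithmetic, no limma or ComBat calls).  The sample
and group counts are the approximate ones from the question; the 5/5 split
within Batch 2 and the 5 extra Group2 samples coming from Batch 1 are
assumptions for illustration only, and the function names are hypothetical.
It ignores the d.f. consumed by the batch adjustment itself and the prior
d.f. that limma's moderated t-statistics add.

```python
import math


def error_df(n_samples, n_groups):
    """Residual d.f. for a one-way layout with one mean per group."""
    return n_samples - n_groups


def se_factor(n_a, n_b):
    """Multiplier on sqrt(MSE) in the standard error of a two-group difference."""
    return math.sqrt(1.0 / n_a + 1.0 / n_b)


# Batch 2 alone: 10 samples in 2 groups (from the question).
# ASSUMPTION: the 10 samples split 5/5 between the two groups.
df_batch2 = error_df(10, 2)        # 8
se_batch2 = se_factor(5, 5)

# Combined data: roughly 250 + 10 samples, roughly 50 + 1 groups
# (Batch2_Group1 has no match in Batch 1, Batch2_Group2 does).
# ASSUMPTION: Group2 gains 5 samples from Batch 1, giving 10 in total.
df_combined = error_df(260, 51)    # 209
se_combined = se_factor(5, 10)

print(f"error d.f.: {df_batch2} -> {df_combined}")
print(f"SE multiplier: {se_batch2:.3f} -> {se_combined:.3f}")
```

Under these assumptions the error d.f. rise from 8 to 209 and the
standard-error multiplier for the Group1 vs Group2 contrast drops from
about 0.63 to about 0.55, before any change in the MSE itself.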

--Naomi Altman

At 10:48 AM 6/14/2013, Michaela Oswald wrote:
>I have a question concerning the number of differentially expressed
>probes after batch combination, using ComBat from 'sva'.
>I have 2 data sets: one containing around 250 samples that correspond to
>around 50 groups, another one containing 10 samples corresponding to 2
>groups (let me call them Batch2_Group1, Batch2_Group2). One of the 2 group
>labels in the second batch (Batch2_Group2) also exists in the first batch,
>so there is no confounding situation here.
>Before batch correction the 2 data sets cluster by batch, not by group.
>I used ComBat from the R/Bioconductor package 'sva' to correct for this,
>using a model matrix to accommodate the overlapping groups between the 2
>batches and setting par.prior=TRUE, i.e. using parametric adjustment.
>After the batch correction the samples cluster perfectly by group, not by
>batch any longer.
>I do notice, however, that the number of differentially expressed probes
>between Batch2_Group1 and Batch2_Group2 changes dramatically with data
>combination. Within Batch2 alone I have around 1000 differentially
>expressed probes, around 50% up- and down-regulated each. After data
>combination I have around 3000 differentially expressed probes, ~2000 up
>and ~1000 down in the group comparison. (I use 'limma' for differential
>expression analysis.)
>It seems that ComBat pulled the groups Batch2_Group1 and Batch2_Group2
>further apart from each other. The group that did not have a group label
>match in Batch1 is now much more up-regulated.
>Is there a way to adjust the data combination so I can keep the number of
>differentially expressed probes similar to what it was before?
>Thank you,