[BioC] [DIFFBIND] batch effects and blocking factors

Giuseppe Gallone giuseppe.gallone at dpag.ox.ac.uk
Wed Jun 18 20:39:03 CEST 2014


I have a group of samples for which I'd like to determine whether 
differential binding can be detected between the two levels of a binary 
"condition" variable (stored in DBA_CONDITION).

However, these samples have been processed in 4 batches (each batch has 
at least 3 samples). I would like to run a multifactorial analysis that 
first regresses out the batch effect, and then analyses any remaining 
variance across the DBA_CONDITION contrast of interest.

I understand it is possible to run such an analysis using blocking 
factors in dba.contrast. Suppose I store the 4 batch labels in 
DBA_TISSUE. The following call:

data = dba.contrast(data, categories=DBA_CONDITION, block=DBA_TISSUE)

produces the following warning messages:

Warning messages:
1: Blocking factor invalid for all contrasts:
2: No blocking values are present in both groups

and the contrasts stored in data will not contain any blocking factor information.
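Read literally, the second warning says that no batch label occurs on both sides of the condition contrast. Before assuming a multi-level blocking factor is unsupported, it may be worth checking how batch and condition overlap in the sample sheet. The following is a minimal sketch in plain Python (not DiffBind code); the sample sheet, the "ctrl"/"trt" labels and the helper names `conditions_per_batch` and `batches_in_both_groups` are all hypothetical, and DiffBind's exact internal rule may differ from this literal reading.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample ID, batch label, condition label).
# Replace with the batch (DBA_TISSUE) and DBA_CONDITION values from your own data.
samples = [
    ("S1", "B1", "ctrl"), ("S2", "B1", "ctrl"), ("S3", "B1", "trt"),
    ("S4", "B2", "ctrl"), ("S5", "B2", "trt"), ("S6", "B2", "trt"),
    ("S7", "B3", "trt"), ("S8", "B3", "trt"), ("S9", "B3", "trt"),
    ("S10", "B4", "ctrl"), ("S11", "B4", "ctrl"), ("S12", "B4", "ctrl"),
]

def conditions_per_batch(samples):
    """Map each batch label to the set of condition labels observed in it."""
    seen = defaultdict(set)
    for _sample_id, batch, condition in samples:
        seen[batch].add(condition)
    return dict(seen)

def batches_in_both_groups(samples):
    """Return, sorted, the batches containing samples from every condition."""
    all_conditions = {cond for _, _, cond in samples}
    return sorted(batch for batch, conds in conditions_per_batch(samples).items()
                  if conds == all_conditions)

for batch, conds in sorted(conditions_per_batch(samples).items()):
    print(batch, sorted(conds))
print("usable for blocking:", batches_in_both_groups(samples))
```

In this made-up sheet only B1 and B2 contain both conditions; B3 and B4 are each confounded with a single condition, so they carry no within-batch information about the condition effect. If the list comes back empty, batch and condition are fully confounded and no blocking design can separate them.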

Am I wrong in thinking that a factor with more than two levels (i.e. 
several contrasts) can be used for the "block" argument? If I use only 
one contrast via a mask (for example BATCH_1 vs. !BATCH_1), it works correctly:

data = dba.contrast(data, categories=DBA_CONDITION, 

However, this only blocks the variance due to this particular contrast, 
not the variance due to the other batches.

One workaround, I suppose, is to run a differential analysis on each of 
the batch contrasts one wishes to block, identify the one that produces 
the highest number of differentially bound sites, and block on that one:

data = dba.contrast(data, categories=DBA_TISSUE)
# pick the contrast with the most differentially bound sites, e.g. BATCH_4, then do:

data = dba.contrast(data, categories=DBA_CONDITION, 

However, I am still wondering whether there is a way to model all of the 
variance due to the batch effects at once, and then look at the residual 
variance for the real analysis.
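In general linear-model terms, modelling all batch effects at once corresponds to an additive design in which batch enters as a single four-level factor (an intercept plus three indicator columns) alongside condition. The sketch below is plain Python/NumPy, not DiffBind code, and does not claim to show what dba.contrast does internally. It fits one hypothetical binding site with made-up, noise-free values, so the least-squares fit is exact. It then compares blocking on all four batches with blocking only on a single binary contrast (B1 vs. not-B1, analogous to the BATCH_1 mask). The batch labels, effect sizes and helper names are assumptions for illustration.

```python
import numpy as np

# Hypothetical per-sample labels: 4 batches of 3 samples; every batch contains
# both conditions, but in unequal proportions (batch and condition are correlated).
batches = ["B1"] * 3 + ["B2"] * 3 + ["B3"] * 3 + ["B4"] * 3
treated = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=float)

# Hypothetical true effects for one site; no noise, so fits are exact.
batch_effect = {"B1": 0.0, "B2": 1.0, "B3": -1.0, "B4": 3.0}
intercept, condition_effect = 5.0, 2.0
y = intercept + np.array([batch_effect[b] for b in batches]) + condition_effect * treated

def design_matrix(batches, treated, levels):
    """Intercept, one 0/1 indicator per batch level in `levels`, then condition."""
    cols = [np.ones(len(batches))]
    cols += [np.array([b == lvl for b in batches], dtype=float) for lvl in levels]
    cols.append(treated)
    return np.column_stack(cols)

def condition_coefficient(X, y):
    """Ordinary least squares; the condition coefficient is the last column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]

# All batches blocked at once: B1 is the reference level, B2..B4 get indicators.
full = condition_coefficient(design_matrix(batches, treated, ["B2", "B3", "B4"]), y)

# Only one binary batch contrast blocked (B1 vs. the rest), as with a single mask.
single = condition_coefficient(design_matrix(batches, treated, ["B1"]), y)

print(f"all batches blocked: {full:.3f}")    # 2.000
print(f"B1 vs rest blocked:  {single:.3f}")  # 2.692
```

Blocking on one binary contrast leaves the differences among B2, B3 and B4 in the residual. Because treated samples are over-represented in the high-effect batches, part of that batch variance leaks into the condition estimate (35/13, about 2.692, instead of 2). Including every batch level removes the bias, provided each batch contains samples from both conditions.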


