[BioC] [DIFFBIND] batch effects and blocking factors

Giuseppe Gallone giuseppe.gallone at dpag.ox.ac.uk
Wed Jun 18 20:39:03 CEST 2014


Hi

I have a group of samples for which I'd like to ascertain if 
differential binding is detectable based on a "condition" binary 
variable (stored in DBA_CONDITION).

However, these samples have been processed in 4 batches (each batch has 
at least 3 samples).  I would like to run a multifactorial analysis to 
regress out the batch effect first, and then analyse any remaining 
variance across the DBA_CONDITION contrast of interest.

I understand it is possible to run such an analysis using blocking 
factors in dba.contrast. Let's say I store the 4 batch labels in 
DBA_TISSUE. The following:

data = dba.contrast(data, categories=DBA_CONDITION, block=DBA_TISSUE)

returns the following warning messages:

Warning messages:
1: Blocking factor invalid for all contrasts:
2: No blocking values are present in both groups

and data will not contain blocking factor information.
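
As a sanity check on the second warning, this is the kind of cross-tabulation 
I would use to see whether each batch has samples in both condition groups. 
It is only a base R sketch: the condition and batch vectors below are 
invented labels, not my real sample sheet, and I am assuming the warning 
refers to this kind of overlap.

```r
# Hypothetical labels for 12 samples; not the real data.
condition <- factor(rep(c("A", "B"), times = 6))
batch <- factor(rep(paste0("BATCH_", 1:4), each = 3))

# Rows are batches, columns are conditions.
counts <- table(batch, condition)
print(counts)

# TRUE for a batch that has at least one sample in each condition.
in_both <- apply(counts > 0, 1, all)
print(in_both)
```

With the real labels, any batch flagged FALSE here would be confined to a 
single condition group.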

Am I wrong in thinking that multiple contrasts can be used for the 
"block" argument? If I use only one contrast via a mask (for example 
BATCH_1 vs. !BATCH_1), this works correctly:

data = dba.contrast(data, categories=DBA_CONDITION, 
block=data$masks$BATCH_1)

However, it will only block variance due to this particular contrast, 
not all of them.

One workaround, I suppose, is to run a differential analysis on each of 
the contrasts one wishes to block, and identify the one that produces the 
highest number of variant sites:

data = dba.contrast(data, categories=DBA_TISSUE)
data = dba.analyze(data)
...
# pick the contrast with the most variant sites, e.g. BATCH_4, then do:

data = dba.contrast(data, categories=DBA_CONDITION, 
block=data$masks$BATCH_4)

However, I was still wondering if there is a way to model all the 
variance due to the batch effects at once and then look at the residual 
variance for the real analysis.
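
To make concrete what I mean by modelling all batches at once, here is a 
minimal base R sketch of an additive design with batch and condition as 
factors. This is outside DiffBind, the sample labels are made up for 
illustration, and I do not know whether dba.contrast builds anything 
equivalent internally.

```r
# Hypothetical labels for 12 samples in 4 batches; not the real data.
condition <- factor(rep(c("A", "B"), times = 6))
batch <- factor(rep(paste0("BATCH_", 1:4), each = 3))

# Additive design: intercept, three batch columns, one condition column.
design <- model.matrix(~ batch + condition)
print(colnames(design))

# The coefficients are only estimable if the design has full column rank.
print(qr(design)$rank == ncol(design))
```

In a design like this the condition coefficient would be estimated after 
accounting for all four batch levels together, rather than for one batch 
versus the rest.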

Thanks!
Giuseppe


