[BioC] ComBat: 2 adjustment variables & continuous adjustment variables

Johnson, William Evan wej at bu.edu
Thu Feb 20 22:48:11 CET 2014


Hey Magda, 

The two-step method is still a reasonable approach. It has worked well for me in multiple situations. I do have a beta version of ComBat that will handle two batch variables at the same time. It works well in theory, but I have yet to test it thoroughly across multiple datasets. I'm willing to share the code if you want to test it on your data (let me know).

ComBat in the sva package can handle numeric covariates, but it does not deal with continuous batch variables. Adjusting the mean for a continuous batch variable would be straightforward (assuming a linear effect), but the variance adjustment would be very tricky.
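
To illustrate the mean-only adjustment for a continuous batch variable, here is a minimal sketch in plain Python. It is not part of sva or ComBat: the function name remove_linear_trend and the toy values are hypothetical, it assumes the effect of row position is linear, it ignores any biological covariates, and it leaves the variance untouched. For one probe, it fits an ordinary least-squares slope against row position and subtracts the fitted trend, centred on the mean position so the probe's overall mean is preserved.

```python
from statistics import mean


def remove_linear_trend(values, position):
    """Subtract a fitted linear effect of `position` from one probe's values.

    Hypothetical helper for illustration only. The trend is centred on the
    mean position, so the probe's overall mean is unchanged. No variance
    adjustment is attempted.
    """
    x_bar = mean(position)
    y_bar = mean(values)
    # Sum of squares of the centred positions (denominator of the OLS slope).
    sxx = sum((x - x_bar) ** 2 for x in position)
    if sxx == 0:
        # All samples share one position: there is no trend to estimate.
        return list(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(position, values))
    slope = sxy / sxx
    return [y - slope * (x - x_bar) for x, y in zip(position, values)]


# Toy example: one probe measured in chip rows 1 to 6, with values that
# drift upward with row position (made-up numbers).
rows = [1, 2, 3, 4, 5, 6]
signal = [0.50, 0.52, 0.54, 0.56, 0.58, 0.60]
adjusted = remove_linear_trend(signal, rows)
```

With real data the same fit would be applied to every probe separately, and any biological covariates would need to be included in the regression so that their effect is not removed along with the row trend.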

Ultimately, since the two-step approach seems to have worked, I think your best option is to just move forward with those results. 

Thanks!

Evan



On Feb 19, 2014, at 4:00 AM, <bioconductor-request at r-project.org> wrote:

> Message: 23
> Date: Tue, 18 Feb 2014 16:45:12 -0800
> From: Magda Price <magdaprice at gmail.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] ComBat: 2 adjustment variables & continuous adjustment
> 	variables
> Message-ID:
> 	<CADkR4V=ydd1abJXFhtd+Xwq8MZMP_=urHVDtPXOTurPQjzB7Tg at mail.gmail.com>
> Content-Type: text/plain
> 
> Hi!
> 
> I'm writing with a few questions about applying ComBat (sva package) to a
> set of ~50 samples run on the Illumina Infinium HumanMethylation450
> BeadChip array (~450,000 DNA methylation data points).
> 
> There is a large amount of variation in my data due to both the batch the
> samples were run in (3 different batches) and the position they
> occupied on the chip - specifically the row (6 different rows), but not
> the column. The chips are set up in a 6 row * 2 column format like this:
> 
> 
> sample 01   sample 02
> sample 03   sample 04
> sample 05   sample 06
> sample 07   sample 08
> sample 09   sample 10
> sample 11   sample 12
> 
> 
> I read Dr. Evan Johnson's suggestions to someone else with this
> "2-batch-effect-variable" problem in the ComBat google group (
> https://groups.google.com/forum/#!topic/combat-user-forum/PcTxNlaUmAI). He
> had 2 good suggestions:
> 
>   1. Combine the two batch variables into one, if 3-4 reps are left in
>   each batch
>   2. Use ComBat twice, adjusting for the first batch using the second
>   batch as a covariate, and then adjust for the second batch.
> 
> I cannot go with the first suggestion because combining the 2 batch
> variables would create 18 batch categories (3 batches * 6 rows), and I
> would not have enough replicates per batch category.
> 
> So I tried the second option - applying ComBat twice. I first corrected for
> row and then took the row-corrected data and applied ComBat again,
> correcting for batch. It seems to have worked & the correlation of my
> technical replicates improves. I am seeking advice on two points:
> 
>   1. The google group post is now a few years old; is it still thought
>   that the step-wise correction is a valid approach?
>   2. Row would be better treated as a continuous adjustment variable than
>   a factor. In the version of sva that I am using (3.0.2) I believe that only
>   factor adjustment variables are supported. I have seen mention in a few
>   forums that there might be an update to ComBat to adjust for a numeric
>   batch variable; is one available?
> 
> Thank you in advance for your help!
> 
> Magda Price,
> University of British Columbia


