[BioC] ComBat: 3 adjustment variables & continuous adjustment variables

James W. MacDonald jmacdon at uw.edu
Wed Mar 19 14:58:03 CET 2014


Hi Magda,

I'm not sure you need to do things sequentially like that. From what I 
can tell, you should just be able to do

mod <- model.matrix(~tissue, des)
bat <- ComBat(data, des[,c("plate","row","chip")], mod)

And go from there.
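
Before running it, it may be worth checking that none of the batch variables is confounded with tissue. Below is a minimal sketch in base R. The toy 'des' (48 samples, 5 tissues, 2 plates, 8 chips, 6 rows) is made up for illustration and is not your data. Note too that whether ComBat accepts a multi-column batch argument may depend on the sva version, so check ?ComBat for the one you have installed.

```r
# Toy sample sheet (hypothetical; stands in for the real 'des').
n <- 48
des <- data.frame(
  tissue = factor(rep(paste0("t", 1:5), length.out = n)),
  plate  = factor(rep(1:2, each = 24)),
  chip   = factor(rep(1:8, each = 6)),
  row    = factor(rep(1:6, times = 8))
)

# Model matrix for the variable of interest only; batch variables stay out.
mod <- model.matrix(~tissue, des)

# Cross-tabulate each batch variable against tissue.
tabs <- lapply(des[c("plate", "row", "chip")],
               function(b) table(b, des$tissue))

# Count batch levels that contain only a single tissue; such levels
# would be confounded with tissue (ideally every count is zero).
confounded <- sapply(tabs, function(tb) sum(rowSums(tb > 0) == 1))
print(confounded)
```

If any of those counts is non-zero on the real sample sheet, the batch adjustment and the tissue effect cannot be cleanly separated for those levels.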

Best,

Jim


On 3/18/2014 6:04 PM, Magda Price wrote:
> Hi Jim,
>
> Re numCovs - what you've stated was how I interpreted the use as well, 
> which is why I didn't think it would be helpful.
>
> As usual with these types of human disease datasets, the study design 
> is not ideal, and more complicated than I initially let on! The 180 
> samples are a combination of 3 phenotype groups (1 control + 2 
> diseased) and 5 different tissues. Other samples, unrelated to this 
> project, were also run on these chips, which is why I'm working with 
> fewer samples than the total that were run (which was 288).
>
> Here's a simplified version of what my ComBat code looks like:
>
> # 1 - correct for plate effect
> mod.1 <- model.matrix(~tissue+group+row+chip, data=des)
> bat.1 <- ComBat(dat=data, batch=des$plate, mod=mod.1)
>
> # 2 - correct for row effect (ComBat's first argument is named dat, not data)
> mod.2 <- model.matrix(~tissue+group+chip, data=des)
> bat.2 <- ComBat(dat=bat.1, batch=des$row, mod=mod.2)
>
> # 3 - correct for chip effect
> mod.3 <- model.matrix(~tissue+group, data=des)
> bat.3 <- ComBat(dat=bat.2, batch=des$chip, mod=mod.3)
>
> We know from some pilot studies that the effect size (i.e. 
> differential methylation between disease vs control samples in a given 
> tissue) is small, so I am concerned about being thorough in the batch 
> correction. I'm new to batch correction and you've correctly 
> understood my concern about the row effect; so it sounds to me that 
> how I have modeled the effect in the code above (i.e. each batch 
> variable as a factor) was correct. Any corrections/suggestions for 
> what I've done above?
>
> Thanks!
>
>
> On Tue, Mar 18, 2014 at 2:27 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>     Hi Magda,
>
>     The numCovs argument won't work because that is simply used to
>     specify columns in the model matrix (of non-batch things you want
>     to fit in your linear model) that are continuous covariates rather
>     than fixed effects. It has nothing to do with correcting for the
>     batch effect.
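
A small sketch of what that looks like, in base R only. The 'age' covariate and the toy values are hypothetical, chosen just to show how the column numbers are found; the resulting vector is the kind of thing the numCovs argument is documented to take.

```r
# Toy design with one continuous covariate ('age' is hypothetical).
des <- data.frame(
  tissue = factor(rep(c("t1", "t2"), each = 4)),
  age    = c(31, 45, 28, 52, 39, 41, 36, 47)
)

# Model matrix of non-batch terms: tissue (factor) plus age (continuous).
mod <- model.matrix(~tissue + age, des)

# Column number(s) of the continuous covariate(s) in 'mod'.
num_covs <- which(colnames(mod) == "age")
print(colnames(mod))
print(num_covs)
```

The batch variable itself never appears in 'mod', which is why numCovs cannot be used to make the batch continuous.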
>
>     And I think you might be thinking about batch effects in the wrong
>     way. If you fit a 'row' effect, then what you are saying is that
>     on average, the measures you get from one row differ from the
>     measures you get from another row. So as an example, row 1 might
>     tend to have higher values because those arrays don't get washed
>     as well, whereas rows 3 and 4 might be dimmer because they get
>     washed more. You then want to estimate how much brighter on
>     average, the row1 chips are (and how much dimmer the row 3 and 4
>     chips are), and adjust the observed data to account for this.
>
>     But you do the estimation of these averages using factors, rather
>     than continuous measures (because a chip either is or is not in
>     row 1).
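
To make the factor versus numeric distinction concrete, here is a rough base R sketch. The layout of 4 chips with 6 rows each is a toy example, not the real design.

```r
# Hypothetical chip positions: 6 rows on each of 4 chips (toy values only).
row_num <- rep(1:6, times = 4)

# Row as a factor: one indicator column per non-reference row, so each
# row gets its own average shift relative to row 1.
X_factor <- model.matrix(~factor(row_num))

# Row as a numeric variable: a single slope, which forces the shift to
# change by the same amount from one row to the next.
X_numeric <- model.matrix(~row_num)

print(dim(X_factor))   # intercept + 5 indicator columns
print(dim(X_numeric))  # intercept + 1 slope column
```

With the factor coding, the row 1 vs 2 difference and the row 1 vs 6 difference are estimated separately, so a step-wise trend across rows can still be captured without assuming it is linear.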
>
>     You might just be over-thinking this. I don't see how 3 plates of
>     24 chips gets you to 180 samples, but regardless it seems like you
>     have enough replication to estimate the batch effects, and still
>     have enough degrees of freedom left over for your comparisons,
>     unless you have some huge number of phenotypic combinations that
>     you are trying to compare (do you?).
>
>     Best,
>
>     Jim
>
>
>
>
>     On Tuesday, March 18, 2014 2:13:11 PM, Magda Price wrote:
>
>         Hi Jim,
>
>         I have several different "batch" variables - one for example is the
>         chip that each sample was run on (there are 24 of these) and I think
>         chip batch should definitely be treated as a factor. Another "batch"
>         variable I would like to adjust for is the position the sample was run
>         on the chip (there are 6 different rows). If I use row as a factor,
>         then the effect of being in row 1 vs 2 is treated the same as the
>         effect of 1 vs 6, but the bias I see changes step-wise from row 1, 2,
>         3, 4, 5, 6 thus I thought that treating row as a numeric or integer
>         variable would better model the "batch" effect. In other words row
>         batches have meaning relative to each other whereas chip
>         batches do not.
>
>         I guess this would be another reason why using the numCovs option
>         (continuous not integer) might not work in my case?!
>
>         Hope that explains things a bit better! Happy to provide any more info
>         & I really appreciate the input.
>
>         Magda
>
>
>         On Tue, Mar 18, 2014 at 10:51 AM, James W. MacDonald
>         <jmacdon at uw.edu> wrote:
>
>             Hi Magda,
>
>             I'm curious. How can one specify a batch using a continuous
>             variable? In other words, isn't a particular sample in a batch
>             or not?
>
>             Best,
>
>             Jim
>
>
>
>             On 3/18/2014 1:44 PM, Magda Price wrote:
>
>                 Hi Steve,
>
>                 Thanks for your advice. I do know that I'm using an old version of R
>                 (one of the packages I'm using requires it); however, the options you
>                 mention from sva are in fact available in the older version as well,
>                 but it wasn't clear to me how to use them.
>
>                 I've copied the usage and argument information for the ComBat function
>                 below, maybe you can help clarify:
>
>                 ComBat(dat, batch, mod, numCovs=NULL,
>                        par.prior=TRUE, prior.plots=FALSE)
>
>                 dat          Genomic measure matrix (dimensions probe x sample) -
>                              for example, expression matrix
>                 batch        Batch covariate (multiple batches allowed)
>                 mod          Model matrix for outcome of interest and other
>                              covariates besides batch
>                 numCovs      (Optional) Vector containing the column numbers of
>                              the continuous covariates in the model matrix, or
>                              NULL if no continuous covariates are used
>                 par.prior    (Optional) TRUE indicates parametric adjustments
>                              will be used, FALSE indicates non-parametric
>                              adjustments will be used
>                 prior.plots  (Optional) TRUE give prior plots with black as a
>                              kernel estimate of the empirical batch effect
>                              density and red as the parametric estimate
>
>
>                 The model matrix is supposed to contain the outcome of interest and
>                 other covariates *besides batch*, but batch is what I need to be a
>                 continuous variable. numCovs seems to allow me to specify *covariates*
>                 that should be continuous, but not *adjustment variables*. What am I
>                 missing?
>
>
>                 Thanks again!
>
>
>
>                 On Tue, Mar 18, 2014 at 9:48 AM, Steve Lianoglou
>                 <lianoglou.steve at gene.com> wrote:
>
>
>                     Hi Magda,
>
>                     You are using a version of R (2.14) that is horribly out of date, and
>                     as a result your bioconductor packages are frozen to versions that are
>                     quite old.
>
>                     Please update to the latest version of R (3.0.3) and reinstall your
>                     bioconductor packages using biocLite to ensure that you are running
>                     the latest version of them.
>
>                     The package you are using (sva v3.0.2) is now at version 3.8.0.
>
>                     One question you asked:
>
>                         - Row would be better treated as a continuous adjustment
>                         variable than a factor. In the version of sva that I am
>                         using (3.0.2) I believe that only factor adjustment
>                         variables are supported. I have seen mention in a few
>                         forums that there might be an update to ComBat to adjust
>                         for a numeric batch variable, is one available?
>
>                     Is readily answered by reading through the vignette for the current
>                     version of the package:
>
>
>                     http://bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf
>
>                     Specifically in Section 7 (Applying the ComBat function to adjust for
>                     known batches), where it states:
>
>                     """
>                     By default, all adjustment variables will be treated as factor
>                     variables by the ComBat function. If you would like to include
>                     continuous adjustment variables, also create a vector containing the
>                     column numbers of the continuous covariates in the model matrix. This
>                     vector must then be input into ComBat via the numCovs option.
>                     """
>
>                     HTH,
>
>                     -steve
>
>                     --
>                     Steve Lianoglou
>                     Computational Biologist
>                     Genentech
>
>
>
>
>             --
>             James W. MacDonald, M.S.
>             Biostatistician
>             University of Washington
>             Environmental and Occupational Health Sciences
>             4225 Roosevelt Way NE, # 100
>             Seattle WA 98105-6099
>
>
>
>
>         --
>         E. Magda Price
>         PhD Candidate, Robinson Lab
>         University of British Columbia
>
>         CFRI Room 2071
>         950 West 28th Ave.
>         Vancouver BC., V5Z 4H4
>         (604)-875-3015
>
>
>     --
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>
>
>
> -- 
> E. Magda Price
> PhD Candidate, Robinson Lab
> University of British Columbia
>
> CFRI Room 2071
> 950 West 28th Ave.
> Vancouver BC., V5Z 4H4
> (604)-875-3015

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


