[BioC] ComBat: 3 adjustment variables & continuous adjustment variables

Tue Mar 18 22:27:58 CET 2014

Hi Magda,

The numCovs argument won't work because it only specifies which 
columns in the model matrix (the non-batch terms you want to fit in 
your linear model) are continuous covariates rather than categorical 
factors. It has nothing to do with correcting for the batch effect.
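
To make that concrete, here is a minimal sketch of what numCovs points 
at. The phenotype table (group, age) is made up for illustration, and 
the ComBat call is left commented out because it needs the sva package 
and your own data; its signature is the one quoted further down in this 
thread and may differ in other sva versions.

```r
# Hypothetical phenotype data: one factor of interest, one continuous covariate
pheno <- data.frame(
  group = factor(c("ctrl", "ctrl", "case", "case", "ctrl", "case")),
  age   = c(31, 45, 38, 52, 29, 41)
)

# Model matrix for the outcome and other covariates, with no batch term
mod <- model.matrix(~ group + age, data = pheno)

# numCovs is just the index of the continuous column(s) within mod
num_covs <- which(colnames(mod) == "age")

# Assumed usage, per the signature quoted in this thread (not run here):
# adjusted <- ComBat(dat, batch, mod, numCovs = num_covs)
```

Note that batch is still passed separately; nothing in mod or numCovs 
changes how the batch variable itself is treated.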

And I think you might be thinking about batch effects in the wrong way. 
If you fit a 'row' effect, then what you are saying is that on average, 
the measures you get from one row differ from the measures you get from 
another row. So as an example, row 1 might tend to have higher values 
because those arrays don't get washed as well, whereas rows 3 and 4 
might be dimmer because they get washed more. You then want to estimate 
how much brighter, on average, the row 1 chips are (and how much dimmer 
the row 3 and 4 chips are), and adjust the observed data to account for 
this.

But you do the estimation of these averages using factors, rather than 
continuous measures (because a chip either is or is not in row 1).
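
As a rough illustration in base R (simulated data, not your arrays), 
coding row as a factor gives one estimated mean per row, so a step-wise 
trend across rows 1 to 6 is still captured without assuming it is 
linear. The sample layout, effect size, and noise level below are all 
invented for the sketch.

```r
# Hypothetical layout: 24 samples, 4 in each of 6 rows
row_pos <- rep(1:6, times = 4)

# Row as a factor: intercept plus 5 indicator columns
X_factor <- model.matrix(~ factor(row_pos))
# Row as a number: intercept plus a single slope
X_numeric <- model.matrix(~ row_pos)

# Simulated measurements with a step-wise bias from row 1 to row 6
set.seed(1)
y <- 0.5 * row_pos + rnorm(length(row_pos), sd = 0.1)

# The factor fit estimates a separate mean for every row
fit_factor <- lm(y ~ factor(row_pos))
row_means <- tapply(fitted(fit_factor), row_pos, mean)
```

Here row_means increases from row 1 to row 6, so the ordering you see is 
recovered from the data rather than imposed by the model.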

You might just be over-thinking this. I don't see how 3 plates of 24 
chips gets you to 180 samples, but regardless it seems like you have 
enough replication to estimate the batch effects, and still have enough 
degrees of freedom left over for your comparisons, unless you have some 
huge number of phenotypic combinations that you are trying to compare 
(do you?).
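
A back-of-the-envelope count, assuming an additive full-rank linear 
model with chip (24 levels) and row (6 levels) as factors and 180 
samples. The helper name and the n_pheno_coefs argument are mine, and 
your actual design may not match these assumptions.

```r
# Residual degrees of freedom for an additive model:
# intercept + (chips - 1) + (rows - 1) + phenotype coefficients
residual_df <- function(n_samples, n_chips, n_rows, n_pheno_coefs = 0) {
  n_params <- 1 + (n_chips - 1) + (n_rows - 1) + n_pheno_coefs
  n_samples - n_params
}

# Numbers mentioned in this thread, before adding any phenotype terms
df_left <- residual_df(n_samples = 180, n_chips = 24, n_rows = 6)
```

Every phenotype coefficient you add takes one more off df_left.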

Best,

Jim

On Tuesday, March 18, 2014 2:13:11 PM, Magda Price wrote:
> Hi Jim,
>
> I have several different "batch" variables - one for example is the
> chip that each sample was run on (there are 24 of these) and I think
> chip batch should definitely be treated as a factor. Another "batch"
> variable I would like to adjust for is the position the sample was run
> on the chip (there are 6 different rows). If I use row as a factor,
> then the effect of being in row 1 vs 2 is treated the same as the
> effect of 1 vs 6, but the bias I see changes step-wise from row 1, 2,
> 3, 4, 5, 6 thus I thought that treating row as a numeric or integer
> variable would better model the "batch" effect. In other words row
> batches have meaning relative to each other whereas chip batches do not.
>
> I guess this would be another reason why using the numCovs option
> (continuous not integer) might not work in my case?!
>
> Hope that explains things a bit better! Happy to provide any more info
> & I really appreciate the input.
>
> Magda
>
>
> On Tue, Mar 18, 2014 at 10:51 AM, James W. MacDonald
> <jmacdon at uw.edu> wrote:
>
>     Hi Magda,
>
>     I'm curious. How can one specify a batch using a continuous
>     variable? In other words, isn't a particular sample in a batch or not?
>
>     Best,
>
>     Jim
>
>
>
>     On 3/18/2014 1:44 PM, Magda Price wrote:
>
>         Hi Steve,
>
>         Thanks for your advice. I do know that I'm using an old
>         version of R (one
>         of the packages I'm using requires it) however, the options
>         you mention
>         from sva are in fact available in the older version as well,
>         but it wasn't
>         clear to me how to use them.
>
>         I've copied the usage and argument information for the ComBat
>         function
>         below, maybe you can help clarify:
>
>         *ComBat(dat, batch, mod, numCovs=NULL,
>         par.prior=TRUE, prior.plots=FALSE)*
>
>         *dat Genomic measure matrix (dimensions probe x sample) - for
>         example,
>         expression matrix*
>
>         *batch   Batch covariate (multiple batches allowed)*
>
>         *mod Model matrix for outcome of interest and other covariates
>         besides
>         batch*
>
>         *numCovs (Optional) Vector containing the column numbers of
>         the continuous
>         covariates in the model matrix, or NULL if no continuous
>         covariates are
>         used*
>
>         *par.prior (Optional) TRUE indicates parametric adjustments
>         will be used,
>         FALSE indicates non-parametric adjustments will be used*
>         *prior.plots (Optional) TRUE give prior plots with black as a
>         kernel
>         estimate of the empirical batch effect density and red as the
>         parametric
>         estimate*
>
>
>         The model matrix is supposed to contain the outcome of
>         interest and other
>         covariates *besides batch*, but batch is what I need to be a
>         continuous
>         variable. numCovs seems to allow me to specify *covariates*
>         that should be
>         continuous, but not *adjustment variables*. What am I missing?
>
>
>         Thanks again!
>
>
>
>         On Tue, Mar 18, 2014 at 9:48 AM, Steve Lianoglou
>         <lianoglou.steve at gene.com> wrote:
>
>             Hi Magda,
>
>             You are using a version of R (2.14) that is horribly out
>             of date, and
>             as a result your bioconductor packages are frozen to
>             versions that are
>             quite old.
>
>             Please update to the latest version of R (3.0.3) and
>             reinstall your
>             bioconductor packages using biocLite to ensure that you
>             are running
>             the latest version of them.
>
>             The package you are using (sva v3.0.2) is now at version
>             3.8.0.
>
>             One question you asked:
>
>                 - Row would be better treated as a continuous
>                 adjustment variable than a factor. In the version of
>                 sva that I am using (3.0.2) I believe that only
>                 factor adjustment variables are supported. I have seen
>                 mention in a few forums that there might be an update
>                 to ComBat to adjust for a numeric batch variable, is
>                 one available?
>
>             Is readily answered by reading through the vignette for
>             the current
>             version of the package:
>
>
>             http://bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf
>
>             Specifically in Section 7 (Applying the ComBat function to
>             adjust for
>             known batches), where it states:
>
>             """
>             By default, all adjustment variables will be treated as factor
>             variables by the ComBat function. If you would like to include
>             continuous adjustment variables, also create a vector
>             containing the
>             column numbers of the continuous covariates in the model
>             matrix. This
>             vector must then be input into ComBat via the numCovs option.
>             """
>
>             HTH,
>
>             -steve
>
>             --
>             Steve Lianoglou
>             Computational Biologist
>             Genentech
>
>
>
>
>     --
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>
>
>
> --
> E. Magda Price
> PhD Candidate, Robinson Lab
> University of British Columbia
>
> CFRI Room 2071
> 950 West 28th Ave.
> Vancouver BC., V5Z 4H4
> (604)-875-3015

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099