[BioC] Covariate for batch effect removal by ComBat

Johnson, William Evan wej at bu.edu
Tue Jun 11 15:02:38 CEST 2013


Atul, 

From your design, it looks like your experimental conditions are confounded with batch: each stage appears in only one batch, and the controls are only in batch 2. At this point, you will need to make some assumptions to get ComBat working correctly. How large do you expect the difference between P4 and P8 to be? Would P20 and P30 be that different? What about P42 and P52? Can you assume any of these to be the same?
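
To make the confounding concrete, here is a minimal sketch in base R. It assumes the sample sheet is exactly the one in your message quoted below; the object names (pheno, stage_by_batch, design) are only illustrative. It cross-tabulates Stage against Batch and then checks whether a design with both terms is full rank.

```r
# Sample annotation transcribed from the table in the quoted message.
pheno <- data.frame(
  Array.name = c("P4_A", "P4_B", "P4_C", "P30_A", "P30_B", "P12_A", "P12_B",
                 "P52_A", "P52_B", "CON_A", "CON_B", "P8_A", "P8_B",
                 "P20_A", "P20_B", "P42_A", "P42_B"),
  Stage = c("P4", "P4", "P4", "P30", "P30", "P12", "P12", "P52", "P52",
            "Mix", "Mix", "P8", "P8", "P20", "P20", "P42", "P42"),
  Batch = c(rep(1, 5), rep(2, 12)),
  stringsAsFactors = FALSE
)

# Rows are stages, columns are batches.
stage_by_batch <- table(pheno$Stage, pheno$Batch)
print(stage_by_batch)

# Number of batches in which each stage occurs.
batches_per_stage <- rowSums(stage_by_batch > 0)

# TRUE when no stage is observed in more than one batch.
confounded <- all(batches_per_stage == 1)
print(confounded)

# A design with both Stage and Batch: Batch is fully determined by Stage,
# so the matrix has fewer independent columns than it has columns.
design <- model.matrix(~ Stage + factor(Batch), data = pheno)
rank_deficient <- qr(design)$rank < ncol(design)
print(rank_deficient)
```

If every stage sits in a single batch, as it does here, the stage effect and the batch effect cannot be separated from the data alone, which is why some extra assumption is needed.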

Note, I'm happy to discuss this off the mailing list if you don't want to tell everyone your experimental conditions, but because your design is confounded, you really need to think carefully about how you apply ComBat and about which assumptions you make.
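
As one illustration of what such an assumption could look like, the sketch below pools stages into groups that span both batches. The pooling itself (P4 with P8, P20 with P30, P42 with P52) is purely hypothetical, taken from the pairs I asked about above; whether any of it is biologically reasonable is for you to decide. All object names are illustrative, and only base R is used.

```r
# Stage and batch for the 17 arrays, in the order of the quoted sample sheet.
stage <- c(rep("P4", 3), rep("P30", 2), rep("P12", 2), rep("P52", 2),
           rep("Mix", 2), rep("P8", 2), rep("P20", 2), rep("P42", 2))
batch <- factor(c(rep(1, 5), rep(2, 12)))

# HYPOTHETICAL pooling: treat the paired stages as biologically equivalent.
# This is an assumption for illustration, not a recommendation.
pooling <- c(P4 = "P4_P8", P8 = "P4_P8",
             P12 = "P12",
             P20 = "P20_P30", P30 = "P20_P30",
             P42 = "P42_P52", P52 = "P42_P52",
             Mix = "Mix")
stage_group <- factor(unname(pooling[stage]))

# Pooled groups against batch: two groups now have samples in both batches.
print(table(stage_group, batch))

# Covariate matrix describing the biology only (no batch term).
mod <- model.matrix(~ stage_group)

# Check that batch is no longer determined by the covariates.
full <- model.matrix(~ stage_group + batch)
estimable <- qr(full)$rank == ncol(full)
print(estimable)
```

Under this hypothetical pooling, a matrix like mod could be passed as the mod argument of ComBat in place of the Condition-only matrix in your code. The adjustment is then only as good as the pooling assumption behind it.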

Thanks!

Evan


On Jun 11, 2013, at 6:00 AM, <bioconductor-request at r-project.org> wrote:

> Message: 12
> Date: Mon, 10 Jun 2013 13:42:52 -0400
> From: Atul Kakrana <atulkakrana at outlook.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] Covariate for batch effect removal by ComBat
> Message-ID: <BLU0-SMTP108CD993F824AF0373B7432AD840 at phx.gbl>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hi Everybody,
> 
> I am analysing Illumina microarray data and seem to have batch effects
> (plots attached) in my data. For batch effect removal I am using ComBat
> from the 'sva' package. This is my sample info file:
> 
> Array.name    Sample    Stage    Condition    Batch
> P4_A    1    P4    Test    1
> P4_B    2    P4    Test    1
> P4_C    3    P4    Test    1
> P30_A    4    P30    Test    1
> P30_B    5    P30    Test    1
> P12_A    6    P12    Test    2
> P12_B    7    P12    Test    2
> P52_A    8    P52    Test    2
> P52_B    9    P52    Test    2
> CON_A    10    Mix    Con    2
> CON_B    11    Mix    Con    2
> P8_A    12    P8    Test    2
> P8_B    13    P8    Test    2
> P20_A    14    P20    Test    2
> P20_B    15    P20    Test    2
> P42_A    16    P42    Test    2
> P42_B    17    P42    Test    2
> 
> 
> The data are from a time-series experiment, and the numbers in 'Array.name'
> correspond to the age at which the samples were harvested. No time point is
> repeated in more than one batch; for example, P4 is in batch 1 and never
> appears again. I have a few questions about the implementation of ComBat.
> 
> 1. Which column should be used for covariates? I am confused between
> 'Stage' and 'Condition'. Or should I use 'Condition' as the covariate and
> 'Stage' as a continuous variable (numCovs)?
> 
> 2. Should the adjustment be parametric or non-parametric?
> 
> Here is my code:
> 
> library(sva)      # provides ComBat()
> library(Biobase)  # provides exprs()
> 
> ## PhenoData is the same as the sample info above
> IL.pheno <- read.table('PhenoData.csv', sep = ',', header = TRUE)
> batch <- IL.pheno$Batch
> edata <- exprs(esetLumi.Reduced.AB)
> ## Condition is the only covariate passed to ComBat here
> mod <- model.matrix(~as.factor(Condition), data = IL.pheno)
> combat_edata <- ComBat(dat = edata, batch = batch, mod = mod, numCovs = NULL,
>                        par.prior = TRUE, prior.plots = TRUE)
> 
> ##Fitting back to expression set
> exprs(esetLumi.Reduced.AB) <- combat_edata
> 
> 
> I appreciate your help.
> 
> Best
> 
> AK


