[BioC] ComBat: covariates or additional batch

Hedi Peterson [guest] guest at bioconductor.org
Fri Jan 25 12:42:00 CET 2013


I have a question regarding batch and covariates in ComBat. I have an Illumina expression dataset of control versus treatment at 3 time points (day 3, 5 and 7), with 4 biological replicates each. However, RNA from the different experimental groups (replicates) was extracted on different dates, and the samples cluster by RNA extraction date, so I have a strong batch effect.

Should I use RNA extraction date as Batch and both the biological replicate groups and Time_x_Treatment as covariates (as in the sample file shown below), or should I apply it some other way (for example, using extraction date as the first batch and then re-running with replicate groups as a second batch)?

My second question: is it correct to subgroup the covariate as Time_x_Treatment, or should I just use control vs treatment (and not specify the day factor)?


Array	Sample	Batch	Covariate	Covariate2
D3C4	day3control_repl4	1	day3control	1
D3C3	day3control_repl3	2	day3control	3
D3C2	day3control_repl2	3	day3control	4
D3C1	day3control_repl1	3	day3control	2
D5C4	day5control_repl4	2	day5control	3
D5C3	day5control_repl3	1	day5control	1
D5C2	day5control_repl2	1	day5control	4
D5C1	day5control_repl1	4	day5control	2
D7C4	day7control_repl4	5	day7control	4
D7C3	day7control_repl3	4	day7control	3
D7C2	day7control_repl2	2	day7control	2
D7C1	day7control_repl1	1	day7control	1
D3T4	day3treatment_repl4	1	day3treatment	1
D3T3	day3treatment_repl3	4	day3treatment	3
D3T2	day3treatment_repl2	1	day3treatment	4
D3T1	day3treatment_repl1	6	day3treatment	2
D5T4	day5treatment_repl4	5	day5treatment	3
D5T3	day5treatment_repl3	4	day5treatment	1
D5T2	day5treatment_repl2	6	day5treatment	4
D5T1	day5treatment_repl1	6	day5treatment	2
D7T4	day7treatment_repl4	2	day7treatment	4
D7T3	day7treatment_repl3	4	day7treatment	3
D7T2	day7treatment_repl2	5	day7treatment	2
D7T1	day7treatment_repl1	2	day7treatment	1
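
To make the layout of the sample file easier to judge, here is a minimal sketch in Python that cross-tabulates the Batch column against the Covariate (Time_x_Treatment) column and lists any batch whose samples all belong to a single group. It is not part of ComBat or of any Bioconductor package; the helper names (crosstab, groups_per_batch, single_group_batches) are hypothetical and only illustrate the check. The rows are copied from the table above.

```python
from collections import Counter, defaultdict

# (array, batch, covariate) rows copied from the sample file above.
SAMPLES = [
    ("D3C4", 1, "day3control"), ("D3C3", 2, "day3control"),
    ("D3C2", 3, "day3control"), ("D3C1", 3, "day3control"),
    ("D5C4", 2, "day5control"), ("D5C3", 1, "day5control"),
    ("D5C2", 1, "day5control"), ("D5C1", 4, "day5control"),
    ("D7C4", 5, "day7control"), ("D7C3", 4, "day7control"),
    ("D7C2", 2, "day7control"), ("D7C1", 1, "day7control"),
    ("D3T4", 1, "day3treatment"), ("D3T3", 4, "day3treatment"),
    ("D3T2", 1, "day3treatment"), ("D3T1", 6, "day3treatment"),
    ("D5T4", 5, "day5treatment"), ("D5T3", 4, "day5treatment"),
    ("D5T2", 6, "day5treatment"), ("D5T1", 6, "day5treatment"),
    ("D7T4", 2, "day7treatment"), ("D7T3", 4, "day7treatment"),
    ("D7T2", 5, "day7treatment"), ("D7T1", 2, "day7treatment"),
]


def crosstab(samples):
    """Count samples for every (batch, covariate) pair."""
    return Counter((batch, cov) for _, batch, cov in samples)


def groups_per_batch(samples):
    """Map each batch to the set of covariate groups it contains."""
    groups = defaultdict(set)
    for _, batch, cov in samples:
        groups[batch].add(cov)
    return dict(groups)


def single_group_batches(samples):
    """Return {batch: group} for batches that hold only one group."""
    return {
        batch: next(iter(covs))
        for batch, covs in groups_per_batch(samples).items()
        if len(covs) == 1
    }


if __name__ == "__main__":
    table = crosstab(SAMPLES)
    covs = sorted({cov for _, _, cov in SAMPLES})
    batches = sorted({batch for _, batch, _ in SAMPLES})
    # Print one row per batch, one column per Time_x_Treatment group.
    print("batch\t" + "\t".join(covs))
    for batch in batches:
        print(f"{batch}\t" + "\t".join(str(table[(batch, c)]) for c in covs))
    print("single-group batches:", single_group_batches(SAMPLES))
```

On the table above, this flags batch 3, whose two samples (D3C2 and D3C1) are both day3control.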

 -- output of sessionInfo(): 

R version 2.15.2 (2012-10-26)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] splines   stats4    graphics  grDevices utils     datasets  grid      stats     methods   base     

other attached packages:
 [1] mgcv_1.7-22           corpcor_1.6.4         RColorBrewer_1.0-5    panp_1.28.1           hwriter_1.3           R2HTML_2.2            reshape_0.8.4         plyr_1.7.1            gProfileR_0.2         fpc_2.1-5            
[11] flexmix_2.3-8         multcomp_1.2-14       survival_2.36-14      mvtnorm_0.9-9993      modeltools_0.2-19     lattice_0.20-10       mclust_4.0            MASS_7.3-22           cluster_1.14.3        preprocessCore_1.20.0
[21] affy_1.36.0           Biobase_2.18.0        BiocGenerics_0.4.0    limma_3.14.1          pheatmap_0.7.4        gridExtra_0.9.1       ggplot2_0.9.2.1      


--
Sent via the guest posting facility at bioconductor.org.