[BioC] ComBat: 3 adjustment variables & continuous adjustment variables

Magda Price [guest] guest at bioconductor.org
Tue Mar 18 17:15:46 CET 2014


I'm writing with a few questions about applying ComBat (sva package) to a set of ~180 samples run on the Illumina Infinium HumanMethylation450 BeadChip array (~450,000 DNA methylation data points).

There is a large amount of variation in my data due to the plate the samples were run on (3 different plates), the chip they were run on (24 different chips) and their position on the chip, specifically the row (6 different rows). The chips are set up in a 6 row * 2 column format like this:

sample 01   sample 02
sample 03   sample 04
sample 05   sample 06
sample 07   sample 08
sample 09   sample 10
sample 11   sample 12
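
As a rough illustration of the layout above, here is a minimal Python sketch that maps a chip position (1 to 12) to its row and column. It assumes positions are numbered left to right, then top to bottom, exactly as drawn; the function name chip_position is hypothetical and not part of any package.

```python
# Chip layout from the diagram above: 6 rows x 2 columns, 12 positions.
ROWS, COLS = 6, 2


def chip_position(pos):
    """Return (row, column), both 1-based, for a 1-based chip position.

    Assumes positions are numbered left to right, top to bottom.
    """
    if not 1 <= pos <= ROWS * COLS:
        raise ValueError("position must be between 1 and %d" % (ROWS * COLS))
    row, col = divmod(pos - 1, COLS)
    return row + 1, col + 1


if __name__ == "__main__":
    for pos in range(1, ROWS * COLS + 1):
        row, col = chip_position(pos)
        print("sample %02d -> row %d, column %d" % (pos, row, col))
```

The row value returned here is the quantity that could be used either as a 6-level factor or as a numeric variable.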

I read Dr. Evan Johnson's suggestions to someone else with this "multiple-batch-effect-variable" problem in the ComBat Google Group (https://groups.google.com/forum/#!topic/combat-user-forum/PcTxNlaUmAI). Dr. Johnson had 2 suggestions:

- Combine the two batch variables into one, if 3-4 reps are left in each batch

- Use ComBat multiple times, adjusting for the first batch using the other batch variables as covariates, and then adjust for the second batch, and so on

I cannot go with the first suggestion because combining the batch variables would create too many categories and I would not have enough replicates per batch category. 
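
The replicate problem can be checked with simple arithmetic. The sketch below uses the approximate numbers from this post (~180 samples, 24 chips, 6 rows, 2 columns per row) and assumes each chip was run on exactly one plate, so crossing plate with chip adds no further categories. The variable and function names are hypothetical.

```python
# Approximate design figures taken from the post.
N_SAMPLES = 180   # "~180 samples", so treat results as rough averages
N_CHIPS = 24
N_ROWS = 6
N_COLS = 2        # positions per row on one chip


def avg_replicates(n_samples, n_categories):
    """Average number of samples per batch category."""
    return n_samples / n_categories


# Chip alone as the batch variable.
per_chip = avg_replicates(N_SAMPLES, N_CHIPS)

# Chip and row combined into one batch variable
# (plate assumed nested within chip, so it adds no categories).
n_combined = N_CHIPS * N_ROWS
per_chip_row = avg_replicates(N_SAMPLES, n_combined)

print("categories (chip x row):", n_combined)          # 144
print("average samples per chip:", per_chip)            # 7.5
print("average samples per chip x row:", per_chip_row)  # 1.25
print("maximum samples per chip x row:", N_COLS)        # 2
```

Even with every chip fully loaded, a chip-by-row category can hold at most 2 samples, which is below the 3-4 replicates per batch mentioned in the first suggestion.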

I am seeking advice on the following points:

- The Google Group post is now a few years old. Is the step-wise correction still considered a valid approach?

- The Google Group post asked about adjusting for 2 batch variables, not 3. Is there any additional concern if I apply ComBat 3 times?

- Row would be better treated as a continuous adjustment variable than a factor. In the version of sva that I am using (3.0.2) I believe that only factor adjustment variables are supported. I have seen mention in a few forums that there might be an update to ComBat to adjust for a numeric batch variable, is one available?

Thank you in advance for your help!

Magda Price, UBC

 -- output of sessionInfo(): 

R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sva_3.0.2                             mgcv_1.7-22                           corpcor_1.6.4                         wateRmelon_1.2.2                     
 [5] IlluminaHumanMethylation450k.db_1.4.6 org.Hs.eg.db_2.6.4                    RSQLite_0.11.2                        DBI_0.2-5                            
 [9] AnnotationDbi_1.16.19                 matrixStats_0.6.2                     ROC_1.30.0                            limma_3.10.3                         
[13] RColorBrewer_1.0-5                    gplots_2.11.0                         MASS_7.3-16                           KernSmooth_2.23-6                    
[17] caTools_1.14                          gdata_2.12.0                          gtools_2.7.1                          compare_0.2-3                        
[21] lattice_0.20-10                       lumi_2.6.0                            nleqslv_2.0                           methylumi_2.0.13                     
[25] Biobase_2.14.0                       

loaded via a namespace (and not attached):
 [1] affy_1.32.1           affyio_1.22.0         annotate_1.32.3       BiocInstaller_1.2.1   bitops_1.0-5          hdrcde_2.15           IRanges_1.12.6        Matrix_1.0-5         
 [9] nlme_3.1-108          preprocessCore_1.16.0 R.methodsS3_1.4.2     tools_2.14.0          xtable_1.7-1          zlibbioc_1.0.1   

Sent via the guest posting facility at bioconductor.org.
