[BioC] Combat: Removing multiple batch effects

Peter Langfelder peter.langfelder at gmail.com
Sun Aug 3 19:59:31 CEST 2014


I believe limma has functionality for removing batch effects, if you
prefer that. For ComBat, you would simply use arguments batch = <your
sample annotation>$Batch and mod = model.matrix(~1, data = <your
sample annotation>).

Replace <your sample annotation> by the name of the variable that
holds the sample annotation data frame you have shown in your message.

Since your condition is exactly independent of batch and you have so
few samples, I would not include it as a covariate.

I am not sure how well ComBat would work with just 2 samples in a
batch, since it attempts to equalize not just location (think mean)
but also scale (standard deviation) of each variable, for which 2
samples is the barest theoretical minimum.

HTH,

Peter

On Fri, Aug 1, 2014 at 8:00 AM, Shaun Webb [guest]
<guest at bioconductor.org> wrote:
>
> Hi, I am attempting to remove two known batch effects from my Illumina BeadChip expression data. I have the following experimental design:
>
> ID Condition Batch Parents
> 1 wt 1 1
> 2 ko 1 1
> 3 wt 1 2
> 4 ko 1 2
> 5 wt 2 3
> 6 ko 2 3
>
> These are 3 pairs of sibling mice each with different parents. Within each pair of siblings I have a wt and ko. On top of this, samples were prepared in 2 separate batches (batch1 = first 4 samples).
>
> After background correction and normalisation in limma my MDS plots shows that siblings group together and the first 4 samples are split from the final 2.
>
> I would like to adjust my expression data for both of these effects to test for differential expression between wt and ko groups. ComBat seems to be a good choice however I am struggling to figure out the best approach for multiple batches. I'd appreciate if anyone could give me some advice on how to set up ComBat in this context.
>
> Thanks in advance for your help
> Shaun Webb
> University of Edinburgh
>
>  -- output of sessionInfo():
>
> R version 3.1.1 (2014-07-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C
>  [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C
>  [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
> [10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C
>
> attached base packages:
>  [1] splines   parallel  grid      stats     graphics  grDevices
>  [7] utils     datasets  methods   base
>
> other attached packages:
>  [1] pamr_1.54.1
>  [2] survival_2.37-7
>  [3] cluster_1.15.2
>  [4] bladderbatch_1.2.0
>  [5] statmod_1.4.20
>  [6] sva_3.10.0
>  [7] mgcv_1.7-29
>  [8] nlme_3.1-117
>  [9] corpcor_1.6.6
> [10] illuminaMousev2.db_1.22.1
> [11] org.Mm.eg.db_2.14.0
> [12] DESeq2_1.4.5
> [13] RcppArmadillo_0.4.320.0
> [14] Rcpp_0.11.2
> [15] DiffBind_1.10.1
> [16] GenomicAlignments_1.0.1
> [17] BSgenome_1.32.0
> [18] Rsamtools_1.16.0
> [19] Biostrings_2.32.0
> [20] limma_3.20.4
> [21] XVector_0.4.0
> [22] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
> [23] GenomicFeatures_1.16.2
> [24] lumi_2.16.0
> [25] methyAnalysis_1.6.0
> [26] org.Hs.eg.db_2.14.0
> [27] RSQLite_0.11.4
> [28] DBI_0.2-7
> [29] AnnotationDbi_1.26.0
> [30] Biobase_2.24.0
> [31] GenomicRanges_1.16.3
> [32] GenomeInfoDb_1.0.2
> [33] IRanges_1.22.6
> [34] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>  [1] AnnotationForge_1.6.1    BBmisc_1.6
>  [3] BatchJobs_1.2            BiocInstaller_1.14.2
>  [5] BiocParallel_0.6.1       Formula_1.1-1
>  [7] Gviz_1.8.3               Hmisc_3.14-4
>  [9] KernSmooth_2.23-12       MASS_7.3-33
> [11] Matrix_1.1-4             R.methodsS3_1.6.1
> [13] RColorBrewer_1.0-5       RCurl_1.95-4.1
> [15] VariantAnnotation_1.10.1 XML_3.98-1.1
> [17] affy_1.42.2              affyio_1.32.0
> [19] amap_0.8-12              annotate_1.42.0
> [21] base64_1.1               beanplot_1.1
> [23] biomaRt_2.20.0           biovizBase_1.12.1
> [25] bitops_1.0-6             brew_1.0-6
> [27] bumphunter_1.4.2         caTools_1.17
> [29] codetools_0.2-8          colorspace_1.2-4
> [31] dichromat_2.0-0          digest_0.6.4
> [33] doRNG_1.6                edgeR_3.6.2
> [35] fail_1.2                 foreach_1.4.2
> [37] gdata_2.13.3             genefilter_1.46.1
> [39] geneplotter_1.42.0       genoset_1.16.2
> [41] gplots_2.13.0            gtools_3.4.1
> [43] illuminaio_0.6.0         iterators_1.0.7
> [45] lattice_0.20-29          latticeExtra_0.6-26
> [47] locfit_1.5-9.1           matrixStats_0.10.0
> [49] mclust_4.3               methylumi_2.10.0
> [51] minfi_1.10.2             multtest_2.20.0
> [53] munsell_0.4.2            nleqslv_2.2
> [55] nor1mix_1.1-4            pkgmaker_0.22
> [57] plyr_1.8.1               preprocessCore_1.26.1
> [59] registry_0.2             reshape_0.8.5
> [61] rngtools_1.2.4           rtracklayer_1.24.1
> [63] scales_0.2.4             sendmailR_1.1-2
> [65] siggenes_1.38.0          stats4_3.1.1
> [67] stringr_0.6.2            tools_3.1.1
> [69] xtable_1.7-3             zlibbioc_1.10.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list