[BioC] remove microarray batch effects using Limma

Xie, Zhi (NIH/NHLBI) [E] zhi.xie at nih.gov
Tue Oct 26 19:26:45 CEST 2010

Thanks Moshe and Mike!

I tried Moshe's suggestion, and the results from the two approaches
were very consistent. I also tested the data set using ComBat; the
batch effects were removed as expected.

I appreciate your prompt replies.


On Tue, Oct 26, 2010 at 4:14 AM, Mike Walter <michael_walter at email.de> wrote:
> Hi Moshe, Hi Zhi,
> I had some data with strong batch effects a while ago. I tried several methods (e.g. adding the batch as a factor in the linear model, or ComBat). In the end I decided to use ComBat, which works quite nicely. However, I wasn't aware of the removeBatchEffect() function, so I got curious. Here is a statement directly from the help page of this function: "This function is intended for use with clustering or PCA, not for use prior to linear modelling. If linear modelling is intended, it is better to include the batch effect as part of the linear model." So I guess the first approach might be the one to pick.
> Cheers, Mike
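To illustrate the help-page advice, here is a minimal base-R sketch on simulated data (one gene, 12 samples, three balanced batches; all names and numbers are made up for illustration). It does not call limma: as a simple stand-in for removeBatchEffect(), it centres each batch on the overall mean. In this balanced case the condition estimate and the residual sum of squares come out identical both ways, but the "remove first, then fit" model reports 10 residual degrees of freedom instead of 8, because the two degrees of freedom already spent on the batch effects are forgotten. Its residual standard deviation is therefore too small and its t statistic too large.

```r
# Base R only (stats); simulated data, not the dataset from this thread.
set.seed(1)
condition <- factor(rep(c("normal", "disease"), each = 6),
                    levels = c("normal", "disease"))
# Three hybridization batches, balanced across conditions (2 + 2 per batch)
batch <- factor(rep(c("b1", "b2", "b3"), times = 4))
batch_shift <- c(b1 = 0, b2 = 2, b3 = -1)
y <- 5 + (condition == "disease") * 1 + batch_shift[as.character(batch)] +
  rnorm(12, sd = 0.5)

# Approach 1 style: batch is a term in the linear model
fit1 <- lm(y ~ condition + batch)

# Approach 2 style: remove batch first (stand-in: centre each batch on the
# overall mean), then fit condition only
y_rm <- y - ave(y, batch) + mean(y)
fit2 <- lm(y_rm ~ condition)

t1 <- summary(fit1)$coefficients["conditiondisease", "t value"]
t2 <- summary(fit2)$coefficients["conditiondisease", "t value"]
c(df1 = df.residual(fit1), df2 = df.residual(fit2))  # 8 vs 10
c(t1 = t1, t2 = t2)  # |t2| is |t1| * sqrt(10/8)
```

Across thousands of genes this inflation shifts many genes past a fixed decideTests cutoff. With unbalanced batches the estimates themselves can differ as well, not just the degrees of freedom.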
> -----Original Message-----
> From: "Moshe Olshansky" <m_olshansky at yahoo.com>
> Sent: 26.10.2010 05:49:43
> To: bioconductor at stat.math.ethz.ch, "Xie, Zhi (NIH/NHLBI) [E]" <zhi.xie at nih.gov>
> Subject: Re: [BioC] remove microarray batch effects using Limma
>>Hi Zhi,
>>Check whether replacing your Approach 2 by:
>>where eset.rm.batch is the expression dataset containing the expression values from the exp.eset.rm.batch table,
>>produces more consistent results (i.e. call removeBatchEffect with a third argument, namely your normal-versus-disease design).
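Why passing the design matters can be sketched in base R. remove_batch() below is a hypothetical helper written only for this illustration, not limma's code: it fits the protected design together with sum-coded batch columns and subtracts only the estimated batch part, which is the idea behind giving removeBatchEffect a design. In the noise-free toy data, batch b2 holds mostly disease samples, so when no design is given the batch term absorbs much of the condition difference.

```r
# Base R only; remove_batch() is a hypothetical helper, not limma's code.
remove_batch <- function(y, batch, design = matrix(1, length(y), 1)) {
  batch <- factor(batch)
  # Sum-to-zero coding: one column per batch level except the last
  X_batch <- contr.sum(nlevels(batch))[as.integer(batch), , drop = FALSE]
  # Fit the protected design and the batch columns jointly
  fit <- lm.fit(cbind(design, X_batch), y)
  beta_batch <- fit$coefficients[ncol(design) + seq_len(ncol(X_batch))]
  # Subtract only the estimated batch component
  y - drop(X_batch %*% beta_batch)
}

# Noise-free toy data: batch b2 holds mostly disease samples
condition <- factor(rep(c("normal", "disease"), each = 6),
                    levels = c("normal", "disease"))
batch <- factor(c(rep("b1", 5), "b2", "b1", rep("b2", 5)))
y <- 5 + 2 * (condition == "disease") + 1 * (batch == "b2")

# Disease mean minus normal mean
cond_diff <- function(v) unname(diff(tapply(v, condition, mean)))
y_no_design <- remove_batch(y, batch)
y_design <- remove_batch(y, batch, design = model.matrix(~ condition))
c(no_design = cond_diff(y_no_design), design = cond_diff(y_design))
# no_design is 10/9 (about 1.11); design recovers the true difference of 2
```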
>>--- On Tue, 26/10/10, Xie, Zhi (NIH/NHLBI) [E] <zhi.xie at nih.gov> wrote:
>>> From: Xie, Zhi (NIH/NHLBI) [E] <zhi.xie at nih.gov>
>>> Subject: [BioC] remove microarray batch effects using Limma
>>> To: bioconductor at stat.math.ethz.ch
>>> Received: Tuesday, 26 October, 2010, 5:36 AM
>>> Hi everyone,
>>> I have some microarray data files containing two sets of samples, in normal and disease conditions. I have tested that the data also contain significant batch effects associated with hybridization time. However, the positive hits I obtain with the following two approaches are very different (using the same cutoff value in the decideTests function). I think I am supposed to use the first approach, but I am surprised to see such a big difference between the two. Could anyone help figure out the reasons?
>>> Thanks,
>>> Zhi Xie
>>> Here eset is the expression dataset obtained with the rma function.
>>> ___________________
>>> Approach 1:
>>> # Consider batch effects in the model matrix
>>> design<-model.matrix(~0+condition.factor+batch.factor)
>>> # fit the linear model
>>> fit<-lmFit(eset,design)
>>> Then I create a contrast matrix and compute coefficients and standard errors using the contrasts.fit function
>>> ___________________
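For the contrast step of Approach 1, a single-gene analogue in base R may help. With the ~0 + condition + batch parametrization, the disease-minus-normal contrast is the vector (-1, 1, 0, 0) applied to the coefficients, and its standard error comes from the coefficient covariance matrix. This is a sketch on simulated data (all values made up); limma's contrasts.fit() does the equivalent across all genes, and eBayes() then moderates the variances, which this sketch does not do.

```r
# Base R only; one simulated gene, not data from this thread.
set.seed(2)
condition <- factor(rep(c("normal", "disease"), each = 6),
                    levels = c("normal", "disease"))
batch <- factor(rep(c("b1", "b2", "b3"), times = 4))
y <- 5 + 1.5 * (condition == "disease") +
  c(b1 = 0, b2 = 2, b3 = -1)[as.character(batch)] + rnorm(12, sd = 0.5)

# Same parametrization as Approach 1: one column per condition, plus batch
fit <- lm(y ~ 0 + condition + batch)

# Contrast vector: disease minus normal; batch coefficients get weight 0
cvec <- setNames(numeric(length(coef(fit))), names(coef(fit)))
cvec[c("conditionnormal", "conditiondisease")] <- c(-1, 1)

est <- sum(cvec * coef(fit))
se <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
t_stat <- est / se
p_val <- 2 * pt(-abs(t_stat), df = df.residual(fit))
c(estimate = est, se = se, t = t_stat, p = p_val)
```

Refitting with the intercept parametrization, lm(y ~ condition + batch), gives the same estimate and standard error for conditiondisease, so ~0 + condition is only a convenience for writing contrasts.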
>>> Approach 2:
>>> # remove batch effects first
>>> exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor)
>>> # only consider normal and disease conditions in the model matrix
>>> design<-model.matrix(~0+condition.factor)
>>> # fit the linear model
>>> fit<-lmFit(eset.rm.batch,design)
>>> where eset.rm.batch is the expression dataset containing the expression values from the exp.eset.rm.batch table
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
