[BioC] Batch effect

W. Evan Johnson wej at bu.edu
Thu Sep 6 14:22:05 CEST 2012


Santana,

The first thing I always do is hierarchical clustering. Often, batch effects are easily spotted with this simple approach. Then try something like PCA.
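
As a rough illustration of that clustering-then-PCA check, here is a minimal sketch in plain Python on simulated data. It is not taken from SCAN or any Bioconductor package: the function names, the simulated batch shift, and the choice of Euclidean distance with average linkage are all assumptions made for the example. With real arrays you would apply the same idea to your normalized expression matrix.

```python
import math
import random


def simulate(n_genes=200, seed=0):
    """Simulate 8 arrays in two batches; batch B is shifted on half the genes."""
    rng = random.Random(seed)
    baseline = [rng.gauss(8.0, 1.0) for _ in range(n_genes)]
    batches = ["A"] * 4 + ["B"] * 4
    samples = []
    for batch in batches:
        shift = 1.5 if batch == "B" else 0.0  # assumed size of the batch effect
        samples.append([
            mu + (shift if g < n_genes // 2 else 0.0) + rng.gauss(0.0, 0.3)
            for g, mu in enumerate(baseline)
        ])
    return samples, batches


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(samples, n_clusters):
    """Agglomerative clustering of samples; returns lists of sample indices."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                d = sum(euclidean(samples[a], samples[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def first_pc_scores(samples, n_iter=100):
    """PC1 score per sample (scaled to unit length) via power iteration."""
    n, n_genes = len(samples), len(samples[0])
    gene_means = [sum(s[g] for s in samples) / n for g in range(n_genes)]
    centered = [[s[g] - gene_means[g] for g in range(n_genes)] for s in samples]
    # Sample-by-sample Gram matrix; its top eigenvector is proportional to PC1 scores.
    gram = [[sum(x * y for x, y in zip(a, b)) for b in centered] for a in centered]
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


samples, batches = simulate()
clusters = average_linkage(samples, n_clusters=2)
scores = first_pc_scores(samples)
for members in clusters:
    print("cluster:", [(i, batches[i]) for i in members])
for i, score in enumerate(scores):
    print(f"sample {i} (batch {batches[i]}): PC1 = {score:+.3f}")
```

If the two top-level clusters, or the sign of the PC1 score, line up with processing batch rather than with your biological groups, that is the warning sign to look for.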

Also, we recently published a single-sample normalization approach, SCAN, that does a better job of normalizing the arrays. With this approach, artifacts that look like 'batch effects' often drop out in the normalization step. We've shown in several cases that it does a better job of combining data than anything else out there, so it will give you a cleaner starting point. If you still see batch effects after SCAN normalization, try ComBat or sva (both are in the sva package). That will likely be all you need for your batch effects.
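
To make the idea of a batch adjustment concrete, here is a deliberately crude sketch in plain Python. It is not ComBat or sva: ComBat uses an empirical Bayes adjustment of both location and scale, whereas this stand-in only re-centers each gene within each batch. The function name center_batches and the toy numbers are assumptions made for the example.

```python
from statistics import mean


def center_batches(expr, batches):
    """Per gene, subtract each batch's mean and add back the overall mean.

    expr: list of samples, each a list of per-gene expression values.
    batches: one batch label per sample.
    """
    n_genes = len(expr[0])
    adjusted = [row[:] for row in expr]
    for g in range(n_genes):
        overall = mean(row[g] for row in expr)
        for label in set(batches):
            idx = [i for i, b in enumerate(batches) if b == label]
            batch_mean = mean(expr[i][g] for i in idx)
            for i in idx:
                adjusted[i][g] = expr[i][g] - batch_mean + overall
    return adjusted


# Toy data: 4 samples x 2 genes; gene 0 is shifted upward in batch B.
expr = [
    [5.0, 7.0],
    [5.2, 7.4],
    [6.0, 7.1],
    [6.2, 7.3],
]
batches = ["A", "A", "B", "B"]
adjusted = center_batches(expr, batches)
for label, row in zip(batches, adjusted):
    print(label, [round(x, 2) for x in row])
```

Keep in mind that if batch is confounded with the biological groups, any adjustment of this kind will also remove real signal.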

Here is a link to our SCAN paper: http://www.sciencedirect.com/science/article/pii/S0888754312001632
Here is a link to our SCAN software: http://jlab.bu.edu/software/scan-upc/

SCAN is available in both R and Python at the site.

Hope this helps!

Evan


On Sep 6, 2012, at 6:00 AM, bioconductor-request at r-project.org wrote:

> Message: 19
> Date: Wed, 5 Sep 2012 20:46:58 -0400
> From: Jeff Leek <jtleek at gmail.com>
> To: Wolfgang Huber <whuber at embl.de>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Batch effect
> 
> Hi Santana,
> 
> You might also try the sva function in the sva package. This function is
> specifically designed to identify batch effects and other sources of
> variation. PCA typically confounds any signal of interest with potential
> batch effects, so it may be somewhat misleading, particularly if the
> batches are not balanced across groups of interest.
> 
> Best,
> 
> Jeff
> 
> On Wed, Sep 5, 2012 at 5:35 PM, Wolfgang Huber <whuber at embl.de> wrote:
> 
>> Dear Santana
>> 
>> you could try the arrayQualityMetrics function in the eponymous package,
>> which produces PCA plots and other diagnostics and is helpful for
>> detecting batch effects.
>> 
>> The function runs either on the AffyBatch object, or the normalised
>> ExpressionSet; the former is more useful to understand how well the
>> experiment worked, the latter, how well subsequent analyses might work.
>> 
>>        Best wishes
>>        Wolfgang
>> 
>> 
>> Sep/5/12 3:10 PM, James W. MacDonald scripsit:
>> 
>>> Hi Santana,
>>> 
>>> On 9/5/2012 2:14 AM, Santana Sarma wrote:
>>> 
>>>> Hi,
>>>> 
>>>> 
>>>> How is it possible to judge whether there is any batch effect in two
>>>> groups of Affymetrix .cel files? I currently have one AffyBatch object,
>>>> created by reading all the .cel files.
>>>> 
>>> 
>>> There are several things you can look at. I find PCA plots very helpful
>>> to look for batch effects. You might also look at density plots (hist()
>>> function in affy) as well as boxplots. But IMO PCA is the most useful.
>>> 
>>> Best,
>>> 
>>> Jim
>>> 
>>> 
>>> 
>>>> 
>>>> Being new to Affymetrix analysis, any advice/elaboration will be very
>>>> helpful.
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Santana
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>>> 
>>> 
>> 
>> --
>> Best wishes
>>        Wolfgang
>> 
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber
>> 
>> 
> 