[BioC] Re: normalisation or analysis with batch effects

James W. MacDonald jmacdon at med.umich.edu
Wed Dec 1 17:42:07 CET 2004

>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
>Sent: 30 November 2004 23:51
>To: BioConductor mailing list
>Cc: Andrea Pellagatti
>Subject: [BioC] normalisation or analysis with batch effects
>Dear list,
>If the following question has been asked before, I do apologise in
>advance and hope someone can point to the relevant thread. Otherwise I
>would appreciate some thoughts and pointers to this problem.
>Thank you.
>Problem : My collaborator (cc-ed here) has performed hybridisation for
>11 tumour and 40 normal samples on Affymetrix HGU-133Av2 (contains ~55k
>probesets) chips. He had hybridised about half of the samples when he
>realised he needed more Affymetrix chips.
>The second batch of chips arrived with the instruction to add DMSO in
>the hybridisation cocktail, which he followed. The first batch did not
>have such instruction. Therefore we believe that the two batches are not

There is a much larger difference between these protocols than simply 
adding DMSO. If I am not mistaken, the difference here is that the old 
samples were processed using the Enzo IVT kit, and the new samples were 
processed using the Affy IVT kit. We have found that these data cannot
be processed together using, e.g., RMA, because a large portion of the
probesets have completely different patterns. In addition, the
distribution of PM probes is quite different for the two protocols, so I 
don't think a quantile normalization is appropriate. You can check this 
by fitting the RMA model using rmaPLM() in the affyPLM package, and then 
checking the residual plots.
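
A minimal sketch of that check, assuming the CEL files from both batches
sit in a single directory. The helper name check_rma_fit and its cel_dir
argument are made up for illustration; ReadAffy(), hist(), rmaPLM() and
image() are the affy/affyPLM functions.

```r
library(affy)
library(affyPLM)

# Hypothetical helper: cel_dir is assumed to hold the CEL files of both batches.
check_rma_fit <- function(cel_dir) {
  abatch <- ReadAffy(celfile.path = cel_dir)

  # Density of log2 PM intensities, one curve per chip; chips from the
  # two protocols can be compared by eye.
  hist(abatch, which = "pm")

  # Fit the RMA model as a PLMset so that probe-level residuals are kept.
  pset <- rmaPLM(abatch)

  # Pseudo-chip image of the residuals for the first chip; change `which`
  # to step through the other chips.
  image(pset, which = 1, type = "resids")

  invisible(pset)
}
```

Calling check_rma_fit("path/to/celfiles") (a placeholder path) draws the
plots and returns the fitted PLMset invisibly for further inspection.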

We have shied away from combining chips that were processed using the 
two IVT kits, but if you have to do so, I would recommend processing 
each group separately and then fitting a model with a batch effect.
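
One way this could look with limma is sketched below. The expression
values are simulated stand-ins for the two separately preprocessed
batches (only the sample counts come from the cross-table in the original
post), and the additive batch term is an assumption. It will not rescue
probesets whose behaviour differs between the kits in more than a shift.

```r
library(limma)

set.seed(1)
n_genes <- 200

# Sample counts per batch and class, from the cross-table in the post.
batch  <- factor(rep(c("b1", "b2"), times = c(23, 28)))
status <- factor(c(rep(c("normal", "tumour"), times = c(17, 6)),
                   rep(c("normal", "tumour"), times = c(23, 5))),
                 levels = c("normal", "tumour"))

# Simulated stand-ins for the two separately preprocessed matrices
# (in practice, exprs() of each batch's own RMA result).
expr_b1 <- matrix(rnorm(n_genes * 23, mean = 8), nrow = n_genes)
expr_b2 <- matrix(rnorm(n_genes * 28, mean = 9), nrow = n_genes)  # shifted batch
expr <- cbind(expr_b1, expr_b2)
rownames(expr) <- paste0("probeset_", seq_len(n_genes))

# Simulated tumour effect in the first 10 probesets, for illustration only.
expr[1:10, status == "tumour"] <- expr[1:10, status == "tumour"] + 2

# Additive model: a per-probeset batch shift plus the tumour vs. normal effect.
design <- model.matrix(~ batch + status)
fit <- eBayes(lmFit(expr, design))

# Rank probesets on the tumour coefficient, adjusted for batch.
top <- topTable(fit, coef = "statustumour", number = 10)
```

Because both classes appear in both batches, the batch and tumour
coefficients can be estimated separately.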



>directly comparable. A posting to GeneArray mailing list had a reply
>(http://bfx.kribb.re.kr/gene-array/1255.html) supporting this view. A
>cross-table of batch and sample is given below :
>                          | normal  tumour   total
>   batch 1 (with DMSO)    |   17       6     23
>   batch 2 (without DMSO) |   23       5     28
>   -----------------------|---------------------
>   total                  |   40      11     51
>Therefore I have considered the following possible solutions :
>1) Preprocess all arrays and compare tumour vs. normal
>2) Preprocess the two batches separately and cbind() them. Then compare
>tumour vs. normal
>3) Preprocess all arrays but include a batch effect in the analysis (I am
>not sure how to do this - perhaps using LIMMA)
>4) Preprocess separately and proceed as 3)
>Here, I use RMA to preprocess the arrays. I have done 1) and 2) and the
>correlation of the two gene lists, as assessed by correlation of gene
>ranks, is only 0.35. I think 4) is a bit of overkill.
>Any opinions or alternative suggestions are very welcome. Thank you.
>Adaikalavan Ramasamy                    ramasamy at cancer.org.uk
>Centre for Statistics in Medicine       http://www.ihs.ox.ac.uk/csm/
>Cancer Research UK                      Tel : 01865 226 677
>Old Road Campus, Headington, Oxford     Fax : 01865 226 962

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
