[BioC] Re: normalisation or analysis with batch effects

Darlene Goldstein Darlene.Goldstein at epfl.ch
Wed Dec 1 13:21:13 CET 2004

Hi, I just wanted to mention that even if you do normalize all the chips
together, you are still likely to see the 'batch' (or 'block') effects.  To try
to assess the extent of the problem, you might cluster the samples and see if
you get samples from the same batch clustering together.
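
A minimal sketch of that check in base R, on simulated data only. The matrix `expr`, the `batch` factor and all the effect sizes are made up for illustration; with real data you would use the normalised expression matrix instead.

```r
# Simulated stand-in for a normalised expression matrix:
# 200 probesets x 12 arrays, 6 arrays per batch (all values hypothetical).
set.seed(1)
n_probes <- 200
n_arrays <- 12
batch <- factor(rep(c("batch1", "batch2"), each = 6))

mu <- rnorm(n_probes, mean = 8, sd = 2)                 # probeset-level signal
batch_effect <- matrix(rnorm(n_probes * 2, sd = 0.5),   # probeset-specific
                       nrow = n_probes)                 # shift for each batch
noise <- matrix(rnorm(n_probes * n_arrays, sd = 0.3), nrow = n_probes)

expr <- mu + batch_effect[, as.integer(batch)] + noise
colnames(expr) <- paste0("array", seq_len(n_arrays))

# Cluster the arrays (columns), using 1 - Pearson correlation as distance.
d <- as.dist(1 - cor(expr))
hc <- hclust(d, method = "average")

# Cut the tree into two groups and cross-tabulate against batch.
groups <- cutree(hc, k = 2)
tab <- table(cluster = groups, batch = batch)
print(tab)
```

If the clusters line up with the batches in such a table (or in plot(hc)), that suggests the batch effect has survived normalisation.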

Best regards, Darlene



Are the 11 tumour samples considered as biological replicates, or are these split
into different tumour classes?

We've had a similar problem. Our data was generated in three different
laboratories, each having slightly different protocols, but within each lab we
had the same factors (the same doses of a drug).

I guess, if the tumours are considered as replicates one could include the batch
as a factor (as you suggest below), but if they contain different tumour classes
one could not separate the DMSO effect from the tumour class effect.
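
As a rough sketch of "batch as a factor", assuming the tumours are treated as replicates: the group sizes below come from the cross-table in the original post, while the expression values `y` and both effect sizes are simulated and purely hypothetical.

```r
# Sample layout taken from the cross-table in the original post.
batch  <- factor(rep(c("batch1", "batch2"), times = c(23, 28)))
status <- factor(c(rep(c("normal", "tumour"), times = c(17, 6)),
                   rep(c("normal", "tumour"), times = c(23, 5))))

# Design matrix with an intercept, a batch term and a tumour term.
design <- model.matrix(~ batch + status)

# Simulated log-expression for one hypothetical probeset:
# assumed batch effect 0.8, assumed tumour effect 1.0.
set.seed(42)
y <- 7 + 0.8 * (batch == "batch2") + 1.0 * (status == "tumour") +
  rnorm(length(batch), sd = 0.3)

# Tumour vs. normal, adjusted for batch.
fit <- lm(y ~ batch + status)
print(coef(summary(fit)))
```

The "statustumour" row is the tumour vs. normal difference after adjusting for batch. A design matrix like this is also the kind of object limma's lmFit() takes if you want to fit all probesets at once.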

The tissue samples (normal and tumour) are probably from different subjects and
will show strong differences per se. Maybe one could get some estimates of the
impact of the batch by using a mixed effects model with each sample as random
effect and the batch as fixed effect.

something like lme(response ~ batch, data = d, random = ~ 1 | sample)

I'm not sure about this, it's just an idea ...
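
A self-contained sketch of that idea with the nlme package, on simulated data. It assumes several measurements per sample in long format; the data frame `d`, the number of measurements and all variance settings are hypothetical.

```r
library(nlme)  # recommended package, normally installed alongside R

# Simulated long-format data: 20 samples, 10 per batch,
# 30 measurements per sample (all settings hypothetical).
set.seed(7)
n_samples <- 20
n_meas <- 30
sample_id <- factor(rep(seq_len(n_samples), each = n_meas))
batch_by_sample <- rep(c("batch1", "batch2"), each = n_samples / 2)
batch <- factor(batch_by_sample[as.integer(sample_id)])

sample_effect <- rnorm(n_samples, sd = 0.5)   # random intercept per sample
response <- 8 + 2 * (batch == "batch2") +     # assumed batch shift of 2
  sample_effect[as.integer(sample_id)] +
  rnorm(n_samples * n_meas, sd = 0.3)

d <- data.frame(response = response, batch = batch, sample = sample_id)

# Batch as fixed effect, sample as random intercept.
fit <- lme(response ~ batch, data = d, random = ~ 1 | sample)
print(fixef(fit))
print(VarCorr(fit))
```

fixef() gives the estimated batch shift and VarCorr() the between-sample and residual variance components.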

Anyway, I'd pre-process (normalize) all samples together, otherwise there'll
certainly be a batch effect.

	kind regards,


> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
> Adaikalavan
> Ramasamy
> Sent: 30 November 2004 23:51
> To: BioConductor mailing list
> Cc: Andrea Pellagatti
> Subject: [BioC] normalisation or analysis with batch effects
> Dear list,
> If the following question has been asked before, I do apologise in
> advance and hope someone can point to the relevant thread. Otherwise I
> would appreciate some thoughts and pointers to this problem.
> Thank you.
> Problem : My collaborator (cc-ed here) has performed hybridisation for
> 11 tumour and 40 normal samples on Affymetrix HGU-133Av2 (contains ~55k
> probesets) chips. He had hybridised about half of the samples when he
> realised he needed more Affymetrix chips.
> The second batch of chips arrived with the instruction to add DMSO in
> the hybridisation cocktail, which he followed. The first batch did not
> have such instruction. Therefore we believe that the two batches are not
> directly comparable. A posting to GeneArray mailing list had a reply
> (http://bfx.kribb.re.kr/gene-array/1255.html) supporting this view. A
> cross-table of batch and sample is given below :
>                           | normal  tumour   total
>    batch 1 (with DMSO)    |   17       6     23
>    batch 2 (without DMSO) |   23       5     28
>    -----------------------|---------------------
>    total                  |   40      11     51
> Therefore I have considered the following possible solutions :
> 1) Preprocess all arrays and compare tumour vs. normal
> 2) Preprocess the two batches separately and cbind() them. Then compare
>    tumour vs. normal
> 3) Preprocess all arrays but include a batch effect in analysis (I am
>    not sure how to do this - perhaps using LIMMA)
> 4) Preprocess separately and proceed as 3)
> Here, I use RMA to preprocess the arrays. I have done 1) and 2) and the
> correlation of the two gene lists, as assessed by correlation of gene
> ranks, is only 0.35. I think 4) is a bit of overkill.
> Any opinions or alternative suggestions are very welcome. Thank you.
> Regards,
> --
> Adaikalavan Ramasamy                    ramasamy at cancer.org.uk
> Centre for Statistics in Medicine       http://www.ihs.ox.ac.uk/csm/
> Cancer Research UK                      Tel : 01865 226 677
> Old Road Campus, Headington, Oxford     Fax : 01865 226 962

Darlene Goldstein
Institute of Mathematics, EPFL            Tel: +41 21 693 2552
CH-1015 Lausanne                          Fax: +41 21 693 4303
