[BioC] normalization and batch correction across multiple projects

Adaikalavan Ramasamy adaikalavan.ramasamy at gmail.com
Mon Aug 18 14:11:02 CEST 2014


Dear all,

I would like to appeal to the collective wisdom in this group on how best
to solve this problem of normalization and batch correction.

We are a service unit for an academic institute and we run several projects
simultaneously. We use Illumina HT12-v4 microarrays, which take up to 12
samples per chip. As we QC the data from one project, the RNA from failed
samples is rerun on chips belonging to another project (rather than running
partial chips, which would be wasteful). Sometimes we also include samples
from other projects on the same chip. Here is a simple illustration:

Chip No   ScanDate     Contents
1         1st July     *12 samples from project A*
2         1st July     *8 samples from project A* + 4 from project B
3         1st August   12 samples from project B
4         1st August   *1 sample from project A* + 5 from project B + 6 from project C
...

What is the best way to prepare the final data for *project A*? One option
is to do the following:

   1. Pool chips 1, 2 and 4 together.
   2. Remove failed samples.
   3. Remove samples from other projects.
   4. Normalize using neqc() from limma.
   5. Correct for scan date using ComBat() from sva.

The other option we considered is to omit step 3 (i.e. keep the samples from
other projects for normalization and ComBat) and subset to project A at the
end.
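
To make the difference between the two options concrete, here is a rough
sketch in plain Python (not part of our actual pipeline) that counts how many
samples each scan date would contribute under each option. The counts are
taken from the illustration above before any failed samples are removed; the
function name and data layout are made up for this example.

```python
from collections import Counter

# Chip layout copied from the illustration above:
# (chip number, scan date, {project: number of samples}).
CHIPS = [
    (1, "1st July", {"A": 12}),
    (2, "1st July", {"A": 8, "B": 4}),
    (3, "1st August", {"B": 12}),
    (4, "1st August", {"A": 1, "B": 5, "C": 6}),
]


def samples_per_batch(chips, keep_chips, keep_projects=None):
    """Count samples per scan date on the pooled chips.

    keep_projects=None keeps every project found on those chips.
    """
    counts = Counter()
    for chip, scan_date, contents in chips:
        if chip not in keep_chips:
            continue
        for project, n in contents.items():
            if keep_projects is None or project in keep_projects:
                counts[scan_date] += n
    return dict(counts)


# Option 1: pool chips 1, 2 and 4, then drop everything except project A.
option1 = samples_per_batch(CHIPS, keep_chips={1, 2, 4}, keep_projects={"A"})

# Option 2: pool chips 1, 2 and 4 and keep all projects until the end.
option2 = samples_per_batch(CHIPS, keep_chips={1, 2, 4})

print("Option 1 (project A only):", option1)
print("Option 2 (all samples on chips 1, 2 and 4):", option2)
```

Under option 1 the August scan date is represented by a single project A
sample, whereas under option 2 it is represented by all 12 samples on chip 4.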

I feel this second option allows for better estimation of batch effects
(especially in chip 4). However, projects A and B can sometimes be quite
different (e.g. samples derived from different tissues), which might distort
the normalization, especially if we want to compare project A to B directly.
We also considered nec() followed by normalizeBetweenArrays() with method
"Tquantile", but I felt it was too complicated. Is there anything else we
should try?
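
To illustrate why the lone project A sample on chip 4 worries me, here is a
toy sketch in plain Python. It uses simple per-probe mean-centering by scan
date, which is only a stand-in and NOT what ComBat does (no scale adjustment,
no empirical Bayes shrinkage). The intensities are made-up numbers and the
function name is hypothetical.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Location-only batch adjustment for one probe.

    Subtracts each batch mean and adds back the overall mean.
    This is plain mean-centering, not ComBat.
    """
    overall = mean(values)
    batch_means = {
        b: mean([v for v, vb in zip(values, batches) if vb == b])
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Made-up log2 intensities for one probe (hypothetical values).
# Project A only: three samples scanned in July, one in August.
a_values = [8.0, 8.2, 7.8, 10.0]
a_only = center_by_batch(a_values, ["Jul", "Jul", "Jul", "Aug"])

# The same four project A samples plus three project B samples from August.
pooled_values = a_values + [9.0, 9.4, 9.2]
pooled = center_by_batch(
    pooled_values, ["Jul", "Jul", "Jul", "Aug", "Aug", "Aug", "Aug"]
)

print("August project A sample, A only:", a_only[3])
print("August project A sample, pooled:", pooled[3])
```

With project A alone, the single August sample is its own batch mean, so after
centering it simply equals the overall mean and its own signal cannot be
separated from the scan-date effect. With the other projects pooled in, it
keeps its deviation from the August mean, but that mean is now driven largely
by project B, which is exactly the concern when the tissues differ.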

Thank you.

--

Adaikalavan Ramasamy
Senior Leadership Fellow in Bioinformatics
Head of the Transcriptomics Core Facility

Email: adaikalavan.ramasamy at ndm.ox.ac.uk
Office: 01865 287 710
Mob: 07906 308 465
http://www.jenner.ac.uk/transcriptomics-facility
