[BioC] normalization and batch correction across multiple projects

Tue Aug 26 19:31:55 CEST 2014

Hi Adaikalavan,

Why not try it both ways and see if it even makes a difference? If you 
get the same results either way, then just do whatever is easier.
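
A rough sketch of the kind of comparison I have in mind. It is written in
Python only to show the idea; the sample names, the toy values, and the
`pearson` and `compare_pipelines` helpers are all made up for illustration.
In practice you would compare the two normalized, batch-corrected matrices
for project A directly in R.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_pipelines(expr_a, expr_b):
    """Compare two processed data sets sample by sample.

    expr_a, expr_b: dict mapping sample name -> list of log2 intensities,
    with probes in the same order in both. Only samples present in both
    inputs are compared.
    """
    report = {}
    for sample in sorted(set(expr_a) & set(expr_b)):
        x, y = expr_a[sample], expr_b[sample]
        report[sample] = {
            "r": pearson(x, y),
            "max_abs_diff": max(abs(a - b) for a, b in zip(x, y)),
        }
    return report


# Toy values (hypothetical, not real data): project A samples processed
# with the other projects removed first vs. removed at the end.
subset_first = {"A1": [7.0, 8.5, 10.0, 12.0], "A2": [7.2, 8.4, 10.1, 11.8]}
subset_last = {"A1": [7.1, 8.4, 10.2, 11.9], "A2": [7.2, 8.4, 10.1, 11.8]}

report = compare_pipelines(subset_first, subset_last)
for sample, stats in report.items():
    print(sample, round(stats["r"], 4), round(stats["max_abs_diff"], 3))
```

If the per-sample correlations are essentially 1 and the largest differences
are small relative to the effects you care about, the choice of option
probably does not matter much for that project.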

If you do batch correction before removing other projects' samples, I 
would think you would need to include the project identifier as a batch 
effect in addition to the scan date or chip number, right?
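
Before deciding, it may help to tabulate project against scan date to see
how the two overlap. Here is a minimal sketch using the counts from your
illustration (before any failed samples are removed). The `layout`,
`crosstab`, and `single_date_projects` names are just placeholders, and it
is in Python only for illustration.

```python
from collections import Counter

# (chip, scan date, project, number of samples), copied from the
# illustration in the original post.
layout = [
    (1, "1st July", "A", 12),
    (2, "1st July", "A", 8),
    (2, "1st July", "B", 4),
    (3, "1st August", "B", 12),
    (4, "1st August", "A", 1),
    (4, "1st August", "B", 5),
    (4, "1st August", "C", 6),
]


def crosstab(rows):
    """Count samples for each (scan date, project) combination."""
    counts = Counter()
    for _chip, scan_date, project, n in rows:
        counts[(scan_date, project)] += n
    return counts


def single_date_projects(counts):
    """Projects whose samples were all scanned on one date, i.e. projects
    that cannot be separated from scan date in this layout."""
    dates = {}
    for (scan_date, project), n in counts.items():
        if n > 0:
            dates.setdefault(project, set()).add(scan_date)
    return sorted(p for p, d in dates.items() if len(d) == 1)


counts = crosstab(layout)
for (scan_date, project), n in sorted(counts.items()):
    print(scan_date, project, n)
print("projects seen on a single scan date:", single_date_projects(counts))
```

On these counts project A has 20 samples scanned in July but only 1 in
August, and project C appears only in August, so project and scan date are
far from balanced.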

-Ryan

On 8/18/14, 5:11 AM, Adaikalavan Ramasamy wrote:
> Dear all,
>
> I would like to appeal to the collective wisdom in this group on how best
> to solve this problem of normalization and batch correction.
>
> We are a service unit for an academic institute and we run several projects
> simultaneously. We use Illumina HT12-v4 microarrays, which take up to 12
> samples per chip. As we QC the data from one project, RNA from failed
> samples can be rerun on chips belonging to another project, so that we
> avoid the waste of running partial chips. Sometimes we also include
> samples from other projects. Here is a simple illustration:
>
> Chip No   ScanDate     Contents
> 1         1st July     *12 samples from project A*
> 2         1st July     *8 samples from project A* + 4 from project B
> 3         1st August   12 samples from project B
> 4         1st August   *1 sample from project A* + 5 from project B + 6 from project C
> ...
>
> What is the best way to prepare the final data for *project A*? One option
> is to do the following:
>
>     1. Pool chips 1, 2 and 4 together.
>     2. Remove failed samples.
>     3. Remove samples from other projects.
>     4. Normalize using neqc() from limma.
>     5. Correct for scan date using ComBat() from sva.
>
> The other option we considered is to omit step 3 (i.e. use the other
> projects' samples for normalization and ComBat) and subset to project A
> at the end.
>
> I feel this second option allows for better estimation of batch effects
> (especially in chip 4). However, sometimes project A and B can be quite
> different (e.g. samples derived from different tissues) which might mess up
> the normalization especially if we want to compare project A to B directly. We
> also considered nec() followed by normalizeBetweenArrays with "Tquantile"
> but I felt it was too complicated. Anything else to try?
>
> Thank you.
>
> --
> Adaikalavan Ramasamy
> Senior Leadership Fellow in Bioinformatics
> Head of the Transcriptomics Core Facility
>
> Email: adaikalavan.ramasamy at ndm.ox.ac.uk
> Office: 01865 287 710
> Mob: 07906 308 465
> http://www.jenner.ac.uk/transcriptomics-facility