[BioC] normalization and batch correction across multiple projects

Wed Aug 27 17:13:03 CEST 2014

Dear Ryan,

Thank you for the advice. I am happy to try it both ways, but these are
large projects, and it would be difficult to quantify whether the
differences are small enough to ignore. That is why I wanted to get the
opinion of others on this list.

And yes, you are right that we need to include both project and scan date
in the adjustment if I batch correct before removing the other projects'
samples. Thanks.
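
One way to see why the project identifier matters is to tabulate the samples
per scan date under each option. The sketch below is only an illustration in
Python (the actual workflow would use limma and sva in R). The counts are
transcribed from the chip layout quoted further down, failed samples are
ignored for simplicity, and the names CHIPS, samples_per_batch and
projects_per_batch are hypothetical.

```python
from collections import Counter, defaultdict

# Sample sheet transcribed from the illustration in the original post.
# Each entry: (chip number, scan date, project, number of samples).
# Assumption: failed samples are not known here, so none are removed.
CHIPS = [
    (1, "1 July", "A", 12),
    (2, "1 July", "A", 8),
    (2, "1 July", "B", 4),
    (3, "1 August", "B", 12),
    (4, "1 August", "A", 1),
    (4, "1 August", "B", 5),
    (4, "1 August", "C", 6),
]


def samples_per_batch(rows, projects=None):
    """Count samples per scan date, optionally restricted to some projects."""
    counts = Counter()
    for _chip, scan_date, project, n in rows:
        if projects is None or project in projects:
            counts[scan_date] += n
    return counts


def projects_per_batch(rows):
    """List which projects contribute samples to each scan date."""
    seen = defaultdict(set)
    for _chip, scan_date, project, _n in rows:
        seen[scan_date].add(project)
    return {date: sorted(projs) for date, projs in seen.items()}


# Option 1: drop other projects first, then correct for scan date.
only_a = samples_per_batch(CHIPS, projects={"A"})
# Option 2: keep every project for the correction, subset to A at the end.
pooled = samples_per_batch(CHIPS)
mix = projects_per_batch(CHIPS)

for date in pooled:
    print(f"{date}: project A only = {only_a[date]}, "
          f"all projects = {pooled[date]}, projects present = {mix[date]}")
```

With project A alone, the August scan date contains a single sample, so a
scan-date effect can hardly be estimated from it. Pooling all projects gives
24 samples per date, but each date then mixes projects, which is why project
has to enter the adjustment alongside scan date.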

Regards, Adai

On Tue, Aug 26, 2014 at 6:31 PM, Ryan <rct at thompsonclan.org> wrote:

> Hi Adaikalavan,
>
> Why not try it both ways and see if it even makes a difference? If you get
> the same results either way, then just do whatever is easier.
>
> If you do batch correction before removing other projects' samples, I
> would think you would need to include the project identifier as a batch
> effect in addition to the scan date or chip number, right?
>
> -Ryan
>
>
> On 8/18/14, 5:11 AM, Adaikalavan Ramasamy wrote:
>
>> Dear all,
>>
>> I would like to appeal to the collective wisdom in this group on how best
>> to solve this problem of normalization and batch correction.
>>
>> We are a service unit for an academic institute and we run several
>> projects simultaneously. We use Illumina HT12-v4 microarrays, which can
>> take up to 12 different samples per chip. As we QC the data from one
>> project, the RNA from failed samples can be re-run on chips from another
>> project (rather than running partial chips, to avoid wastage). Sometimes
>> we include samples from other projects as well. Here is a simple
>> illustration:
>>
>> Chip No   Scan date    Contents
>> 1         1st July     *12 samples from project A*
>> 2         1st July     *8 samples from project A* + 4 from project B
>> 3         1st August   12 samples from project B
>> 4         1st August   *1 sample from project A* + 5 from project B + 6 from project C
>> ...
>>
>> What is the best way to prepare the final data for *project A*? One
>> option is to do the following:
>>
>>     1. Pool chips 1, 2 and 4 together.
>>     2. Remove failed samples.
>>     3. Remove samples from other projects.
>>     4. Normalize using neqc() from limma.
>>     5. Correct for scan date using ComBat() from sva.
>>
>>
>> The other option we considered is to omit step 3 (i.e. use the other
>> projects' samples for normalization and ComBat) and subset to project A
>> at the end.
>>
>> I feel this second option allows for better estimation of batch effects
>> (especially in chip 4). However, projects A and B can sometimes be quite
>> different (e.g. samples derived from different tissues), which might
>> distort the normalization, especially if we want to compare project A to
>> B directly. We also considered nec() followed by normalizeBetweenArrays()
>> with "Tquantile", but I felt it was too complicated. Anything else to try?
>>
>> Thank you.
>>
>> --
>> Adaikalavan Ramasamy
>> Senior Leadership Fellow in Bioinformatics
>> Head of the Transcriptomics Core Facility
>>
>> Email: adaikalavan.ramasamy at ndm.ox.ac.uk
>> Office: 01865 287 710
>> Mob: 07906 308 465
>> http://www.jenner.ac.uk/transcriptomics-facility
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
