[BioC] Microarray data normalization

Bernard Lee Kok Bang bernard.lee at carif.com.my
Wed Jul 30 02:32:08 CEST 2014


Dear all, I would like to ask a question regarding microarray data normalization.

Scenario:
I have in hand the raw ‘.CEL’ files for a collection of 300 cancer cell lines (multiple cancer types), all from the same study/batch. My aim is to obtain the gene expression values and use them downstream. However, I am only interested in a subset of these .CEL files: the NON-blood cancer cell lines (n=250).

I’m wondering which of these two options is more appropriate for my scenario:

Option 1:
1)	Normalize all 300 .CEL files with RMA.
2)	After normalization, manually remove the 50 blood samples I am NOT interested in.
3)	Use the normalized data of 250 samples for downstream analysis

Option 2:
1)	Normalize ONLY the 250 .CEL files with RMA (as if the 50 blood samples did not exist)
2)	Use the normalized data of 250 samples for downstream analysis
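
The two options can be compared on toy data. The sketch below is only an illustration under stated assumptions: it implements the quantile-normalization step alone (RMA also performs background correction and median-polish summarization, which are not modelled here), the data are simulated rather than read from .CEL files, and the function names quantile_normalize and ranking are hypothetical helpers. It also assumes the ranking is done within each sample.

```python
import random

def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length samples (lists of floats).

    Each sample's values are replaced by the mean, across all samples,
    of the values at the same rank, keeping the sample's own ordering.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples at each rank position.
    reference = [sum(s[i] for s in sorted_samples) / len(samples)
                 for i in range(n_probes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def ranking(sample):
    """Probe indices ordered from highest to lowest value."""
    return sorted(range(len(sample)), key=lambda i: sample[i], reverse=True)

random.seed(0)
n_probes = 200
# Simulated toy data (an assumption): 10 "solid" samples and 3 "blood"
# samples drawn from a shifted, wider distribution.
solid = [[random.gauss(8.0, 2.0) for _ in range(n_probes)] for _ in range(10)]
blood = [[random.gauss(10.0, 3.0) for _ in range(n_probes)] for _ in range(3)]

# Option 1: normalize everything, then drop the blood samples.
option1 = quantile_normalize(solid + blood)[:len(solid)]
# Option 2: normalize the solid samples only.
option2 = quantile_normalize(solid)

values_differ = any(abs(a - b) > 1e-9
                    for s1, s2 in zip(option1, option2)
                    for a, b in zip(s1, s2))
same_ranking = all(ranking(s1) == ranking(s2)
                   for s1, s2 in zip(option1, option2))
print("values differ:", values_differ)
print("within-sample rankings identical:", same_ranking)
```

In this toy setting the normalized values change with the set of samples included, while the order of probes inside each sample does not, because quantile normalization is monotone within a sample. The summarization step of RMA fits probe effects across arrays, so probeset-level values from the real pipeline may still differ between the two options in ways this sketch does not capture.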

My downstream analysis simply involves ranking the genes from highest expression to lowest.

From my point of view, I favor the first option. Since I have both the solid tumor and the blood cell line data, I might as well normalize them all together before manually excluding the blood cell lines. To my knowledge, the purpose of normalization is to remove batch effects? So the larger the sample size during RMA normalization, the better?


Thanks in advance.

Bernard Lee
Research Assistant
Cancer Research Initiatives Foundation (CARIF)
University of Malaya (UM)

