[BioC] comparing different batches of data directly

Kamila Naxerova knaxerov at ix.urz.uni-heidelberg.de
Fri Dec 8 17:04:03 CET 2006

I am struggling with a similar question. I would like to include cancer 
profiles from different studies in a principal components analysis. Jim, 
what would you suggest in this case, when I am not interested in 
differential gene expression but in a global comparison?


 > > What would be the most appropriate approach if I want to compare gene
 > > expression data from different laboratories (and different biological
 > > sources) directly? Assuming the data were profiled on the same chip,
 > > of course. What kind of normalization (in batches? all together?) and
 > > subsequent processing would be "least harmful"?
 > This depends on what you mean by comparing things 'directly'. If you
 > mean that you have some controls from lab 1 and some experimentals from
 > lab 2 that you want to compare, then it doesn't really matter what you
 > do because you won't be able to control for the 'lab' effect. In other
 > words, you won't ever be able to determine if a given change is due to
 > biological differences or simply technical variability due to being run
 > in different labs.
 > On the other hand, if you have microarray data for both sample types
 > that were run in two different labs (i.e., control and experimental
 > samples from lab 1 and control and experimental samples from lab 2),
 > then you would want to normalize the data from each lab in separate
 > batches and then compare using a mixed model. The GeneMeta package in
 > the devel repository is designed to do this sort of thing.
 > Alternatively, you could use something like lme() in the nlme package on
 > a row-wise basis (this would be slow, however).
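
To make the second scenario above concrete, here is a minimal sketch of the general idea for a single gene: compute a treated-versus-control effect size within each lab, so that any constant lab offset cancels, then combine the per-lab effects with a random-effects (DerSimonian-Laird) weighting. This is only an illustration in plain Python, not the GeneMeta or nlme API; the function names `effect_size` and `combine_random_effects` and the log2 expression values are made up for the example, and the variance formula is a common large-sample approximation.

```python
import math
from statistics import mean, variance


def effect_size(treated, control):
    """Bias-corrected standardized mean difference (Hedges' g) and its
    approximate variance for one gene within one lab."""
    n1, n2 = len(treated), len(control)
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control))
        / (n1 + n2 - 2)
    )
    d = (mean(treated) - mean(control)) / pooled_sd
    # Small-sample correction factor.
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))
    # Large-sample approximation to the variance of g (an assumption here).
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def combine_random_effects(effects):
    """Combine per-lab (effect, variance) pairs with DerSimonian-Laird
    random-effects weights. Returns (combined effect, standard error, tau^2)."""
    g = [e for e, _ in effects]
    w = [1 / v for _, v in effects]
    k = len(effects)
    mu_fixed = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    # Cochran's Q measures between-lab heterogeneity.
    q = sum(wi * (gi - mu_fixed) ** 2 for wi, gi in zip(w, g))
    if k < 2:
        tau2 = 0.0  # between-lab variance is not estimable from one lab
    else:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for _, v in effects]
    mu = sum(wi * gi for wi, gi in zip(w_star, g)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mu, se, tau2


# Hypothetical log2 expression values for one gene. Lab 2 runs about two
# units higher overall (a lab effect), but each lab has its own controls.
lab1 = effect_size(treated=[8.0, 8.2, 7.9, 8.3], control=[7.1, 6.9, 7.3, 7.0])
lab2 = effect_size(treated=[10.1, 9.9, 10.2, 10.0], control=[9.0, 9.2, 8.9, 9.1])

mu, se, tau2 = combine_random_effects([lab1, lab2])
print("combined effect:", mu, "standard error:", se, "tau^2:", tau2)
```

Because each effect size is computed within a lab, this only works when every lab contributes both sample types. With controls from one lab and experimentals from another there is no within-lab contrast to compute, which is the confounding described in the first scenario.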
