[BioC] split arrays

scholz@Ag.arizona.edu scholz at Ag.arizona.edu
Thu Sep 29 10:37:29 CEST 2005

Thanks, Robert. If I am understanding you correctly, you would advocate both
separate normalization AND separate linear modeling in the case where the two
arrays come from different batches and have no common probeset, correct? If I
was reading Gordon's reply to the other gentleman's email correctly, he was
suggesting separate normalization but not separate linear modeling for the
datasets. My question, which in retrospect was unclear, was about what the
advantages/disadvantages were to combining/separating the datasets for linear


> Hi,
>   I am not sure what you are really asking but here goes.
> References and corresponding R/Bioconductor packages are listed below.
>    In my opinion separate normalization and expression estimation are 
> essential for different experiments (and by experiment I mean a 
> collection of identical arrays processed at about the same time by about 
> the same people using about the same protocol; and by identical arrays I 
> mean from the same batch). While one can often do fancy things to align 
> different arrays prior to processing them it does not seem like a good 
> idea at all. When it works, so would separate normalization and when it 
> does not work you won't know.
>   After you have normalized and estimated expression values then you 
> have the gene matching problem. This is not trivial; there are papers 
> around that discuss this (Parmigiani et al). There are some issues 
> regarding whether you want to make inference at the gene level or the 
> sequence level (Unigene is not the same as Entrez Gene). While many have 
> ignored the issues that arise (even on a single chip) where the same 
> gene has been probed via several different methods, that does not seem 
> to be a "best practice".
>   If you have no common genes, then life is somewhat easier, you just 
> have a bunch more features, and the suggestion to simply use rbind seems 
> pretty sensible to me, although there are some potential pitfalls and 
> you might want to do some checking to ensure that one set of features is 
> not dominating the other for reasons that are not biological.
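
[A minimal sketch of the rbind suggestion above, in base R. It assumes the
two sets have already been normalized separately, share no features, and
were hybridized with the same samples in the same column order. The
matrices exprA and exprB and all feature and sample names are simulated
placeholders, not real data, and plain matrices stand in for MAList
objects. The check at the end is only one crude way of asking whether one
set of features dominates the other.]

```r
# Hypothetical log-expression matrices after separate normalization
# (rows = features, columns = samples); simulated values for illustration.
set.seed(1)
exprA <- matrix(rnorm(5 * 4, mean = 8, sd = 1), nrow = 5,
                dimnames = list(paste0("A_", 1:5), paste0("sample", 1:4)))
exprB <- matrix(rnorm(3 * 4, mean = 9, sd = 2), nrow = 3,
                dimnames = list(paste0("B_", 1:3), paste0("sample", 1:4)))

# The samples must line up before the features are stacked.
stopifnot(identical(colnames(exprA), colnames(exprB)))

# Stack the features and remember which array set each row came from.
combined <- rbind(exprA, exprB)
arraySet <- factor(rep(c("A", "B"), times = c(nrow(exprA), nrow(exprB))))

# Crude check: compare per-feature mean and spread between the two sets.
featureMean <- rowMeans(combined)
featureSd   <- apply(combined, 1, sd)
print(tapply(featureMean, arraySet, summary))
print(tapply(featureSd, arraySet, summary))
```

[If the summaries differ sharply between A and B, that is a hint that
downstream rankings may reflect the array set rather than biology.]
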
>   If you do have genes in common, then life is harder, the models are 
> more complicated and IMHO you want to spend a few hours with a local 
> statistician sorting out what questions you want to ask. Essentially, 
> considering what the right model is on a per-gene basis is a pretty 
> good starting point. As I said there are some papers (Choi et al, 
> Gentleman et al), sometimes they come under the heading of 
> meta-analysis, and other times simply random effects models. For the 
> more statistically inclined I recommend the book by Solomon and Cox 
> which directly addresses issues regarding combining microarray experiments.
>   Best wishes,
>     Robert
> G. Parmigiani, E. Garrett-Mayer, R. Anbazhagan, et al. A cross-study
>   comparison of gene expression studies for the molecular classification
>   of lung cancer. Clinical Cancer Research, 10:2922–2927, 2004.
>   R package: MergeMaid
> J. K. Choi, U. Yu, S. Kim, et al. Combining multiple microarray studies
>   and modeling interstudy variation. Bioinformatics, 19, Suppl. 1:i84–i90,
>   2003.
>   R package: GeneMeta
> D. R. Cox and P. J. Solomon. Components of Variance. Chapman and Hall,
>   New York, 2003.
> R. Gentleman, M. Ruschhaupt and W. Huber. On the Synthesis of Microarray
>   Experiments.
>   R package: GeneMetaEx
> scholz at Ag.arizona.edu wrote:
> > Adrien,
> > 
> > Thanks for this response. Unfortunately, there are no oligos in common between
> > the two arrays. If anyone else has a response to my question (below), I'd like
> > to hear it.
> > 
> > Matt
> > 
> > 
> > Matt,
> > 
> > I am not familiar with the maize arrays, but I am using the following
> > procedure for Affymetrix moe430 split arrays, which have ~160 probesets
> > in common between A and B:
> > 1) background-correct each chip separately at probe-level
> > 2) get a measure of expression at probeset-level
> > 3) plot the common probesets against each other for each pair of
> > chips. If you observe the same thing as me, you will see that the trend
> > is linear but with intercept != 0 and slope != 1.
> > 4) scale the B chip with those estimated intercept and slope
> > 
> > Steps 1 and 2 are easily done with rma( , normalize=F).
> > Wolfgang Huber and I are currently writing a little package which does
> > steps 3 and 4 automatically.
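
[A minimal sketch of steps 3 and 4 above for a single pair of chips, in
base R. It is not the package mentioned above. It assumes probeset-level
log-expression values are already available (steps 1 and 2), that the line
is fitted by ordinary least squares with A regressed on B, and that the
fitted line is then applied to every B probeset. The vectors chipA and
chipB and all probeset names are simulated placeholders.]

```r
# Hypothetical probeset-level log-expression for one sample on chips A and B.
# Twenty probesets are shared; the rest are unique to each chip.
set.seed(2)
common <- paste0("common_", 1:20)
truth  <- rnorm(20, mean = 8, sd = 1.5)
chipA  <- c(setNames(truth + rnorm(20, sd = 0.1), common),
            setNames(rnorm(5, mean = 8), paste0("A_only_", 1:5)))
chipB  <- c(setNames(0.5 + 1.2 * truth + rnorm(20, sd = 0.1), common),
            setNames(rnorm(5, mean = 10), paste0("B_only_", 1:5)))

# Step 3: regress A on B over the shared probesets only.
shared <- intersect(names(chipA), names(chipB))
fit <- lm(a ~ b, data = data.frame(a = chipA[shared], b = chipB[shared]))
intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["b"]]

# Step 4: map every B probeset onto the A scale with the fitted line.
chipB_scaled <- intercept + slope * chipB

# After scaling, the shared probesets should follow intercept 0 and slope 1.
refit <- lm(a ~ b, data = data.frame(a = chipA[shared],
                                     b = chipB_scaled[shared]))
print(coef(refit))
```

[In practice one would also look at plot(chipB[shared], chipA[shared])
before trusting the fit, since a linear trend is an observation from the
moe430 arrays and may not hold elsewhere.]
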
> > 
> > I'm not sure whether this procedure could make sense or be adapted
> > somehow to your maize arrays (do they have enough probes in common?),
> > but anyway, some food for thought...
> > 
> > Adrien
> > 
> > 
> >>Gordon,
> >>
> >>Recently you advised someone with a split set of maize arrays 
> >>that they could do their analysis by reading all the A slides 
> >>into an RGList and normalizing, then doing the same with the 
> >>B slides, and then combining the two datasets via
> >>rbind() of the two MAList objects. I have a similar (the 
> >>same?) set of arrays and some of the users of these arrays 
> >>have noted that the A and B slides perform differently, i.e. 
> >>more background on the B slide, for whatever reason. Though 
> >>I'm not actually convinced this is true, it makes me wonder 
> >>whether the two datasets should be combined at all since 
> >>there may be a "between array set" source of variation.
> >>Am I right to segregate these sets or is 
> >>there some overwhelming benefit to combining them? I'm no 
> >>statistician and would appreciate your take.
> >>
> >>Thanks,
> >>
> >>
> > 
> > Matt
> > 
> > ---------------------------------------------
> > College of Agriculture and Life Sciences Web Mail.
> > http://ag.arizona.edu
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > 
> -- 
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
