[BioC] Merging microarray datasets

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Fri Apr 25 03:24:46 CEST 2008


Kathy, this depends on two things.

1) how similar are the chip types?

For example, hgu95a and hgu95av2 are two Affymetrix chips that differ by 
only one probeset. I do not know why they differ by that probeset, but I 
suppose one could simply omit the extra probeset and preprocess them together.
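As a minimal Python sketch of dropping the probesets that are not shared (not a Bioconductor API), assume each array is held as a hypothetical dict mapping probeset ID to intensity. The probeset names below are made up for illustration:

```python
def restrict_to_common(arrays_a, arrays_b):
    """Keep only the probesets present on every array of both chip types.

    Each array is a dict {probeset_id: intensity}; this layout is a
    hypothetical stand-in for real chip data.
    """
    common = set.intersection(*(set(arr) for arr in arrays_a + arrays_b))

    def trim(arr):
        # Rebuild the array with the shared probesets only, in a fixed order.
        return {p: arr[p] for p in sorted(common)}

    return [trim(a) for a in arrays_a], [trim(b) for b in arrays_b]


# Hypothetical probeset IDs: chip type B carries one extra probeset.
chip_a = [{"ps1": 120.0, "ps2": 80.0}]
chip_b = [{"ps1": 110.0, "ps2": 95.0, "ps_extra": 40.0}]
a_trim, b_trim = restrict_to_common(chip_a, chip_b)
print(b_trim)  # [{'ps1': 110.0, 'ps2': 95.0}]
```

After trimming, both chip types carry the same probesets and can go through the same preprocessing.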


2) which preprocessing algorithm is used (and the sample size)

If you are using preprocessing algorithms that work on an array-by-array 
basis (e.g. median scaling), then you can normalize the different chip 
types separately and follow with a merge(), creating missing values for 
the probes that are present on some chip types but not on others.
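A rough Python sketch of this route, assuming the same hypothetical dict-per-array layout and a median-scaling target of 100 (an arbitrary choice). The merge() above is R's base function; here a plain outer join on probeset ID stands in for it, with NaN marking probesets an array does not carry:

```python
import math
from statistics import median


def median_scale(arr, target=100.0):
    """Array-by-array normalization: rescale so the median intensity equals `target`."""
    m = median(arr.values())
    return {p: v * target / m for p, v in arr.items()}


def merge_arrays(arrays):
    """Outer-join arrays on probeset ID; probesets absent from an array become NaN."""
    all_ids = sorted(set().union(*arrays))
    return {p: [arr.get(p, math.nan) for arr in arrays] for p in all_ids}


# Hypothetical data: one array from each chip type, overlapping on ps1 and ps2.
type_a = {"ps1": 50.0, "ps2": 100.0, "ps3": 200.0}
type_b = {"ps1": 10.0, "ps2": 20.0, "ps4": 40.0}
merged = merge_arrays([median_scale(type_a), median_scale(type_b)])
for probeset, row in merged.items():
    print(probeset, row)
# ps1 [50.0, 50.0]
# ps2 [100.0, 100.0]
# ps3 [200.0, nan]
# ps4 [nan, 200.0]
```

Because each array is scaled on its own, normalizing the chip types separately gives the same result as normalizing them together; only the merge step has to deal with the non-shared probesets.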

Next, you can either try to adjust the expression values for possible 
biases (e.g. see Benito PMID:14693816) or include a chip type indicator 
in your analysis, as Gentleman and others have pointed out.
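One crude form of bias adjustment is to mean-center each probeset within each chip type. This is a plainly simpler technique than the one in the Benito paper, not a reimplementation of it. A Python sketch, assuming one probeset's values across arrays with NaN for missing:

```python
import math
from statistics import mean


def center_by_chip_type(values, chip_types):
    """Subtract, within each chip type, the mean of that probeset's non-missing values.

    `values` holds one probeset's expression across arrays (NaN = missing);
    `chip_types` gives the chip type label of each array.
    """
    groups = {}
    for v, t in zip(values, chip_types):
        if not math.isnan(v):
            groups.setdefault(t, []).append(v)
    means = {t: mean(vs) for t, vs in groups.items()}
    # Missing values stay missing; only observed values are centered.
    return [math.nan if math.isnan(v) else v - means[t]
            for v, t in zip(values, chip_types)]


# Hypothetical log-scale values: chip type B reads about 9 units higher than A.
vals = [1.0, 3.0, 10.0, 12.0, math.nan]
types = ["A", "A", "B", "B", "B"]
print(center_by_chip_type(vals, types))  # [-1.0, 1.0, -1.0, 1.0, nan]
```

Centering like this also removes genuine biological differences whenever the groups being compared are unevenly spread across chip types. In that situation, keeping a chip type indicator in the model is usually the safer choice.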


If you are using algorithms that borrow information across chips (e.g. 
RMA) AND you have only a small number of arrays for each chip type, then 
you need to give this more thought. It would be worth exploring whether 
one can merge the chips at the probe level (e.g. with the matchprobes 
package) to benefit from better parameter estimation.
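RMA pools all arrays in both its quantile normalization step and its probe-effect estimates, so few arrays per chip type means less data behind those estimates. The following is a minimal Python sketch of quantile normalization over probes shared by two chip types. It assumes those probes have already been matched (e.g. with matchprobes as suggested above), and it is an illustration, not the affy/preprocessCore code:

```python
def quantile_normalize(columns):
    """Quantile normalization: give every array the same intensity distribution.

    `columns` holds one list of probe intensities per array, all over the same
    (already matched) probes. Each value is replaced by the mean, across arrays,
    of the values having the same rank. Ties are broken by position.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the r-th smallest value over all arrays.
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical probe intensities from two arrays (one per chip type).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

The reference distribution is computed from every array at once, which is why merging the chip types at the probe level gives it more data to work with than normalizing each small group on its own.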


Regards, Adai





Kathy, sorry for bringing up the issue of preprocessing. It is relevant, and 
I will raise it in a separate thread.




Kathy Duncan wrote:
> Thanks to Adai and Eric,
> 
> Well, I'm trying to bring the discussion back to its original direction, as
> it apparently drifted to a different area: cross-platform integration. :)  I
> was wondering about integration within the same platform, an issue when
> there are multiple chips (in the case of Affymetrix) OR multiple
> print layouts (cDNA .gal files).
> 
> "… have to normalize all raw data together. All data should also be of one
> platform only. Then you can simply normalize all CEL files or all -----
> files together and be done."  [courtesy: Balasubramanian]
> 
> So, suppose I have MA1, MA2… as the respective normalised datasets (same
> platform, after normalizing by chip type in the case of Affymetrix, or by
> print layout in the case of cDNA). Can I just normalize them again for the
> final dataset, or do I need to take care of some other issues too (and how
> would I tackle them)? Also, I wonder if there's any smart package for this!
> 
> 
> Also Eric, I didn't understand which project you were talking about: "…See for
> example the oncomine project…."
> 
> Thanks.
> 
> Kathy
>


