[BioC] comparing HG-U219 data to HG-U133 data from public databases

Sole Acha, Xavi x.sole at iconcologia.net
Thu Mar 10 11:02:32 CET 2011


Dear Andreas,

a possible pipeline (though not necessarily the best one -- suggestions welcome) to compare HG-U133 Plus2 and HG-U219 data is:

1) Download the CEL files and normalize the two datasets separately using RMA. RMA uses only the perfect-match (PM) probes, so Plus2's mismatch (MM) probes are not used. I believe it is not straightforward to convert Plus2 CEL files to U219 or vice versa.

2) For both array types, keep only the probesets with a best match (those most comparable between the two array types). You can find this information on Affymetrix's website:

http://www.affymetrix.com/support/downloads/comparisons/U133PlusVsU219_BestMatch.zip

This file gives the correspondence between Plus2 and U219 probesets. You can then build a complete matrix containing all the Plus2 and U219 hybridizations and only the common probesets. You will have to create a new ID for each common probeset, since probeset IDs differ between Plus2 and U219 arrays.

3) Once you have a common set of probesets for both datasets, you can re-normalize all the arrays together by applying a quantile normalization (see, for example, normalizeBetweenArrays in the limma package).
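
As a rough illustration of steps 2 and 3, here is a minimal sketch in plain Python (not R, and not the limma implementation). The function names, the "plus2_id|u219_id" format for the new ID, and the toy values are all hypothetical. Ties are broken by row order, which is a simplification; limma may treat ties and missing values differently.

```python
# Sketch only: all names and toy values below are hypothetical illustrations,
# not taken from the Affymetrix best-match file or from limma.


def build_common_matrix(plus2, u219, best_match):
    """Combine two expression tables on best-match probeset pairs.

    plus2, u219: dict mapping probeset ID -> list of values (one per array).
    best_match:  list of (plus2_id, u219_id) pairs.
    Returns (common_ids, rows); each row holds the Plus2 arrays first,
    followed by the U219 arrays.
    """
    common_ids, rows = [], []
    for p_id, u_id in best_match:
        # Keep a pair only if both probesets are present in the data.
        if p_id in plus2 and u_id in u219:
            common_ids.append(f"{p_id}|{u_id}")  # new ID shared by both platforms
            rows.append(plus2[p_id] + u219[u_id])
    return common_ids, rows


def quantile_normalize(rows):
    """Quantile-normalize the columns (arrays) of a probeset-by-array matrix.

    Each value is replaced by the mean, across arrays, of the values that
    hold the same rank within their own array.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the i-th smallest value over all arrays.
    rank_means = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


# Toy data: two Plus2 arrays and one U219 array, already RMA-normalized.
plus2 = {"p1": [5.0, 6.0], "p2": [8.0, 9.0], "p3": [2.0, 3.0]}
u219 = {"u1": [7.0], "u2": [10.0], "u9": [1.0]}
best_match = [("p1", "u1"), ("p2", "u2"), ("p3", "u3")]  # "u3" is absent from u219

common_ids, combined = build_common_matrix(plus2, u219, best_match)
normalized = quantile_normalize(combined)
```

In this toy case only two pairs survive the filtering, and after normalization every array has the same set of values, which is the defining property of quantile normalization.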

Although this approach may work for you, please note that even after applying quantile normalization in step 3 you may still have a strong batch effect in your data, and you should keep this in mind when interpreting the results.
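
One crude way to get a feeling for such an effect is to look, probeset by probeset, at the gap between the mean of the U219 arrays and the mean of the Plus2 arrays. The sketch below is plain Python with a hypothetical function name and toy values. It is only a diagnostic, not a correction, and it assumes the Plus2 arrays occupy the first columns of the matrix. Because platform and sample origin are confounded here, a large gap may reflect real biological differences as well as a batch effect.

```python
from statistics import mean, median


def platform_mean_gaps(rows, n_plus2):
    """Per-probeset gap: mean of the U219 arrays minus mean of the Plus2 arrays.

    rows:    probeset-by-array matrix (list of lists).
    n_plus2: number of leading columns that come from Plus2 arrays
             (assumed column layout, hypothetical).
    """
    return [mean(row[n_plus2:]) - mean(row[:n_plus2]) for row in rows]


# Toy matrix: 3 common probesets, 2 Plus2 arrays followed by 2 U219 arrays.
toy = [
    [6.0, 6.5, 8.0, 8.5],
    [9.0, 9.5, 9.0, 9.5],
    [4.0, 4.0, 3.0, 3.0],
]

gaps = platform_mean_gaps(toy, n_plus2=2)
# Median absolute gap as a single rough summary of the platform difference.
typical_gap = median(abs(g) for g in gaps)
```

If the typical gap is large compared with the differences you expect between sample groups, the platform difference is likely to dominate any downstream comparison.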

Hope this helps,

Xavi.

------
Xavier Solé Acha
Unitat de Biomarcadors i Susceptibilitat
Unit of Biomarkers and Susceptibility
Institut Català d'Oncologia // Catalan Institute of Oncology
Gran Via de L'Hospitalet 199-203
08908 L'Hospitalet de Llobregat, Barcelona, Spain.
Phone: +34 93 260 71 22 / +34 93 260 71 86 (ext. 7122)
Fax: +34 93 260 71 88
E-mail: x.sole (at) iconcologia.net

-----Original message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On behalf of Andreas Heider
Sent: Thursday, March 10, 2011 9:29
To: bioconductor at r-project.org
Subject: [BioC] comparing HG-U219 data to HG-U133 data from public databases

Hi bioconductor users,
I am a PhD student working on stem cells from human umbilical cord blood. I
got data from our sorted cells relying on the HG-U219 platform from
Affymetrix. What I want to do is to compare our data to data from public
databases such as GeneExpressionOmnibus or ArrayExpress, but unfortunately
these data are all based on the HG-U133 platform.

So my question is: What would be the simplest approach to achieve this? What
would be the best approach to achieve this?

I'm thinking of 2 scenarios:
First scenario:
1. Get an expression table for both datasets (U219 and U133)
2. label both datasets with "comparable" identifiers, e.g. UniGene ID or
GenBank accession number
3. get new expression tables with only entries present in both datasets

Second scenario:
1. get raw data of both datasets
2. import CEL files from U219 and convert them to U133 format
3. combine both datasets into 1
4. do normalization of all data together

Please tell me whether it is possible and, if so, how to do it. I'm pretty sure it
is possible, but I'm an R and BioC novice and don't know every
function/package.

Thanks in advance, Andreas

PS: Is it problematic that there are only perfect match probes on the
HG-U219 and no MMs?
