[BioC] Affymetrix Mouse Gene 1.0 ST - Number of probes

Sophie LAMARRE [guest] guest at bioconductor.org
Wed Dec 21 11:38:47 CET 2011


Hello,

I am working with Affymetrix Mouse Gene 1.0 ST arrays.

I used two methods to match my database with my probes, and I compared the number of unique probes obtained with each method after RMA normalization:

-> there were 34 760 probes (control probes and main probes) when I used R/Bioconductor. I downloaded the Unsupported Mouse Gene 1.0 ST Array CDF (Technical documentation -> Library Files) from the Affymetrix website in order to get the CDF file and build my own CDF package.
-> there were 35 556 probes (control probes and main probes) when I used Expression Console. I downloaded the Mouse Gene 1.0 ST Array, Analysis (Technical documentation -> Library Files) in order to get the files that Expression Console needs.

=> So I lost 796 probes. That is annoying!

Next, when I kept only the main probes (after matching my database with the Affymetrix annotation file available on the Affymetrix website), I had:
-> 28 104 probes with Bioconductor
-> 28 856 probes with Expression Console

=> So 752 main probes are missing if I run my data analysis with Bioconductor. I am worried because I am sometimes asked not to summarize the probes, in which case I can't use Expression Console and have to use Bioconductor, where I lose a lot of probes.
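
To see which IDs are lost, and not only how many, the two ID lists can be compared with a set difference. Here is a minimal Python sketch, assuming the IDs from each tool have already been exported to plain lists. The IDs below are made-up placeholders, not real array identifiers, and `missing_ids` is a hypothetical helper name.

```python
def missing_ids(reference_ids, other_ids):
    """Return the IDs found in reference_ids but not in other_ids, sorted."""
    return sorted(set(reference_ids) - set(other_ids))


# Placeholder IDs for illustration only (not real Mouse Gene 1.0 ST IDs).
expression_console_ids = ["id_A", "id_B", "id_C", "id_D"]
bioconductor_ids = ["id_A", "id_C"]

# IDs reported by Expression Console but absent from the Bioconductor result.
lost = missing_ids(expression_console_ids, bioconductor_ids)
print(lost)

# Gaps computed from the counts reported in this post.
total_gap = 35556 - 34760  # control + main probes
main_gap = 28856 - 28104   # main probes only
print(total_gap, main_gap)
```

Looking up the resulting IDs in the annotation file might show whether the missing entries share a common category.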

I put my question to Affymetrix support, and they answered:

This difference can be due to a number of reasons.

Firstly, the CDF file is the array layout information designed for 3' IVT array analysis, and is therefore not optimal for a WT array (the WT arrays use different library files, CLF and PGF). This is the reason why it is given an unsupported status (as seen in the name). This could explain the difference you see.

Secondly, Bioconductor and Expression Console are different software, so the RMA algorithm may not work identically in both. Things like background correction, filtering and such might differ between the two.

What do you think of the answer from Affymetrix support? Personally, I don't think that the summarization (median polish) removes any probes. How would you explain the difference I found? What can I do to keep all the probes I need (the main probes)?

Thank you,

Sophie LAMARRE
Biostatistician - Toulouse (FRANCE)

 -- output of sessionInfo(): 

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] affy_1.30.0    Biobase_2.12.2

loaded via a namespace (and not attached):
[1] affyio_1.20.0         preprocessCore_1.14.0 tools_2.13.0

--
Sent via the guest posting facility at bioconductor.org.


