[BioC] Does combineAffyBatch of package matchprobes eliminate the wrong probe sets?

Christian.Stratowa@vie.boehringer-ingelheim.com Christian.Stratowa at vie.boehringer-ingelheim.com
Fri Mar 10 09:16:43 CET 2006


Sorowly, until now I did not receive an answer to my question, so here is
some more information: 

The main problem is that combineAffyBatch() adds probe set "412_s_at" to the
list of common 
probe sets on U95A and U95Av2 although this probe set does NOT exist on
U95Av2 but was 
replaced by probe set "160033_s_at". Analogously, other probe sets were
replaced on U95Av2 
too, but are not  listed as common probe sets, e.g. "1829_at" is replaced by
"160020_s_at". 
For this reason I was not able to subset the mas5 expression values for U95A
and U95Av2 
to the common probe sets. 

So why is probe set "412_s_at" added to the list of common probe sets but
e.g. "1829_at" not, 
although both probe sets are replaced by oligos with the same
(x,y)-coordinates but with 
different oligonucleotide sequences? 

Here is the addition to the code below to show these issues: 

# setdiff as vectors 
>a21 <- unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(x95A.r2)[[1]])) 
>a12 <- unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(x95Av2.r2)[[1]])) 

>a2 <- unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(dat.r2)[[1]])) 
>a1 <- unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(dat.r2)[[1]])) 

>setdiff(a1,a2) 
 [1] "119_at"    "1215_at"   "1216_at"   "124_i_at"  "125_r_at"  "127_at"   
 [7] "1301_s_at" "1302_s_at" "132_at"    "1429_at"   "1502_s_at" "1829_at"  
[13] "1864_at"   "1889_s_at" "1982_s_at" "36969_at"  "383_at"    "397_at"   
[19] "426_at"    "439_at"    "787_at"    "788_s_at"  "972_s_at"  "985_s_at" 
[25] "997_at" 

>setdiff(a2,a1) 
 [1] "160020_at"   "160021_r_at" "160022_at"   "160023_at"   "160024_at"  
 [6] "160025_at"   "160026_at"   "160027_s_at" "160028_s_at" "160029_at"  
[11] "160030_at"   "160031_at"   "160032_at"   "160033_s_at" "160034_s_at" 
[16] "160035_at"   "160036_at"   "160037_at"   "160038_s_at" "160039_at"  
[21] "160040_at"   "160041_at"   "160042_s_at" "160043_at"   "160044_g_at" 

# Problem: probeset 412_s_at does NOT exist on U95Av2 and thus should not be
listed on combined probe set! 
>setdiff(a21, setdiff(a2,a1)) 
character(0) 
>setdiff(a12, setdiff(a1,a2)) 
[1] "412_s_at" 

Here are the informations from the  respective probe.tab files: 
#HG-U95A_probe.tab 
412_s_at        286     579     1795    AGCCACGGGCGTCAGAGAGACCCGG
Antisense 
412_s_at        494     633     1798    CACGGGCGTCAGAGAGACCCGGGAA
Antisense 
412_s_at        446     145     1807    CAGAGAGACCCGGGAAGGAAGGCTC
Antisense 
412_s_at        77      573     1810    AGAGACCCGGGAAGGAAGGCTCTCG
Antisense 
412_s_at        202     259     1841    GGAGCCAGGACACCTGCTCTCCGGC
Antisense 
412_s_at        141     403     1842    GAGCCAGGACACCTGCTCTCCGGCG
Antisense 
412_s_at        302     119     1846    CAGGACACCTGCTCTCCGGCGCAGA
Antisense 
412_s_at        354     285     1854    CTGCTCTCCGGCGCAGACAGCGGGG
Antisense 
412_s_at        133     489     1855    TGCTCTCCGGCGCAGACAGCGGGGC
Antisense 
412_s_at        134     489     1856    GCTCTCCGGCGCAGACAGCGGGGCC
Antisense 
412_s_at        206     383     1858    TCTCCGGCGCAGACAGCGGGGCCCA
Antisense 
412_s_at        206     385     1859    CTCCGGCGCAGACAGCGGGGCCCAG
Antisense 
412_s_at        192     237     1864    GCGCAGACAGCGGGGCCCAGCGCTC
Antisense 
412_s_at        191     237     1865    CGCAGACAGCGGGGCCCAGCGCTCT
Antisense 
412_s_at        227     191     1868    AGACAGCGGGGCCCAGCGCTCTCCT
Antisense 
412_s_at        364     57      1870    ACAGCGGGGCCCAGCGCTCTCCTGG
Antisense 

#HG-U95Av2_probe.tab 
160033_s_at     1       364     57      1504    GTGCTCCAGGAAGATATAGACATTG
Antisense 
160033_s_at     2       302     119     1533    GGTACAGTCAGAAGGACAGGACAAT
Antisense 
160033_s_at     3       446     145     1534    GTACAGTCAGAAGGACAGGACAATG
Antisense 
160033_s_at     4       227     191     1568    ATTCTGGGGACACAGAGGATGAGCT
Antisense 
160033_s_at     5       191     237     1648    GGGGAAGACCCGTATGCAGGCTCCA
Antisense 
160033_s_at     6       192     237     1700    ACCAGGAGCCTCCTGATCTGCCAGT
Antisense 
160033_s_at     7       202     259     1718    TGCCAGTCCCTGAGCTCCCAGATTT
Antisense 
160033_s_at     8       354     285     1752    CAAGCACTTCTTTCTTTACGGGGAG
Antisense 
160033_s_at     9       206     383     1787    ACGAGCGGCGGAAACTCATCCGATA
Antisense 
160033_s_at     10      206     385     1797    GAAACTCATCCGATACGTCACAGCC
Antisense 
160033_s_at     11      141     403     1901    AGGAGGCCCTGATGGACAACCCCTC
Antisense 
160033_s_at     12      133     489     1926    CCTGGCATTCGTTCGTCCCCGATGG
Antisense 
160033_s_at     13      134     489     1952    TCTACAGTTGCAATGAGAAGCAGAA
Antisense 
160033_s_at     14      77      573     1969    AAGCAGAAGTTACTTCCTCACCAGC
Antisense 
160033_s_at     15      286     579     1980    ACTTCCTCACCAGCTCTATGGGGTG
Antisense 
160033_s_at     16      494     633     2006    TGCCGCAAGCCTGAAGTATGTGCTA
Antisense 

P.S.: 
Looking at the Affymetrix "xx_annot.cvs" and "xx_probe.tab" files I realized
that the oligo sequence is 
missing in the probe.tab file for all probe-sets which are annotated as
"Sequence Source" = TIGR! 
It is not clear while the TIGR sequences are not listed in the probe.tab
files. However, this explains 
why function combineAffyBatch() deletes 197 probe sets and not only the 51
different probe sets. 

Best regards 
Christian 

============================================== 
Christian Stratowa, PhD 
Boehringer Ingelheim Austria 
Dept NCE Lead Discovery - Bioinformatics 
Dr. Boehringergasse 5-11 
A-1121 Vienna, Austria 
Tel.: ++43-1-80105-2470 
Fax: ++43-1-80105-2782 
email: christian.stratowa at vie.boehringer-ingelheim.com 

Maybe I am doing something wrong but when combining U95A and U95Av2 files
using function combineAffyBatch() 
it seems that 197 probe sets from U95A and from U95Av2 are deleted although
there are only 51 different probe sets. 
Furthermore, it seems that probe sets which are available on both U95A and
U95Av2 are deleted and not the different 
probe sets. 

Here is my code, which supports my statements: 
# call libraries 
>library(matchprobes) 
>library(affy) 
>library(hgu95acdf) 
>library(hgu95aprobe) 
>library(hgu95av2cdf) 
>library(hgu95av2probe) 

# load CEL files from folders HG-U95A and HG-U95Av2 
>x95A <- ReadAffy(celfile.path="HG-U95A") 
>x95Av2 <- ReadAffy(celfile.path="HG-U95Av2") 

# combine data 
>res <- combineAffyBatch(list(x95A,x95Av2),
c("hgu95aprobe","hgu95av2probe"), newcdf="comb95") 
>comb95 <- res$cdf 

# rma normalization for combined data 
>dat.rma <- rma(res$dat) 

# rma normalization for U95A and U95Av2 separately 
>x95A.rma <- rma(x95A) 
>x95Av2.rma <- rma(x95Av2) 

# store log2 of expression data as matrix, sort for AffyIDs, and export 
>dat.r2 <- exprs(dat.rma) 
>d <- dimnames(dat.r2)[[1]] 
>dat.r2 <- dat.r2[order(d),] 
>write.table(dat.r2,file="dat_rma.txt",sep="\t",quote=F,col.names=NA) 

>x95A.r2 <- exprs(x95A.rma) 
>d <- dimnames(x95A.r2)[[1]] 
>x95A.r2 <- x95A.r2[order(d),] 
>write.table(x95A.r2,file="x95A_rma.txt",sep="\t",quote=F,col.names=NA) 

>x95Av2.r2 <- exprs(x95Av2.rma) 
>d <- dimnames(x95Av2.r2)[[1]] 
>x95Av2.r2 <- x95Av2.r2[order(d),] 
>write.table(x95Av2.r2,file="x95Av2_rma.txt",sep="\t",quote=F,col.names=NA) 

The following 25 probe sets are on U95Av2 but NOT on U95A: 
>unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(x95A.r2)[[1]])) 
 [1] "160020_at"   "160021_r_at" "160022_at"   "160023_at"   "160024_at"  
 [6] "160025_at"   "160026_at"   "160027_s_at" "160028_s_at" "160029_at"  
[11] "160030_at"   "160031_at"   "160032_at"   "160033_s_at" "160034_s_at" 
[16] "160035_at"   "160036_at"   "160037_at"   "160038_s_at" "160039_at"  
[21] "160040_at"   "160041_at"   "160042_s_at" "160043_at"   "160044_g_at" 

The following 26 probe sets are on U95A but NOT on U95Av2: 
>unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(x95Av2.r2)[[1]])) 
 [1] "119_at"    "1215_at"   "1216_at"   "124_i_at"  "125_r_at"  "127_at"   
 [7] "1301_s_at" "1302_s_at" "132_at"    "1429_at"   "1502_s_at" "1829_at"  
[13] "1864_at"   "1889_s_at" "1982_s_at" "36969_at"  "383_at"    "397_at"   
[19] "412_s_at"  "426_at"    "439_at"    "787_at"    "788_s_at"  "972_s_at" 
[25] "985_s_at"  "997_at" 

This is corrrect, since I checked this also manually! 

Thus alltogether 51 probe sets which are not on both chips, should be
eliminated by function combineAffyBatch(). 
Is this correct? 

However, these are the differences between the combined table and U95Av2: 
>unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(dat.r2)[[1]])) 
  [1] "1142_at"     "1143_s_at"   "1144_at"     "1145_g_at"   "1146_at"    
  [6] "1147_at"     "1148_s_at"   "1149_at"     "1150_at"     "1151_at"    
 [11] "1162_g_at"   "1163_at"     "1164_at"     "1170_at"     "1171_s_at"  
 [16] "1172_at"     "1173_g_at"   "1174_at"     "1175_s_at"   "1176_at"    
 [21] "1177_at"     "1178_at"     "1179_at"     "1180_g_at"   "1181_at"    
 [26] "1278_at"     "1279_s_at"   "1280_i_at"   "1281_f_at"   "1282_s_at"  
 [31] "1283_at"     "1284_at"     "1285_at"     "1286_s_at"   "1428_at"    
 [36] "1513_at"     "1514_g_at"   "1515_at"     "1516_g_at"   "160020_at"  
 [41] "160021_r_at" "160022_at"   "160023_at"   "160024_at"   "160025_at"  
 [46] "160026_at"   "160027_s_at" "160028_s_at" "160029_at"   "160030_at"  
 [51] "160031_at"   "160032_at"   "160033_s_at" "160034_s_at" "160035_at"  
 [56] "160036_at"   "160037_at"   "160038_s_at" "160039_at"   "160040_at"  
 [61] "160041_at"   "160042_s_at" "160043_at"   "160044_g_at" "1608_at"    
 [66] "1609_g_at"   "1623_s_at"   "1624_at"     "1625_at"     "1626_at"    
 [71] "1627_at"     "1628_at"     "1629_s_at"   "1630_s_at"   "1631_at"    
 [76] "1632_at"     "1661_i_at"   "1662_r_at"   "1663_at"     "1664_at"    
 [81] "1665_s_at"   "1725_s_at"   "1726_at"     "1743_s_at"   "1744_at"    
 [86] "1745_at"     "1746_s_at"   "1790_s_at"   "1812_s_at"   "1813_at"    
 [91] "1818_at"     "1819_at"     "1820_g_at"   "1821_at"     "1822_at"    
 [96] "1823_g_at"   "1837_at"     "1838_g_at"   "1839_at"     "1840_g_at"  
[101] "1841_s_at"   "1842_at"     "1843_at"     "1876_at"     "1877_g_at"  
[106] "1881_at"     "1882_g_at"   "1883_s_at"   "1892_s_at"   "1893_s_at"  
[111] "1894_f_at"   "1903_at"     "1905_s_at"   "1906_at"     "1921_at"    
[116] "1922_g_at"   "1936_s_at"   "1937_at"     "292_s_at"    "293_at"     
[121] "294_s_at"    "295_s_at"    "296_at"      "297_g_at"    "298_at"     
[126] "299_i_at"    "300_f_at"    "301_at"      "302_at"      "303_at"     
[131] "304_at"      "305_g_at"    "311_s_at"    "312_s_at"    "313_at"     
[136] "323_at"      "324_f_at"    "325_s_at"    "326_i_at"    "327_f_at"   
[141] "328_at"      "329_s_at"    "330_s_at"    "331_at"      "332_at"     
[146] "333_s_at"    "334_s_at"    "335_r_at"    "693_g_at"    "694_at"     
[151] "695_at"      "696_at"      "697_f_at"    "698_f_at"    "699_s_at"   
[156] "700_s_at"    "701_s_at"    "702_f_at"    "703_at"      "704_at"     
[161] "705_at"      "706_at"      "707_s_at"    "711_at"      "712_s_at"   
[166] "713_at"      "714_at"      "723_s_at"    "724_at"      "725_i_at"   
[171] "726_f_at"    "727_at"      "728_at"      "729_i_at"    "730_r_at"   
[176] "731_f_at"    "732_f_at"    "733_at"      "734_at"      "735_s_at"   
[181] "918_at"      "919_at"      "920_at"      "921_s_at"    "936_s_at"   
[186] "937_at"      "938_at"      "939_at"      "952_at"      "953_g_at"   
[191] "954_s_at"    "955_at"      "956_at"      "957_at"      "958_s_at"   
[196] "959_at"      "960_g_at" 
  
and these are the differences between the combined table and U95A: 
>unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(dat.r2)[[1]])) 
  [1] "1142_at"   "1143_s_at" "1144_at"   "1145_g_at" "1146_at"   "1147_at"

  [7] "1148_s_at" "1149_at"   "1150_at"   "1151_at"   "1162_g_at" "1163_at"

 [13] "1164_at"   "1170_at"   "1171_s_at" "1172_at"   "1173_g_at" "1174_at"

 [19] "1175_s_at" "1176_at"   "1177_at"   "1178_at"   "1179_at"
"1180_g_at" 
 [25] "1181_at"   "119_at"    "1215_at"   "1216_at"   "124_i_at"  "125_r_at"

 [31] "1278_at"   "1279_s_at" "127_at"    "1280_i_at" "1281_f_at"
"1282_s_at" 
 [37] "1283_at"   "1284_at"   "1285_at"   "1286_s_at" "1301_s_at"
"1302_s_at" 
 [43] "132_at"    "1428_at"   "1429_at"   "1502_s_at" "1513_at"
"1514_g_at" 
 [49] "1515_at"   "1516_g_at" "1608_at"   "1609_g_at" "1623_s_at" "1624_at"

 [55] "1625_at"   "1626_at"   "1627_at"   "1628_at"   "1629_s_at"
"1630_s_at" 
 [61] "1631_at"   "1632_at"   "1661_i_at" "1662_r_at" "1663_at"   "1664_at"

 [67] "1665_s_at" "1725_s_at" "1726_at"   "1743_s_at" "1744_at"   "1745_at"

 [73] "1746_s_at" "1790_s_at" "1812_s_at" "1813_at"   "1818_at"   "1819_at"

 [79] "1820_g_at" "1821_at"   "1822_at"   "1823_g_at" "1829_at"   "1837_at"

 [85] "1838_g_at" "1839_at"   "1840_g_at" "1841_s_at" "1842_at"   "1843_at"

 [91] "1864_at"   "1876_at"   "1877_g_at" "1881_at"   "1882_g_at"
"1883_s_at" 
 [97] "1889_s_at" "1892_s_at" "1893_s_at" "1894_f_at" "1903_at"
"1905_s_at" 
[103] "1906_at"   "1921_at"   "1922_g_at" "1936_s_at" "1937_at"
"1982_s_at" 
[109] "292_s_at"  "293_at"    "294_s_at"  "295_s_at"  "296_at"    "297_g_at"

[115] "298_at"    "299_i_at"  "300_f_at"  "301_at"    "302_at"    "303_at"

[121] "304_at"    "305_g_at"  "311_s_at"  "312_s_at"  "313_at"    "323_at"

[127] "324_f_at"  "325_s_at"  "326_i_at"  "327_f_at"  "328_at"    "329_s_at"

[133] "330_s_at"  "331_at"    "332_at"    "333_s_at"  "334_s_at"  "335_r_at"

[139] "36969_at"  "383_at"    "397_at"    "426_at"    "439_at"    "693_g_at"

[145] "694_at"    "695_at"    "696_at"    "697_f_at"  "698_f_at"  "699_s_at"

[151] "700_s_at"  "701_s_at"  "702_f_at"  "703_at"    "704_at"    "705_at"

[157] "706_at"    "707_s_at"  "711_at"    "712_s_at"  "713_at"    "714_at"

[163] "723_s_at"  "724_at"    "725_i_at"  "726_f_at"  "727_at"    "728_at"

[169] "729_i_at"  "730_r_at"  "731_f_at"  "732_f_at"  "733_at"    "734_at"

[175] "735_s_at"  "787_at"    "788_s_at"  "918_at"    "919_at"    "920_at"

[181] "921_s_at"  "936_s_at"  "937_at"    "938_at"    "939_at"    "952_at"

[187] "953_g_at"  "954_s_at"  "955_at"    "956_at"    "957_at"    "958_s_at"

[193] "959_at"    "960_g_at"  "972_s_at"  "985_s_at"  "997_at"   

Can anybody tell me if there is a mistake in my code or why function
combineAffyBatch() deletes the wrong probe sets? 

Best regards 
Christian



More information about the Bioconductor mailing list