[BioC] FW: Does combineAffyBatch of package matchprobes elimi nate the wrong probe sets?

Christian.Stratowa@vie.boehringer-ingelheim.com Christian.Stratowa at vie.boehringer-ingelheim.com
Tue Mar 14 14:14:00 CET 2006


Dear Wolfgang

Thank you for your reply, I understand  that you are busy, this 
seems to be a problem of everybody:-(

In the case of "412_s_at" only one of 16 probe sequences are identical, 
i.e. GGAGCCAGGACACCTGCTCTCCGGC at position (202,259) vs (359,471).
This means that one match is sufficient to define the probe set as
"412_s_at".

A worthwhile expansion of function combineAffyBatch() would be to add a 
parameter setting the minimum number of oligos necessary for creating a 
probe set. Sorrowly, I do not have time to implement this expansion.

Best regards
Christian

==============================================
Christian Stratowa, PhD
Boehringer Ingelheim Austria
Dept NCE Lead Discovery - Bioinformatics
Dr. Boehringergasse 5-11
A-1121 Vienna, Austria
Tel.: ++43-1-80105-2470
Fax: ++43-1-80105-2782
email: christian.stratowa at vie.boehringer-ingelheim.com



-----Original Message-----
From: Wolfgang Huber [mailto:huber at ebi.ac.uk] 
Sent: Tuesday, March 14, 2006 13:51
To: Stratowa,Dr.,Christian FEX BIG-AT-V; bioconductor at stat.math.ethz.ch
Subject: Re: FW: [BioC] Does combineAffyBatch of package matchprobes
eliminate the wrong probe sets?


Christian.Stratowa at vie.boehringer-ingelheim.com wrote:
 > Dear Wolfgang
 >
 > Sorrowly, I did not get any response to my question below. I am very
> interested what might be  > the explanation for the behavior described
below.  > Maybe, I made a simple mistake?  > I would appreciate your
explanation very much.  >

Dear Christian,

I apologize - everyone is very busy, we are doing this in our spare time

and are not paid for the consulting. So please excuse if I don't reply 
to every possible mail.

The matching is done by _probe sequence_, the combined probe sets and 
their naming are taken from the first AffyBatch in the argument list. 
The probe set assignments and names of the other AffyBatches are 
ignored, just the probe sequences are used.

I think your question might be sufficiently detailed that you will 
benefit from having a look at the code of that function (it is not that 
complicated), and use the debugger to look at some of the intermediate 
results.  Please also have a look at the man page combineAffyBatch.

This is one way of combining different Affymetrix chip types, there 
might be other schemes that are more suitable for what you are after 
(since 95A and 95Av2 chips are so similar), please feel free to 
implement and contribute them!

  Cheers
  Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber
------------------------------------


>  -----Original Message----- 
> From:   Stratowa,Dr.,Christian FEX BIG-AT-V  
> Sent:   Friday, March 10, 2006 9:17 
> To:     'bioconductor at stat.math.ethz.ch' 
> Subject:        [BioC] Does combineAffyBatch of package matchprobes
> eliminate the wrong probe sets?
> 
> Sorowly, until now I did not receive an answer to my question, so here

> is some more information:
> 
> The main problem is that combineAffyBatch() adds probe set "412_s_at" 
> to the list of common probe sets on U95A and U95Av2 although this 
> probe set does NOT exist on U95Av2 but was
> replaced by probe set "160033_s_at". Analogously, other probe sets
were
> replaced on U95Av2 
> too, but are not  listed as common probe sets, e.g. "1829_at" is
replaced by
> "160020_s_at". 
> For this reason I was not able to subset the mas5 expression values
for U95A
> and U95Av2 
> to the common probe sets. 
> 
> So why is probe set "412_s_at" added to the list of common probe sets 
> but e.g. "1829_at" not, although both probe sets are replaced by 
> oligos with the same (x,y)-coordinates but with
> different oligonucleotide sequences? 
> 
> Here is the addition to the code below to show these issues:
> 
> # setdiff as vectors
> 
>>a21 <- 
>>unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(x95A.r2)[[1]]))
>>a12 <-
unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(x95Av2.r2)[[1]])) 
> 
> 
>>a2 <- unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(dat.r2)[[1]]))
>>a1 <- unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(dat.r2)[[1]])) 
> 
> 
>>setdiff(a1,a2)
> 
>  [1] "119_at"    "1215_at"   "1216_at"   "124_i_at"  "125_r_at"
"127_at"   
>  [7] "1301_s_at" "1302_s_at" "132_at"    "1429_at"   "1502_s_at"
"1829_at"  
> [13] "1864_at"   "1889_s_at" "1982_s_at" "36969_at"  "383_at"
"397_at"   
> [19] "426_at"    "439_at"    "787_at"    "788_s_at"  "972_s_at"
"985_s_at" 
> [25] "997_at"
> 
> 
>>setdiff(a2,a1)
> 
>  [1] "160020_at"   "160021_r_at" "160022_at"   "160023_at"
"160024_at"  
>  [6] "160025_at"   "160026_at"   "160027_s_at" "160028_s_at"
"160029_at"  
> [11] "160030_at"   "160031_at"   "160032_at"   "160033_s_at"
"160034_s_at" 
> [16] "160035_at"   "160036_at"   "160037_at"   "160038_s_at"
"160039_at"  
> [21] "160040_at"   "160041_at"   "160042_s_at" "160043_at"
"160044_g_at" 
> 
> # Problem: probeset 412_s_at does NOT exist on U95Av2 and thus should 
> not be listed on combined probe set!
> 
>>setdiff(a21, setdiff(a2,a1))
> 
> character(0)
> 
>>setdiff(a12, setdiff(a1,a2))
> 
> [1] "412_s_at"
> 
> Here are the informations from the  respective probe.tab files:
> #HG-U95A_probe.tab 
> 412_s_at        286     579     1795    AGCCACGGGCGTCAGAGAGACCCGG
> Antisense 
> 412_s_at        494     633     1798    CACGGGCGTCAGAGAGACCCGGGAA
> Antisense 
> 412_s_at        446     145     1807    CAGAGAGACCCGGGAAGGAAGGCTC
> Antisense 
> 412_s_at        77      573     1810    AGAGACCCGGGAAGGAAGGCTCTCG
> Antisense 
> 412_s_at        202     259     1841    GGAGCCAGGACACCTGCTCTCCGGC
> Antisense 
> 412_s_at        141     403     1842    GAGCCAGGACACCTGCTCTCCGGCG
> Antisense 
> 412_s_at        302     119     1846    CAGGACACCTGCTCTCCGGCGCAGA
> Antisense 
> 412_s_at        354     285     1854    CTGCTCTCCGGCGCAGACAGCGGGG
> Antisense 
> 412_s_at        133     489     1855    TGCTCTCCGGCGCAGACAGCGGGGC
> Antisense 
> 412_s_at        134     489     1856    GCTCTCCGGCGCAGACAGCGGGGCC
> Antisense 
> 412_s_at        206     383     1858    TCTCCGGCGCAGACAGCGGGGCCCA
> Antisense 
> 412_s_at        206     385     1859    CTCCGGCGCAGACAGCGGGGCCCAG
> Antisense 
> 412_s_at        192     237     1864    GCGCAGACAGCGGGGCCCAGCGCTC
> Antisense 
> 412_s_at        191     237     1865    CGCAGACAGCGGGGCCCAGCGCTCT
> Antisense 
> 412_s_at        227     191     1868    AGACAGCGGGGCCCAGCGCTCTCCT
> Antisense 
> 412_s_at        364     57      1870    ACAGCGGGGCCCAGCGCTCTCCTGG
> Antisense 
> 
> #HG-U95Av2_probe.tab 
> 160033_s_at     1       364     57      1504
GTGCTCCAGGAAGATATAGACATTG
> Antisense 
> 160033_s_at     2       302     119     1533
GGTACAGTCAGAAGGACAGGACAAT
> Antisense 
> 160033_s_at     3       446     145     1534
GTACAGTCAGAAGGACAGGACAATG
> Antisense 
> 160033_s_at     4       227     191     1568
ATTCTGGGGACACAGAGGATGAGCT
> Antisense 
> 160033_s_at     5       191     237     1648
GGGGAAGACCCGTATGCAGGCTCCA
> Antisense 
> 160033_s_at     6       192     237     1700
ACCAGGAGCCTCCTGATCTGCCAGT
> Antisense 
> 160033_s_at     7       202     259     1718
TGCCAGTCCCTGAGCTCCCAGATTT
> Antisense 
> 160033_s_at     8       354     285     1752
CAAGCACTTCTTTCTTTACGGGGAG
> Antisense 
> 160033_s_at     9       206     383     1787
ACGAGCGGCGGAAACTCATCCGATA
> Antisense 
> 160033_s_at     10      206     385     1797
GAAACTCATCCGATACGTCACAGCC
> Antisense 
> 160033_s_at     11      141     403     1901
AGGAGGCCCTGATGGACAACCCCTC
> Antisense 
> 160033_s_at     12      133     489     1926
CCTGGCATTCGTTCGTCCCCGATGG
> Antisense 
> 160033_s_at     13      134     489     1952
TCTACAGTTGCAATGAGAAGCAGAA
> Antisense 
> 160033_s_at     14      77      573     1969
AAGCAGAAGTTACTTCCTCACCAGC
> Antisense 
> 160033_s_at     15      286     579     1980
ACTTCCTCACCAGCTCTATGGGGTG
> Antisense 
> 160033_s_at     16      494     633     2006
TGCCGCAAGCCTGAAGTATGTGCTA
> Antisense
> 
> P.S.:
> Looking at the Affymetrix "xx_annot.cvs" and "xx_probe.tab" files I
realized
> that the oligo sequence is 
> missing in the probe.tab file for all probe-sets which are annotated
as
> "Sequence Source" = TIGR! 
> It is not clear while the TIGR sequences are not listed in the
probe.tab
> files. However, this explains 
> why function combineAffyBatch() deletes 197 probe sets and not only
the 51
> different probe sets. 
> 
> Best regards
> Christian 
> 
> ==============================================
> Christian Stratowa, PhD 
> Boehringer Ingelheim Austria 
> Dept NCE Lead Discovery - Bioinformatics 
> Dr. Boehringergasse 5-11 
> A-1121 Vienna, Austria 
> Tel.: ++43-1-80105-2470 
> Fax: ++43-1-80105-2782 
> email: christian.stratowa at vie.boehringer-ingelheim.com 
> 
> Maybe I am doing something wrong but when combining U95A and U95Av2 
> files using function combineAffyBatch() it seems that 197 probe sets 
> from U95A and from U95Av2 are deleted although there are only 51 
> different probe sets. Furthermore, it seems that probe sets which are 
> available on both U95A and U95Av2 are deleted and not the different
> probe sets. 
> 
> Here is my code, which supports my statements:
> # call libraries 
> 
>>library(matchprobes)
>>library(affy) 
>>library(hgu95acdf) 
>>library(hgu95aprobe) 
>>library(hgu95av2cdf) 
>>library(hgu95av2probe) 
> 
> 
> # load CEL files from folders HG-U95A and HG-U95Av2
> 
>>x95A <- ReadAffy(celfile.path="HG-U95A")
>>x95Av2 <- ReadAffy(celfile.path="HG-U95Av2") 
> 
> 
> # combine data
> 
>>res <- combineAffyBatch(list(x95A,x95Av2),
> 
> c("hgu95aprobe","hgu95av2probe"), newcdf="comb95")
> 
>>comb95 <- res$cdf
> 
> 
> # rma normalization for combined data
> 
>>dat.rma <- rma(res$dat)
> 
> 
> # rma normalization for U95A and U95Av2 separately
> 
>>x95A.rma <- rma(x95A)
>>x95Av2.rma <- rma(x95Av2) 
> 
> 
> # store log2 of expression data as matrix, sort for AffyIDs, and 
> export
> 
>>dat.r2 <- exprs(dat.rma)
>>d <- dimnames(dat.r2)[[1]] 
>>dat.r2 <- dat.r2[order(d),] 
>>write.table(dat.r2,file="dat_rma.txt",sep="\t",quote=F,col.names=NA) 
> 
> 
>>x95A.r2 <- exprs(x95A.rma)
>>d <- dimnames(x95A.r2)[[1]] 
>>x95A.r2 <- x95A.r2[order(d),] 
>>write.table(x95A.r2,file="x95A_rma.txt",sep="\t",quote=F,col.names=NA)

> 
> 
>>x95Av2.r2 <- exprs(x95Av2.rma)
>>d <- dimnames(x95Av2.r2)[[1]] 
>>x95Av2.r2 <- x95Av2.r2[order(d),] 
>>write.table(x95Av2.r2,file="x95Av2_rma.txt",sep="\t",quote=F,col.names
=NA) 
> 
> 
> The following 25 probe sets are on U95Av2 but NOT on U95A:
> 
>>unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(x95A.r2)[[1]]))
> 
>  [1] "160020_at"   "160021_r_at" "160022_at"   "160023_at"
"160024_at"  
>  [6] "160025_at"   "160026_at"   "160027_s_at" "160028_s_at"
"160029_at"  
> [11] "160030_at"   "160031_at"   "160032_at"   "160033_s_at"
"160034_s_at" 
> [16] "160035_at"   "160036_at"   "160037_at"   "160038_s_at"
"160039_at"  
> [21] "160040_at"   "160041_at"   "160042_s_at" "160043_at"
"160044_g_at" 
> 
> The following 26 probe sets are on U95A but NOT on U95Av2:
> 
>>unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(x95Av2.r2)[[1]]))
> 
>  [1] "119_at"    "1215_at"   "1216_at"   "124_i_at"  "125_r_at"
"127_at"   
>  [7] "1301_s_at" "1302_s_at" "132_at"    "1429_at"   "1502_s_at"
"1829_at"  
> [13] "1864_at"   "1889_s_at" "1982_s_at" "36969_at"  "383_at"
"397_at"   
> [19] "412_s_at"  "426_at"    "439_at"    "787_at"    "788_s_at"
"972_s_at" 
> [25] "985_s_at"  "997_at"
> 
> This is corrrect, since I checked this also manually!
> 
> Thus alltogether 51 probe sets which are not on both chips, should be 
> eliminated by function combineAffyBatch(). Is this correct?
> 
> However, these are the differences between the combined table and 
> U95Av2:
> 
>
<<...... SNIP >>>>



More information about the Bioconductor mailing list