[BioC] Differences in results analyzing Mouse Gene 1.0-ST using oligo and affy package

Tue Feb 12 17:06:58 CET 2013

A little more information for those who might care. There are 139 
probesets that differ by > |5| between the oligo and affy package. If I 
choose one of them to see what probes make up the probeset, I get this:

 > get("10345698", mogene10stv1cdf)
           pm mm
[1,]  268111 NA
[2,]  468001 NA
[3,]  977212 NA
[4,] 1027059 NA
[5,]  425711 NA
[6,]  248417 NA

So the affy package thinks there are 6 probes in this probeset.

 > con <- db(pd.mogene.1.0.st.v1)
 > dbGetQuery(con, "select meta_fsetid, x, y from pmfeature inner join 
core_mps on pmfeature.fsetid=core_mps.fsetid where 
core_mps.meta_fsetid='10345698';")

    meta_fsetid    x    y
1     10345698  150  679
2     10345698  161  505
3     10345698  550  517
4     10345698   23  469
5     10345698   28  775
6     10345698   16    6
7     10345698  256 1035
8     10345698  985  592
9     10345698  916  421
10    10345698  988  754
11    10345698 1028  174
12    10345698 1016  146
13    10345698  592  430
14    10345698  862  584
15    10345698   84  304
16    10345698  845  397
17    10345698  264  798
18    10345698  149  357
19    10345698  332  843
20    10345698  362  215
21    10345698  366  722
22    10345698  339  835
23    10345698  173  898
24    10345698   95  495
25    10345698  101    9
26    10345698 1015  309
27    10345698  933  427
28    10345698  239 1010
29    10345698  135  867
30    10345698   72  838
31    10345698   63  199
32    10345698  412  592
33    10345698  388  339
34    10345698  829  211
35    10345698  110  765
36    10345698  312  386
37    10345698   10  827
38    10345698  165  727
39    10345698   90  234
40    10345698  847   19
41    10345698   28  493
42    10345698  801  246
43    10345698  790  535
44    10345698  577 1002
45    10345698  394  829
46    10345698  258  190
47    10345698  928  762
48    10345698  133  190
49    10345698  712  357
50    10345698  680   24
51    10345698  375  968
52    10345698  482  893
53    10345698  393  521
54    10345698  308  117
55    10345698  360  255
56    10345698  750  445
57    10345698  711  930
58    10345698  158  978
59    10345698  460  405
60    10345698  616  236

And the oligo package says there are 60. And if you look at netaffx, it 
says that they have combined 5 exon probesets to create this transcript 
probeset, for a total of 60 probes.

So it looks like oligo is current, and the cdf package for affy (which 
is based on an unsupported cdf that Affymetrix is obviously not 
supporting) is not.

Best,

Jim

On 2/12/2013 10:03 AM, Jon Toledo wrote:
> Thank you James is is exactly what I needed.
>
> Was comparing two groups and when I got back to the project some weeks 
> later changed the package and got different results which left me a 
> bit puzzled.
>
> > Date: Tue, 12 Feb 2013 09:25:11 -0500
> > From: jmacdon at uw.edu
> > To: tintin_jb at hotmail.com
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] Differences in results analyzing Mouse Gene 
> 1.0-ST using oligo and affy package
> >
> > Hi Jon,
> >
> > On 2/11/2013 7:40 PM, Jon Toledo wrote:
> > >
> > >
> > >
> > > Dear Bioconductor List,
> > >
> > >
> > > I have repeated my workflow using the affy and oligo package 
> alternatively followed by the limma package to analyze and experiment 
> with two conditions using Mouse Gene 1.0-ST chips and I arrive to 
> different results.
> > >
> > > I have been looking online and found that at least for the for the 
> Mouse Gene 1.1-ST the oligo package is preferred, but not anything 
> clear for Mouse Gene 1.0-ST .
> > >
> > > I attach below my code and session info:
> > >
> > > A) For running affy:
> > >
> > > library(affy)
> > > library(pd.mogene.1.0.st.v1)
> > > library(mogene10sttranscriptcluster.db)
> > >
> > > cellist=list.celfiles(full.names=T)
> > > cellistD14=grep("CD1...14",cellist,value=T)
> > > esetD14<- justRMA(filenames=gsub("\\./","",cellistD14))
> > >
> > > B) For runnign oligo:
> > >
> > > library(oligo)
> > > library(pd.mogene.1.0.st.v1)
> > > library(mogene10sttranscriptcluster.db)
> > >
> > > cellist=list.celfiles(full.names=T)
> > > cellistD14=grep("CD1...14",cellist,value=T)
> > >
> > > rsetD14=read.celfiles(cellistD14)
> > > esetpD14=rma(rsetD14,target="probeset")
> > > esetD14=rma(rsetD14,target="core")
> > >
> > >
> > > C) Finally running the same analysis:
> > >
> > > designD14<-read.delim('designD14.txt')
> > > contrast.matrix=model.matrix(~treatment,data=designD14)
> > > library(limma)
> > > fit<- lmFit(esetD14, contrast.matrix)
> > > fit<- eBayes(fit,proportion=0.01)
> > >
> > > m1=topTable(fit, coef="treatment",number=1e8,adjust.method="BH")
> > > m1=m1[,c("ID","logFC","P.Value","adj.P.Val")]
> > > m1=cbind(m1[,1:2],FCTreat=2**m1[,2],PTreat=m1[,3],adj.PTreat=m1[,4])
> >
> > You show a lot of code, but paradoxically for all that code I am left
> > wondering what you did, and how things differed.
> >
> > Anyway, either oligo or xps are the only packages that were designed 
> for
> > these arrays. The affy package was 'fixed' so it will work, but IMO you
> > should be using oligo or xps, not affy. There are differences between
> > the results for oligo and affy:
> >
> > > library(oligo)
> > > oligo <- rma(read.celfiles(list.celfiles()))
> > > library(affy)
> > > affy <- justRMA()
> > > oligo
> > ExpressionSet (storageMode: lockedEnvironment)
> > assayData: 35556 features, 16 samples
> > element names: exprs
> > protocolData
> > rowNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: exprs dates
> > varMetadata: labelDescription channel
> > phenoData
> > rowNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: index
> > varMetadata: labelDescription channel
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: pd.mogene.1.0.st.v1
> > > affy
> > ExpressionSet (storageMode: lockedEnvironment)
> > assayData: 34760 features, 16 samples
> > element names: exprs, se.exprs
> > protocolData
> > sampleNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: ScanDate
> > varMetadata: labelDescription
> > phenoData
> > sampleNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: sample
> > varMetadata: labelDescription
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: mogene10stv1
> >
> > Note that oligo returns 35556 probesets, where affy returns 34760. 
> If we
> > subset to overlapping probesets, we still get some differences:
> >
> > > oligo <- exprs(oligo)[featureNames(oligo) %in% featureNames(affy),]
> > > affy <- exprs(affy)[featureNames(affy) %in% row.names(oligo),]
> > > all.equal(row.names(oligo), row.names(affy))
> > [1] TRUE
> > > all.equal(affy, oligo)
> > [1] "Mean relative difference: 0.01049026"
> >
> > And if I look at a summary of these differences, most are very small
> > (the IQR goes from 0.005 - 0.04 or so), with large differences at the
> > extremes (-5, 10). This may be due to differences in the probesets; 
> note
> > that the affy package uses an old unsupported CDF file, whereas the
> > oligo package uses the current pgf/clf files.
> >
> > Best,
> >
> > Jim
> >
> >
> >
> >
> > >
> > >> sessionInfo() (This is for the affy run)
> > > R version 2.15.2 (2012-10-26)
> > > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > >
> > > locale:
> > > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United 
> States.1252 LC_MONETARY=English_United States.1252
> > > [4] LC_NUMERIC=C LC_TIME=English_United States.1252
> > >
> > > attached base packages:
> > > [1] stats graphics grDevices utils datasets methods base
> > >
> > > other attached packages:
> > > [1] mogene10stv1cdf_2.11.0 limma_3.14.4 
> mogene10sttranscriptcluster.db_8.0.1
> > > [4] org.Mm.eg.db_2.8.0 AnnotationDbi_1.20.3 pd.mogene.1.0.st.v1_3.8.0
> > > [7] oligo_1.22.0 oligoClasses_1.20.0 RSQLite_0.11.2
> > > [10] DBI_0.2-5 affy_1.36.1 Biobase_2.18.0
> > > [13] BiocGenerics_0.4.0
> > >
> > > loaded via a namespace (and not attached):
> > > [1] affxparser_1.30.2 affyio_1.26.0 BiocInstaller_1.8.3 
> Biostrings_2.26.3 bit_1.1-9 codetools_0.2-8
> > > [7] ff_2.2-10 foreach_1.4.0 GenomicRanges_1.10.6 IRanges_1.16.4 
> iterators_1.0.6 parallel_2.15.2
> > > [13] preprocessCore_1.20.0 splines_2.15.2 stats4_2.15.2 
> tools_2.15.2 zlibbioc_1.4.0
> > >
> > > Thank you very much
> > >
> > >
> > > J Toledo
> > > UPenn
> > > USA
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099