[BioC] Differences in results analyzing Mouse Gene 1.0-ST using oligo and affy package
James W. MacDonald
jmacdon at uw.edu
Tue Feb 12 17:06:58 CET 2013
A little more information for those who might care. There are 139
probesets that differ by > |5| between the oligo and affy package. If I
choose one of them to see what probes make up the probeset, I get this:
> get("10345698", mogene10stv1cdf)
pm mm
[1,] 268111 NA
[2,] 468001 NA
[3,] 977212 NA
[4,] 1027059 NA
[5,] 425711 NA
[6,] 248417 NA
So the affy package thinks there are 6 probes in this probeset.
> con <- db(pd.mogene.1.0.st.v1)
> dbGetQuery(con, "select meta_fsetid, x, y from pmfeature inner join
core_mps on pmfeature.fsetid=core_mps.fsetid where
core_mps.meta_fsetid='10345698';")
meta_fsetid x y
1 10345698 150 679
2 10345698 161 505
3 10345698 550 517
4 10345698 23 469
5 10345698 28 775
6 10345698 16 6
7 10345698 256 1035
8 10345698 985 592
9 10345698 916 421
10 10345698 988 754
11 10345698 1028 174
12 10345698 1016 146
13 10345698 592 430
14 10345698 862 584
15 10345698 84 304
16 10345698 845 397
17 10345698 264 798
18 10345698 149 357
19 10345698 332 843
20 10345698 362 215
21 10345698 366 722
22 10345698 339 835
23 10345698 173 898
24 10345698 95 495
25 10345698 101 9
26 10345698 1015 309
27 10345698 933 427
28 10345698 239 1010
29 10345698 135 867
30 10345698 72 838
31 10345698 63 199
32 10345698 412 592
33 10345698 388 339
34 10345698 829 211
35 10345698 110 765
36 10345698 312 386
37 10345698 10 827
38 10345698 165 727
39 10345698 90 234
40 10345698 847 19
41 10345698 28 493
42 10345698 801 246
43 10345698 790 535
44 10345698 577 1002
45 10345698 394 829
46 10345698 258 190
47 10345698 928 762
48 10345698 133 190
49 10345698 712 357
50 10345698 680 24
51 10345698 375 968
52 10345698 482 893
53 10345698 393 521
54 10345698 308 117
55 10345698 360 255
56 10345698 750 445
57 10345698 711 930
58 10345698 158 978
59 10345698 460 405
60 10345698 616 236
And the oligo package says there are 60. And if you look at netaffx, it
says that they have combined 5 exon probesets to create this transcript
probeset, for a total of 60 probes.
So it looks like oligo is current, and the cdf package for affy (which
is based on an unsupported cdf that Affymetrix is obviously not
supporting) is not.
Best,
Jim
On 2/12/2013 10:03 AM, Jon Toledo wrote:
> Thank you James is is exactly what I needed.
>
> Was comparing two groups and when I got back to the project some weeks
> later changed the package and got different results which left me a
> bit puzzled.
>
> > Date: Tue, 12 Feb 2013 09:25:11 -0500
> > From: jmacdon at uw.edu
> > To: tintin_jb at hotmail.com
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] Differences in results analyzing Mouse Gene
> 1.0-ST using oligo and affy package
> >
> > Hi Jon,
> >
> > On 2/11/2013 7:40 PM, Jon Toledo wrote:
> > >
> > >
> > >
> > > Dear Bioconductor List,
> > >
> > >
> > > I have repeated my workflow using the affy and oligo package
> alternatively followed by the limma package to analyze and experiment
> with two conditions using Mouse Gene 1.0-ST chips and I arrive to
> different results.
> > >
> > > I have been looking online and found that at least for the for the
> Mouse Gene 1.1-ST the oligo package is preferred, but not anything
> clear for Mouse Gene 1.0-ST .
> > >
> > > I attach below my code and session info:
> > >
> > > A) For running affy:
> > >
> > > library(affy)
> > > library(pd.mogene.1.0.st.v1)
> > > library(mogene10sttranscriptcluster.db)
> > >
> > > cellist=list.celfiles(full.names=T)
> > > cellistD14=grep("CD1...14",cellist,value=T)
> > > esetD14<- justRMA(filenames=gsub("\\./","",cellistD14))
> > >
> > > B) For runnign oligo:
> > >
> > > library(oligo)
> > > library(pd.mogene.1.0.st.v1)
> > > library(mogene10sttranscriptcluster.db)
> > >
> > > cellist=list.celfiles(full.names=T)
> > > cellistD14=grep("CD1...14",cellist,value=T)
> > >
> > > rsetD14=read.celfiles(cellistD14)
> > > esetpD14=rma(rsetD14,target="probeset")
> > > esetD14=rma(rsetD14,target="core")
> > >
> > >
> > > C) Finally running the same analysis:
> > >
> > > designD14<-read.delim('designD14.txt')
> > > contrast.matrix=model.matrix(~treatment,data=designD14)
> > > library(limma)
> > > fit<- lmFit(esetD14, contrast.matrix)
> > > fit<- eBayes(fit,proportion=0.01)
> > >
> > > m1=topTable(fit, coef="treatment",number=1e8,adjust.method="BH")
> > > m1=m1[,c("ID","logFC","P.Value","adj.P.Val")]
> > > m1=cbind(m1[,1:2],FCTreat=2**m1[,2],PTreat=m1[,3],adj.PTreat=m1[,4])
> >
> > You show a lot of code, but paradoxically for all that code I am left
> > wondering what you did, and how things differed.
> >
> > Anyway, either oligo or xps are the only packages that were designed
> for
> > these arrays. The affy package was 'fixed' so it will work, but IMO you
> > should be using oligo or xps, not affy. There are differences between
> > the results for oligo and affy:
> >
> > > library(oligo)
> > > oligo <- rma(read.celfiles(list.celfiles()))
> > > library(affy)
> > > affy <- justRMA()
> > > oligo
> > ExpressionSet (storageMode: lockedEnvironment)
> > assayData: 35556 features, 16 samples
> > element names: exprs
> > protocolData
> > rowNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: exprs dates
> > varMetadata: labelDescription channel
> > phenoData
> > rowNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: index
> > varMetadata: labelDescription channel
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: pd.mogene.1.0.st.v1
> > > affy
> > ExpressionSet (storageMode: lockedEnvironment)
> > assayData: 34760 features, 16 samples
> > element names: exprs, se.exprs
> > protocolData
> > sampleNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: ScanDate
> > varMetadata: labelDescription
> > phenoData
> > sampleNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
> > RH_8867.8_pid1257.CEL (16 total)
> > varLabels: sample
> > varMetadata: labelDescription
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: mogene10stv1
> >
> > Note that oligo returns 35556 probesets, where affy returns 34760.
> If we
> > subset to overlapping probesets, we still get some differences:
> >
> > > oligo <- exprs(oligo)[featureNames(oligo) %in% featureNames(affy),]
> > > affy <- exprs(affy)[featureNames(affy) %in% row.names(oligo),]
> > > all.equal(row.names(oligo), row.names(affy))
> > [1] TRUE
> > > all.equal(affy, oligo)
> > [1] "Mean relative difference: 0.01049026"
> >
> > And if I look at a summary of these differences, most are very small
> > (the IQR goes from 0.005 - 0.04 or so), with large differences at the
> > extremes (-5, 10). This may be due to differences in the probesets;
> note
> > that the affy package uses an old unsupported CDF file, whereas the
> > oligo package uses the current pgf/clf files.
> >
> > Best,
> >
> > Jim
> >
> >
> >
> >
> > >
> > >> sessionInfo() (This is for the affy run)
> > > R version 2.15.2 (2012-10-26)
> > > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > >
> > > locale:
> > > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> > > [4] LC_NUMERIC=C LC_TIME=English_United States.1252
> > >
> > > attached base packages:
> > > [1] stats graphics grDevices utils datasets methods base
> > >
> > > other attached packages:
> > > [1] mogene10stv1cdf_2.11.0 limma_3.14.4
> mogene10sttranscriptcluster.db_8.0.1
> > > [4] org.Mm.eg.db_2.8.0 AnnotationDbi_1.20.3 pd.mogene.1.0.st.v1_3.8.0
> > > [7] oligo_1.22.0 oligoClasses_1.20.0 RSQLite_0.11.2
> > > [10] DBI_0.2-5 affy_1.36.1 Biobase_2.18.0
> > > [13] BiocGenerics_0.4.0
> > >
> > > loaded via a namespace (and not attached):
> > > [1] affxparser_1.30.2 affyio_1.26.0 BiocInstaller_1.8.3
> Biostrings_2.26.3 bit_1.1-9 codetools_0.2-8
> > > [7] ff_2.2-10 foreach_1.4.0 GenomicRanges_1.10.6 IRanges_1.16.4
> iterators_1.0.6 parallel_2.15.2
> > > [13] preprocessCore_1.20.0 splines_2.15.2 stats4_2.15.2
> tools_2.15.2 zlibbioc_1.4.0
> > >
> > > Thank you very much
> > >
> > >
> > > J Toledo
> > > UPenn
> > > USA
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list