[BioC] Differences in results analyzing Mouse Gene 1.0-ST using oligo and affy package

James W. MacDonald jmacdon at uw.edu
Tue Feb 12 15:25:11 CET 2013


Hi Jon,

On 2/11/2013 7:40 PM, Jon Toledo wrote:
>
>
>
> Dear Bioconductor List,
>
>
> I have repeated my workflow using the affy and oligo package alternatively followed by the limma   package to analyze and experiment with two conditions  using Mouse Gene 1.0-ST chips and I arrive to different results.
>
> I have been looking online and found that at least for the  for the Mouse Gene 1.1-ST the oligo package is preferred, but not anything clear for Mouse Gene 1.0-ST .
>
> I attach below my code and session info:
>
> A) For running affy:
>
> library(affy)
> library(pd.mogene.1.0.st.v1)
> library(mogene10sttranscriptcluster.db)
>
> cellist=list.celfiles(full.names=T)
> cellistD14=grep("CD1...14",cellist,value=T)
> esetD14<- justRMA(filenames=gsub("\\./","",cellistD14))
>
> B) For runnign oligo:
>
> library(oligo)
> library(pd.mogene.1.0.st.v1)
> library(mogene10sttranscriptcluster.db)
>
> cellist=list.celfiles(full.names=T)
> cellistD14=grep("CD1...14",cellist,value=T)
>
> rsetD14=read.celfiles(cellistD14)
> esetpD14=rma(rsetD14,target="probeset")
> esetD14=rma(rsetD14,target="core")
>
>
> C) Finally running the same analysis:
>
> designD14<-read.delim('designD14.txt')
> contrast.matrix=model.matrix(~treatment,data=designD14)
> library(limma)
> fit<- lmFit(esetD14, contrast.matrix)
> fit<- eBayes(fit,proportion=0.01)
>
> m1=topTable(fit, coef="treatment",number=1e8,adjust.method="BH")
> m1=m1[,c("ID","logFC","P.Value","adj.P.Val")]
> m1=cbind(m1[,1:2],FCTreat=2**m1[,2],PTreat=m1[,3],adj.PTreat=m1[,4])

You show a lot of code, but paradoxically for all that code I am left 
wondering what you did, and how things differed.

Anyway, either oligo or xps are the only packages that were designed for 
these arrays. The affy package was 'fixed' so it will work, but IMO you 
should be using oligo or xps, not affy. There are differences between 
the results for oligo and affy:

 > library(oligo)
 > oligo <- rma(read.celfiles(list.celfiles()))
 > library(affy)
 > affy <- justRMA()
 > oligo
ExpressionSet (storageMode: lockedEnvironment)
assayData: 35556 features, 16 samples
   element names: exprs
protocolData
   rowNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
     RH_8867.8_pid1257.CEL (16 total)
   varLabels: exprs dates
   varMetadata: labelDescription channel
phenoData
   rowNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
     RH_8867.8_pid1257.CEL (16 total)
   varLabels: index
   varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.mogene.1.0.st.v1
 > affy
ExpressionSet (storageMode: lockedEnvironment)
assayData: 34760 features, 16 samples
   element names: exprs, se.exprs
protocolData
   sampleNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
     RH_8867.8_pid1257.CEL (16 total)
   varLabels: ScanDate
   varMetadata: labelDescription
phenoData
   sampleNames: RH_8500.1_pid1257.CEL RH_8500.4_pid1257.CEL ...
     RH_8867.8_pid1257.CEL (16 total)
   varLabels: sample
   varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: mogene10stv1

Note that oligo returns 35556 probesets, where affy returns 34760. If we 
subset to overlapping probesets, we still get some differences:

 > oligo <- exprs(oligo)[featureNames(oligo) %in% featureNames(affy),]
 > affy <- exprs(affy)[featureNames(affy) %in% row.names(oligo),]
 > all.equal(row.names(oligo), row.names(affy))
[1] TRUE
 > all.equal(affy, oligo)
[1] "Mean relative difference: 0.01049026"

And if I look at a summary of these differences, most are very small 
(the IQR goes from 0.005 - 0.04 or so), with large differences at the 
extremes (-5, 10). This may be due to differences in the probesets; note 
that the affy package uses an old unsupported CDF file, whereas the 
oligo package uses the current pgf/clf files.

Best,

Jim




>
>> sessionInfo() (This is for the affy run)
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] mogene10stv1cdf_2.11.0               limma_3.14.4                         mogene10sttranscriptcluster.db_8.0.1
>   [4] org.Mm.eg.db_2.8.0                   AnnotationDbi_1.20.3                 pd.mogene.1.0.st.v1_3.8.0
>   [7] oligo_1.22.0                         oligoClasses_1.20.0                  RSQLite_0.11.2
> [10] DBI_0.2-5                            affy_1.36.1                          Biobase_2.18.0
> [13] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
>   [1] affxparser_1.30.2     affyio_1.26.0         BiocInstaller_1.8.3   Biostrings_2.26.3     bit_1.1-9             codetools_0.2-8
>   [7] ff_2.2-10             foreach_1.4.0         GenomicRanges_1.10.6  IRanges_1.16.4        iterators_1.0.6       parallel_2.15.2
> [13] preprocessCore_1.20.0 splines_2.15.2        stats4_2.15.2         tools_2.15.2          zlibbioc_1.4.0
>
> Thank you very much
>
>
> J Toledo
> UPenn
> USA
>
>   		 	   		
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list