[BioC] Xps package : errors and RMA difference with Partek GS

cstrato cstrato at aon.at
Wed Apr 8 17:22:53 CEST 2009


Dear Arnaud,

I agree with you that it is best to compare the results.

Your finding that the different software packages give slightly 
different results is expected for HuExon arrays.

If you want to test if the RMA algorithm is implemented correctly, you 
need to use data from expression arrays, e.g. HG-U133_Plus_2. I have 
done this for xps, affy and APT and the results were identical, as shown 
in Fig.3 of vignette APTvsXPS.pdf.

Best regards
Christian


arnaud Le Cavorzin wrote:
> Dear Christian
>
> Thanks for your answer.
>
> We have used the .mps file created with xps package with Partek, and 
> the results are different too.
>
> I have compared xps package with Partek, APT, Expression Console and 
> aroma.affymetrix and the results are always differents.
>
> So I think I will work with xps and with Partek both, because of the 
> little differences that we found it will be interesting to compare 
> these results.
>
> Thank you very much for your help, your patience and your reactivity.
> Best regards.
>
> Arnaud
>
> > Date: Mon, 6 Apr 2009 23:30:36 +0200
> > From: cstrato at aon.at
> > To: arnaudlc at msn.com
> > CC: bioconductor at stat.math.ethz.ch; delphine.rossille at chu-rennes.fr
> > Subject: Re: [BioC] Xps package : errors and RMA difference with 
> Partek GS
> >
> > Dear Arnaud,
> >
> > As I said already in an earlier mail, when comparing xps and APT I get
> > slightly different expression levels, which may be due to different
> > probes used for background correction and quantile normalization since
> > the results obtained for median-polish only are identical (see vignette
> > APTvsXPS.pdf). For this reason the means are also slightly different,
> > and thus also the p-values.
> >
> > Furthermore, in order to compare xps and APT I needed to create a
> > metaprobeset file using function "metaProbesets()" which I used with
> > APT. This file assures that the same metacore probesets are used by 
> both
> > programs.
> >
> > Please note that cannot comment on differences between xps and any
> > commercial software, I can only compare xps to other open-source
> > programs such as affy and APT. However, if you compare your software to
> > APT and the results are identical, then this would answer your question.
> >
> > Regarding fdr adjustment you will see that "validData(rma.ufr)" lists
> > only the probesets which satisfy the condition pval<0.05. In order to
> > see all probesets you need to do "validData(rma.ufr,"UnitName")" or
> > simply "rma.ufr at data". (I must admit that I need to document this
> > feature in the help file.) Then you should see also differences between
> > p-value and p-adjusted. But I need to investigate further.
> >
> > Best regards
> > Christian
> >
> >
> > arnaud Le Cavorzin wrote:
> > > Dear Christian,
> > >
> > > Thank you very much for your answer and for your reactivity.
> > >
> > > We have performed RMA normalization using 
> exonlevel=c(16316,8192,8192)
> > > that is to say using all probes for the background correction and 
> only
> > > metacore probes for the quantile normalization and summarization, so
> > > the same options using by Partek ("core" in Partek corresponding to
> > > "metacore" in xps, like you have suggested it).
> > >
> > > We get the same number of probeset with Partek and xps package, but
> > > the results are differents for the two softwares.
> > > Even if it was better, we found 3539 genes with a p-value<0.05 with
> > > xps, and 3337 genes with p-value<0.05 with Partek, and the results
> > > remain different for the two softwares. We don't obtain the same
> > > p-value, in particular because we don't obtain the same means.
> > >
> > > I have also imported the data.rma from xps to Partek, and performed
> > > the t test with Partek : the results are the same than performing
> > > unifilter with xps, we obtain the same p value than with xps and the
> > > same means. So they are still different than the results using Partek
> > > only.
> > > (Confirm that there is a difference with the probes used in Partek 
> for
> > > the RMA normalization)
> > >
> > > Another question : when I use fdr correction or no correction with
> > > xps, the results are still the same. Only when I use bonferroni
> > > correction the p-adjusted change. I don't understand why FDR
> > > correction have no effect.
> > >
> > > />
> > > 
> unifltr=UniFilter(unitest=c("t.test","two.sided","none",0,0.0,FALSE,0.95,TRUE),
> > > + foldchange=c(1,"both"),unifilter=c(0.05,"pval"))
> > > >
> > > 
> rma.ufr=unifilter(data.test2.rma,"tmpdt_HuextestUnifilterallmetmet",getwd(),logbase="log2",
> > > + unifltr,group=c("075","075","TEM","TEM"),verbose=FALSE)
> > > /
> > > />
> > > 
> unifltr=UniFilter(unitest=c("t.test","two.sided","fdr",0,0.0,FALSE,0.95,TRUE),
> > > + foldchange=c(1,"both"),unifilter=c(0.05,"pval"))
> > > >
> > > 
> rma.ufr=unifilter(data.test2.rma,"tmpdt_HuextestUnifilterallmetmet",getwd(),logbase="log2",
> > > + unifltr,group=c("075","075","TEM","TEM"),verbose=FALSE)/
> > >
> > >
> > > Thanks
> > > Best regards,
> > >
> > > Arnaud
> > >
> > >
> > > > Date: Fri, 3 Apr 2009 20:25:15 +0200
> > > > From: cstrato at aon.at
> > > > To: arnaudlc at msn.com
> > > > CC: bioconductor at stat.math.ethz.ch; delphine.rossille at chu-rennes.fr
> > > > Subject: Re: [BioC] Xps package : errors and RMA difference with
> > > Partek GS
> > > >
> > > > Dear Arnaud,
> > > >
> > > > Let me answer your questions individually:
> > > >
> > > > 1. Differential expression:
> > > > As you see, package xps is quite flexible and allows to use 
> different
> > > > probes for background correction, quantile normalization and
> > > > summarization. This may have effects on the results you get with the
> > > > different settings. Thus, the most important question to determine
> > > which
> > > > setting to choose, is:
> > > > Do you expect to find differentially expressed genes in your 
> experiment
> > > > or not?
> > > > Since the different settings give different results the best way 
> would
> > > > be to confirm one result or the other experimentally, e.g. by 
> doing PCR
> > > > with some of the genes in question.
> > > >
> > > > 2. Number of "core" genes:
> > > > Please note that the number of "core" transcripts is defined in the
> > > > Affymetrix probeset annotation file. Only probesets with 
> "level=Core"
> > > > are combined as "core" transcripts, and only the subset with
> > > > "crosshyb_type=unique" is used as "metacore" transcripts. Thus the
> > > > number of "core" transcripts depends on the annotation used. For the
> > > > current Affymetrix probeset annotation file version
> > > > "HuEx-1_0-st-v2.na28.hg18.probeset.csv" you get 18708 "core"
> > > transcripts
> > > > and 17880 "metacore" transcripts. Since the difference is 828
> > > > transcripts, I assume that you are only using the "metacore"
> > > transcripts
> > > > with Partek. Thus, in xps you need to use exonlevel="metacore".
> > > >
> > > > 3. Probes used for background correction:
> > > > Package xps contains an internal function exonLevel() which is 
> used to
> > > > convert parameter "exonlevel" to an integer. Because of your 
> question I
> > > > have decided to make this function public in version xps_1.2.9, 
> which
> > > > you should be able to download on Monday. The corresponding helpfile
> > > > "?exonLevel" will explain the different integers to be used.
> > > > For your convenience the integers are: core (=8192+1024), extended
> > > > (=4096+512), full (=2048+256), ambiguous (=128), affx(=60). Thus
> > > > "core+extended+full" is 16128.
> > > >
> > > > Coincidently, the question which probes to be used for background
> > > > correction was recently asked also at:
> > > >
> > > 
> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/69fdc2757894f290#
> > > >
> > > > Best regards
> > > > Christian
> > > >
> > > >
> > > > arnaud Le Cavorzin wrote:
> > > > > Dear Christian,
> > > > >
> > > > > Thank you for your answer.
> > > > >
> > > > > We have tried to perform rma normalization using
> > > > > exonlevel=c(16316,9216,9216) that is to say using all of the 
> probes
> > > > > for the background correction and core probes for the two last 
> steps.
> > > > > And the results seems to be better. We found results closed to 
> those
> > > > > obtained with Partek, and after performing t.test and fdr 
> correction
> > > > > we found the same result with Partek and with xps : no 
> differencially
> > > > > expressed genes even if the p-values and fdr values are a slightly
> > > > > different.
> > > > >
> > > > > In fact there is always 827 genes more with xps than with 
> Partek, so
> > > > > it is maybe the explanation of these differences.
> > > > > So our question : Is it possible to perform rma using only
> > > > > core+extended+full probes for the background correction? If 
> yes what
> > > > > is the "number" needs for this? (core => 9216, metacore => 
> 8192, all
> > > > > => 16316)
> > > > >
> > > > > Thank you very much
> > > > > Best regards
> > > > >
> > > > > Arnaud
> > > > >
> > > > >
> > > > > > Date: Wed, 1 Apr 2009 21:31:54 +0200
> > > > > > From: cstrato at aon.at
> > > > > > To: arnaudlc at msn.com
> > > > > > CC: bioconductor at stat.math.ethz.ch; 
> delphine.rossille at chu-rennes.fr
> > > > > > Subject: Re: [BioC] Xps package : errors and RMA difference with
> > > > > Partek GS
> > > > > >
> > > > > > Dear Arnaud,
> > > > > >
> > > > > > Yes, xps uses the same background algorithm as APT (and 
> Partek) but
> > > > > uses
> > > > > > by default only the probes defined in "exonlevel" for background
> > > > > > correction and quantile normalization, i.e. in your case 
> only the
> > > > > "core"
> > > > > > probes are used.
> > > > > >
> > > > > > However, xps offers the possibility to select the probes to be
> > > used for
> > > > > > background correction, quantile normalization and summarization
> > > > > > individually. For RMA normalization you can do:
> > > > > >
> > > > > > data.rma <- rma(data.exon, "ExonRMAbq16core", filedir=datdir,
> > > > > > background="antigenomic",
> > > > > > normalize=T, option="transcript",
> > > > > > exonlevel=c(16316,16316,9216))
> > > > > >
> > > > > > This means that probes "core+extended+full+ambiguous+affx"
> > > (=16316) are
> > > > > > used for background correction and quantile normalization,
> > > > > respectively,
> > > > > > and probes "core" (=9216) are used for summarization.
> > > > > >
> > > > > > As you will see, the results will be more similar to the results
> > > > > > obtained with APT (see also Figures 17 vs 15 in APTvsXPS.pdf).
> > > However,
> > > > > > it is my belief that it is better to use the same probes for 
> all
> > > three
> > > > > > steps. Maybe this is reflected in your results of finding
> > > > > differentially
> > > > > > expressed genes?
> > > > > >
> > > > > > Best regards
> > > > > > Christian
> > > > > >
> > > > > >
> > > > > > arnaud Le Cavorzin wrote:
> > > > > > > Hi.
> > > > > > >
> > > > > > > Thank you for your anwser.
> > > > > > >
> > > > > > > I have downloaded the last version of xps and performed the
> > > > > > > metaProbesets with success.
> > > > > > >
> > > > > > >
> > > > > > > So if I understand, xps use the same background correction
> > > algorithm
> > > > > > > than Partek or APT. But the difference between these 
> softwares
> > > seems
> > > > > > > to be caused by the probes used for the background correction.
> > > > > > > Moreover it seems to get a difference not only for the 
> background
> > > > > > > correction, but for the quantile normalization too, again due
> > > to the
> > > > > > > different probes.
> > > > > > >
> > > > > > > In fact I can't get the same results on my samples with xps
> > > package
> > > > > > > and with Partek.
> > > > > > > With Partek we found no expression differences for all of the
> > > genes
> > > > > > > (after fdr correction), while xps found a little more than
> > > 1100 genes
> > > > > > > who are differentially expressed (after using prefilter() and
> > > > > > > unifilter() function, with the same options than in Partek).
> > > > > > >
> > > > > > > And I don't know why.
> > > > > > >
> > > > > > > Thank you for your help and your answer,
> > > > > > > Best regards
> > > > > > >
> > > > > > > Arnaud
> > > > > > >
> > > > > > >
> > > > > > > > Date: Tue, 31 Mar 2009 22:04:42 +0200
> > > > > > > > From: cstrato at aon.at
> > > > > > > > To: arnaudlc at msn.com
> > > > > > > > CC: bioconductor at stat.math.ethz.ch;
> > > delphine.rossille at chu-rennes.fr
> > > > > > > > Subject: Re: [BioC] Xps package : errors and RMA 
> difference with
> > > > > > > Partek GS
> > > > > > > >
> > > > > > > > Dear Arnaud,
> > > > > > > >
> > > > > > > > Let me first answer your question how to export the
> > > background data
> > > > > > > > (although an example is shown in vignette xps.pdf p.15):
> > > > > > > >
> > > > > > > > # 1. compute background (or rma)
> > > > > > > > data.bg <- bgcorrect(data.exon, "ExonRMABgrdCore",
> > > filedir=datdir,
> > > > > > > > method="rma", select="antigenomic",
> > > option="pmonly:epanechnikov",
> > > > > > > > params=c(16384), exonlevel="core")
> > > > > > > >
> > > > > > > > # 2. find background treenames for data.bg
> > > > > > > > getTreeNames(rootFile(data.bg))
> > > > > > > >
> > > > > > > > # 3. get background for all trees
> > > > > > > > export.root(rootFile(data.bg), schemeFile(data.bg),
> > > > > setName(data.bg),
> > > > > > > > "*", "rbg", "fBg", "BgrdAll.txt")
> > > > > > > > # or get background intensity for e.g. tree "BreastA.rbg"
> > > > > > > > export.root(rootFile(data.bg), schemeFile(data.bg),
> > > > > setName(data.bg),
> > > > > > > > "BreastA", "rbg", "fBg", "BgrdBreastA.txt")
> > > > > > > >
> > > > > > > > In addition you can also export the background subtracted
> > > > > intensities:
> > > > > > > > # 4. get background corrected intensities for all trees
> > > > > > > > export.root(rootFile(data.bg), schemeFile(data.bg),
> > > > > setName(data.bg),
> > > > > > > > "*", "int", "fInten", "IntenAll.txt")
> > > > > > > >
> > > > > > > > However, please note that only the "core" probes are 
> corrected,
> > > > > so I am
> > > > > > > > not sure if you can use these data in another program.
> > > > > > > >
> > > > > > > >
> > > > > > > > Some notes on background correction:
> > > > > > > >
> > > > > > > > Please note that the above background correction does 
> NOT use
> > > > > > > > "antigenomic" probes for background correction. The 
> parameter
> > > > > > > > select="antigenomic" does only define the kind of MM probes
> > > > > (although
> > > > > > > > they are not used in this case). Which probes are used as PM
> > > > > probes is
> > > > > > > > defined by exonlevel="core", which means that only "core"
> > > probes are
> > > > > > > > used for background correction.
> > > > > > > >
> > > > > > > > As you know, in the RMA background algorithm observed PM
> > > probes are
> > > > > > > > modeled as the sum of a normal noise component and an
> > > exponential
> > > > > > > signal
> > > > > > > > component. Since in above case only "core" probes are
> > > selected as PM
> > > > > > > > probes, only these probes are used for background 
> correction.
> > > > > This may
> > > > > > > > be the reason why the background data differ from the
> > > background
> > > > > data
> > > > > > > > computed by APT, as explained in vignette APTvsXPS.pdf.
> > > > > > > >
> > > > > > > > If you want to compute the background using "genomic" or
> > > > > "antigenomic"
> > > > > > > > probes and the APT algorithm based on GC content of these
> > > probes
> > > > > then
> > > > > > > > you need to use:
> > > > > > > > data.bg <- bgcorrect(data.exon, "ExonGCBgrdCore",
> > > filedir=datdir,
> > > > > > > > method="gccontent", select="antigenomic", 
> option="attenuatebg",
> > > > > > > > params=c(0.4, 0.005, -1.0), exonlevel="core")
> > > > > > > > or the dedicated function:
> > > > > > > > data.bg <- bgcorrect.gc(data.exon, "ExonGCBgrdCore",
> > > filedir=datdir,
> > > > > > > > select="antigenomic", exonlevel="core")
> > > > > > > >
> > > > > > > >
> > > > > > > > Maybe one note on processing time:
> > > > > > > > My main goal was to allow processing of exon arrays on
> > > computers
> > > > > with
> > > > > > > > 1GB RAM only, and to allow access to all interim data 
> such as
> > > > > > > background
> > > > > > > > intensities and background-corrected probe intensities. 
> Thus,
> > > > > all these
> > > > > > > > data are stored as root trees, which means that saving 
> all these
> > > > > > > interim
> > > > > > > > data on HD is probably the time-consuming step.
> > > > > > > >
> > > > > > > > I hope that I could answer your questions.
> > > > > > > >
> > > > > > > > Best regards
> > > > > > > > Christian
> > > > > > > >
> > > > > > > >
> > > > > > > > arnaud Le Cavorzin wrote:
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > Thank you for your answer, I will download your last 
> version
> > > > > when it
> > > > > > > > > will be available.
> > > > > > > > >
> > > > > > > > > We found that the difference between Partek GS and xps
> > > package
> > > > > is the
> > > > > > > > > background correction.
> > > > > > > > > And Partek's support confirmed that Partek GS doesn't use
> > > > > genomic or
> > > > > > > > > antigenomic background correction but correction like
> > > described by
> > > > > > > > > Professor Bolstad.
> > > > > > > > >
> > > > > > > > > But I have an other problem : I would like to perform
> > > background
> > > > > > > > > correction with xps, using the function bgcorrect(), and
> > > after to
> > > > > > > > > export the data.bg.rma for performing the 
> normalization and
> > > > > > > > > summarization with Partek GS (that take less time than 
> xps)
> > > > > > > > > I have tried with function export.expr(), export.data and
> > > export()
> > > > > > > > > without success.
> > > > > > > > >
> > > > > > > > > How can I do to extract the data.bg.rma into a .txt 
> file, if
> > > > > it was
> > > > > > > > > possible of course?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Best regards
> > > > > > > > >
> > > > > > > > > Arnaud
> > > >
> > >
> > > 
> ------------------------------------------------------------------------
> > > Vous voulez savoir ce que vous pouvez faire avec le nouveau Windows
> > > Live ? Lancez-vous !
> > > <http://www.microsoft.com/windows/windowslive/default.aspx>
> >
>
> ------------------------------------------------------------------------
> Découvrez tout ce que Windows Live a à vous apporter ! 
> <http://www.microsoft.com/windows/windowslive/>



More information about the Bioconductor mailing list