[BioC] Xps package : errors and RMA difference with Partek GS

Fri Apr 3 20:25:15 CEST 2009

Dear Arnaud,

Let me answer your questions individually:

1. Differential expression:
As you see, package xps is quite flexible and allows to use different 
probes for background correction, quantile normalization and 
summarization. This may have effects on the results you get with the 
different settings. Thus, the most important question to determine which 
setting to choose, is:
Do you expect to find differentially expressed genes in your experiment 
or not?
Since the different settings give different results the best way would 
be to confirm one result or the other experimentally, e.g. by doing PCR 
with some of the genes in question.

2. Number of "core" genes:
Please note that the number of "core" transcripts is defined in the 
Affymetrix probeset annotation file. Only probesets with "level=Core" 
are combined as "core" transcripts, and only the subset with 
"crosshyb_type=unique" is used as "metacore" transcripts. Thus the 
number of "core" transcripts depends on the annotation used. For the 
current Affymetrix probeset annotation file version 
"HuEx-1_0-st-v2.na28.hg18.probeset.csv" you get 18708 "core" transcripts 
and 17880 "metacore" transcripts. Since the difference is 828 
transcripts, I assume that you are only using the "metacore" transcripts 
with Partek. Thus, in xps you need to use exonlevel="metacore".

3. Probes used for background correction:
Package xps contains an internal function exonLevel() which is used to 
convert parameter "exonlevel" to an integer. Because of your question I 
have decided to make this function public in version xps_1.2.9, which 
you should be able to download on Monday. The corresponding helpfile 
"?exonLevel" will explain the different integers to be used.
For your convenience the integers are: core (=8192+1024), extended 
(=4096+512), full (=2048+256), ambiguous (=128), affx(=60). Thus 
"core+extended+full" is 16128.

Coincidently, the question which probes to be used for background 
correction was recently asked also at:
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/69fdc2757894f290#

Best regards
Christian

arnaud Le Cavorzin wrote:
> Dear Christian,
>
> Thank you for your answer.
>
> We have tried to perform rma normalization using 
> exonlevel=c(16316,9216,9216) that is to say using all of the probes 
> for the background correction and core probes for the two last steps.
> And the results seems to be better. We found results closed to those 
> obtained with Partek, and after performing t.test and fdr correction 
> we found the same result with Partek and with xps : no differencially 
> expressed genes even if the p-values and fdr values are a slightly 
> different.
>
> In fact there is always 827 genes more with xps than with Partek, so 
> it is maybe the explanation of these differences.
> So our question : Is it possible to perform rma using only 
> core+extended+full probes for the background correction? If yes what 
> is the "number" needs for this? (core => 9216, metacore => 8192, all 
> => 16316)
>
> Thank you very much
> Best regards
>
> Arnaud
>
>
> > Date: Wed, 1 Apr 2009 21:31:54 +0200
> > From: cstrato at aon.at
> > To: arnaudlc at msn.com
> > CC: bioconductor at stat.math.ethz.ch; delphine.rossille at chu-rennes.fr
> > Subject: Re: [BioC] Xps package : errors and RMA difference with 
> Partek GS
> >
> > Dear Arnaud,
> >
> > Yes, xps uses the same background algorithm as APT (and Partek) but 
> uses
> > by default only the probes defined in "exonlevel" for background
> > correction and quantile normalization, i.e. in your case only the 
> "core"
> > probes are used.
> >
> > However, xps offers the possibility to select the probes to be used for
> > background correction, quantile normalization and summarization
> > individually. For RMA normalization you can do:
> >
> > data.rma <- rma(data.exon, "ExonRMAbq16core", filedir=datdir,
> > background="antigenomic",
> > normalize=T, option="transcript",
> > exonlevel=c(16316,16316,9216))
> >
> > This means that probes "core+extended+full+ambiguous+affx" (=16316) are
> > used for background correction and quantile normalization, 
> respectively,
> > and probes "core" (=9216) are used for summarization.
> >
> > As you will see, the results will be more similar to the results
> > obtained with APT (see also Figures 17 vs 15 in APTvsXPS.pdf). However,
> > it is my belief that it is better to use the same probes for all three
> > steps. Maybe this is reflected in your results of finding 
> differentially
> > expressed genes?
> >
> > Best regards
> > Christian
> >
> >
> > arnaud Le Cavorzin wrote:
> > > Hi.
> > >
> > > Thank you for your anwser.
> > >
> > > I have downloaded the last version of xps and performed the
> > > metaProbesets with success.
> > >
> > >
> > > So if I understand, xps use the same background correction algorithm
> > > than Partek or APT. But the difference between these softwares seems
> > > to be caused by the probes used for the background correction.
> > > Moreover it seems to get a difference not only for the background
> > > correction, but for the quantile normalization too, again due to the
> > > different probes.
> > >
> > > In fact I can't get the same results on my samples with xps package
> > > and with Partek.
> > > With Partek we found no expression differences for all of the genes
> > > (after fdr correction), while xps found a little more than 1100 genes
> > > who are differentially expressed (after using prefilter() and
> > > unifilter() function, with the same options than in Partek).
> > >
> > > And I don't know why.
> > >
> > > Thank you for your help and your answer,
> > > Best regards
> > >
> > > Arnaud
> > >
> > >
> > > > Date: Tue, 31 Mar 2009 22:04:42 +0200
> > > > From: cstrato at aon.at
> > > > To: arnaudlc at msn.com
> > > > CC: bioconductor at stat.math.ethz.ch; delphine.rossille at chu-rennes.fr
> > > > Subject: Re: [BioC] Xps package : errors and RMA difference with
> > > Partek GS
> > > >
> > > > Dear Arnaud,
> > > >
> > > > Let me first answer your question how to export the background data
> > > > (although an example is shown in vignette xps.pdf p.15):
> > > >
> > > > # 1. compute background (or rma)
> > > > data.bg <- bgcorrect(data.exon, "ExonRMABgrdCore", filedir=datdir,
> > > > method="rma", select="antigenomic", option="pmonly:epanechnikov",
> > > > params=c(16384), exonlevel="core")
> > > >
> > > > # 2. find background treenames for data.bg
> > > > getTreeNames(rootFile(data.bg))
> > > >
> > > > # 3. get background for all trees
> > > > export.root(rootFile(data.bg), schemeFile(data.bg), 
> setName(data.bg),
> > > > "*", "rbg", "fBg", "BgrdAll.txt")
> > > > # or get background intensity for e.g. tree "BreastA.rbg"
> > > > export.root(rootFile(data.bg), schemeFile(data.bg), 
> setName(data.bg),
> > > > "BreastA", "rbg", "fBg", "BgrdBreastA.txt")
> > > >
> > > > In addition you can also export the background subtracted 
> intensities:
> > > > # 4. get background corrected intensities for all trees
> > > > export.root(rootFile(data.bg), schemeFile(data.bg), 
> setName(data.bg),
> > > > "*", "int", "fInten", "IntenAll.txt")
> > > >
> > > > However, please note that only the "core" probes are corrected, 
> so I am
> > > > not sure if you can use these data in another program.
> > > >
> > > >
> > > > Some notes on background correction:
> > > >
> > > > Please note that the above background correction does NOT use
> > > > "antigenomic" probes for background correction. The parameter
> > > > select="antigenomic" does only define the kind of MM probes 
> (although
> > > > they are not used in this case). Which probes are used as PM 
> probes is
> > > > defined by exonlevel="core", which means that only "core" probes are
> > > > used for background correction.
> > > >
> > > > As you know, in the RMA background algorithm observed PM probes are
> > > > modeled as the sum of a normal noise component and an exponential
> > > signal
> > > > component. Since in above case only "core" probes are selected as PM
> > > > probes, only these probes are used for background correction. 
> This may
> > > > be the reason why the background data differ from the background 
> data
> > > > computed by APT, as explained in vignette APTvsXPS.pdf.
> > > >
> > > > If you want to compute the background using "genomic" or 
> "antigenomic"
> > > > probes and the APT algorithm based on GC content of these probes 
> then
> > > > you need to use:
> > > > data.bg <- bgcorrect(data.exon, "ExonGCBgrdCore", filedir=datdir,
> > > > method="gccontent", select="antigenomic", option="attenuatebg",
> > > > params=c(0.4, 0.005, -1.0), exonlevel="core")
> > > > or the dedicated function:
> > > > data.bg <- bgcorrect.gc(data.exon, "ExonGCBgrdCore", filedir=datdir,
> > > > select="antigenomic", exonlevel="core")
> > > >
> > > >
> > > > Maybe one note on processing time:
> > > > My main goal was to allow processing of exon arrays on computers 
> with
> > > > 1GB RAM only, and to allow access to all interim data such as
> > > background
> > > > intensities and background-corrected probe intensities. Thus, 
> all these
> > > > data are stored as root trees, which means that saving all these
> > > interim
> > > > data on HD is probably the time-consuming step.
> > > >
> > > > I hope that I could answer your questions.
> > > >
> > > > Best regards
> > > > Christian
> > > >
> > > >
> > > > arnaud Le Cavorzin wrote:
> > > > > Hi
> > > > >
> > > > > Thank you for your answer, I will download your last version 
> when it
> > > > > will be available.
> > > > >
> > > > > We found that the difference between Partek GS and xps package 
> is the
> > > > > background correction.
> > > > > And Partek's support confirmed that Partek GS doesn't use 
> genomic or
> > > > > antigenomic background correction but correction like described by
> > > > > Professor Bolstad.
> > > > >
> > > > > But I have an other problem : I would like to perform background
> > > > > correction with xps, using the function bgcorrect(), and after to
> > > > > export the data.bg.rma for performing the normalization and
> > > > > summarization with Partek GS (that take less time than xps)
> > > > > I have tried with function export.expr(), export.data and export()
> > > > > without success.
> > > > >
> > > > > How can I do to extract the data.bg.rma into a .txt file, if 
> it was
> > > > > possible of course?
> > > > >
> > > > > Thanks
> > > > > Best regards
> > > > >
> > > > > Arnaud