[BioC] Is GCRMA influenced by lexicographical order?

Benilton Carvalho beniltoncarvalho at gmail.com
Mon Nov 8 17:10:59 CET 2010


just set the random seed to the same value prior to running
bg.correct.gcrma() and you should be fine...

## setwd(CASEI)
set.seed(1)
CEL = ReadAffy()
ebag.1 = exprs(bg.adjust.gcrma(CEL))
## setwd(CASEII)
set.seed(1)
CEL = ReadAffy()
ebag.2 = exprs(bg.adjust.gcrma(CEL))

b

On 8 November 2010 15:50, Wolfgang Huber <whuber at embl.de> wrote:
> Hi Markus,
>
> Jean or Rafa will be able to give a more competent response, but as far as I
> can see from the code, there are some places where the first array is
> treated specially:
>
> ~/madman/Rpacks/gcrma/R$ grep ",1]" *
> gcrma.R:      index2=which(!is.na(anc[,1]))
> gcrma.engine2.R:    index.affinities <- which(!is.na(pm.affinities[,1]))
> justGCRMA.R:       mm <- read.probematrix(filenames=filenames[i],
> which="mm")$mm[,1]
> justGCRMA.R:    mm <- read.probematrix(filenames=filenames[i],
> which="mm")$mm[,1]
>
> I agree that this is not the most desirable feature.
>
> Best wishes
>        Wolfgang
>
>
> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto:
>>
>> Dear all,
>>
>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained
>> in the gcrma package.
>>
>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of
>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I
>> rename both files in a way, which changes the lexicographical order,
>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II).
>>
>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for
>> some probe sets I get different values for A.cel (in CASE I) than for
>> BA.cel (in CASE II), although both files contain the same data. Of
>> course, for B.cel and AB.cel the same phenomenon becomes obvious.
>>
>> How can this be possible? To me, it seems as if GCRMA normalizes the
>> data with respect to the lexicographical order, but as far as I know,
>> GCRMA treats each array independently.
>>
>> This effect is not reproducible using call.exprs() with argument
>> algorithm="rma" or "mas5". But is reproducible for other arrays like
>> HGU95A, for example.
>>
>> Please, can anybody try to explain me, if this effect is a bug or a
>> feature (and why)?
>>
>> Best wishes,
>> Markus
>>
>>
>>
>> #################################################################
>>
>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4]
>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0
>> Biobase_2.10.0
>> loaded via a namespace (and not attached):
>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2
>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0
>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6
>>
>> Also tried on other machine with similar packages
>>
>> sessionInfo()
>> R version 2.11.1 Patched (2010-09-16 r52943)
>> Platform: i686-pc-linux-gnu (32-bit)
>>
>> locale:
>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1
>> [5] Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2
>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17
>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1
>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6
>>
>>
>> #################################################################
>>
>> Code example
>>
>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel
>> CEL <- ReadAffy()
>> bag.1 <- bg.adjust.gcrma(CEL)
>> ebag.1 <- exprs(bag.1)
>>
>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel
>> CEL <- ReadAffy()
>> bag.2 <- bg.adjust.gcrma(CEL)
>> ebag.2 <- exprs(bag.2)
>>
>> par(mfrow=c(2,1))
>> # differences should be zero
>> plot(ebag.1[,1]-ebag.2[,2])
>> plot(ebag.1[,2]-ebag.2[,1])
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list