[BioC] Is GCRMA influenced by lexicographical order?

Mon Nov 8 17:15:01 CET 2010

Hi,

It seem to be more correct to say that it depends on the ordering of
the arrays loaded, which is in turn by default loaded in lexicographic
ordering.

On Mon, Nov 8, 2010 at 7:50 AM, Wolfgang Huber <whuber at embl.de> wrote:
> Hi Markus,
>
> Jean or Rafa will be able to give a more competent response, but as far as I
> can see from the code, there are some places where the first array is
> treated specially:
>
> ~/madman/Rpacks/gcrma/R$ grep ",1]" *
> gcrma.R:      index2=which(!is.na(anc[,1]))
> gcrma.engine2.R:    index.affinities <- which(!is.na(pm.affinities[,1]))
> justGCRMA.R:       mm <- read.probematrix(filenames=filenames[i],
> which="mm")$mm[,1]
> justGCRMA.R:    mm <- read.probematrix(filenames=filenames[i],
> which="mm")$mm[,1]

The latter to lines only reads one array, so there [,1] is just to
pull out the signals, I think.

Note also that the GCRMA algorithm is using a random subset to
estimate some the parameters, which could affect this too.  However,
looking at the devel code for GCRMA is see that there are
"set.seed(1)" calls before each sample() call, i.e. they are no longer
"random subsets".

I though this was discussed a long time ago and all fixing of the
seeds were dropped, but now it's back again.  Is my bad memory playing
me a trick?

/H

>
> I agree that this is not the most desirable feature.
>
> Best wishes
>        Wolfgang
>
>
> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto:
>>
>> Dear all,
>>
>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained
>> in the gcrma package.
>>
>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of
>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I
>> rename both files in a way, which changes the lexicographical order,
>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II).
>>
>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for
>> some probe sets I get different values for A.cel (in CASE I) than for
>> BA.cel (in CASE II), although both files contain the same data. Of
>> course, for B.cel and AB.cel the same phenomenon becomes obvious.
>>
>> How can this be possible? To me, it seems as if GCRMA normalizes the
>> data with respect to the lexicographical order, but as far as I know,
>> GCRMA treats each array independently.
>>
>> This effect is not reproducible using call.exprs() with argument
>> algorithm="rma" or "mas5". But is reproducible for other arrays like
>> HGU95A, for example.
>>
>> Please, can anybody try to explain me, if this effect is a bug or a
>> feature (and why)?
>>
>> Best wishes,
>> Markus
>>
>>
>>
>> #################################################################
>>
>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4]
>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0
>> Biobase_2.10.0
>> loaded via a namespace (and not attached):
>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2
>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0
>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6
>>
>> Also tried on other machine with similar packages
>>
>> sessionInfo()
>> R version 2.11.1 Patched (2010-09-16 r52943)
>> Platform: i686-pc-linux-gnu (32-bit)
>>
>> locale:
>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1
>> [5] Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2
>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17
>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1
>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6
>>
>>
>> #################################################################
>>
>> Code example
>>
>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel
>> CEL <- ReadAffy()
>> bag.1 <- bg.adjust.gcrma(CEL)
>> ebag.1 <- exprs(bag.1)
>>
>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel
>> CEL <- ReadAffy()
>> bag.2 <- bg.adjust.gcrma(CEL)
>> ebag.2 <- exprs(bag.2)
>>
>> par(mfrow=c(2,1))
>> # differences should be zero
>> plot(ebag.1[,1]-ebag.2[,2])
>> plot(ebag.1[,2]-ebag.2[,1])
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>