[BioC] Is GCRMA influenced by lexicographical order?

Mon Nov 8 17:18:09 CET 2010

On Mon, Nov 8, 2010 at 8:13 AM, Zhijin Wu <zwu at stat.brown.edu> wrote:
>
> On 11/8/2010 10:50 AM, Wolfgang Huber wrote:
>>
>> Hi Markus,
>>
>> Jean or Rafa will be able to give a more competent response, but as far
>> as I can see from the code, there are some places where the first array
>> is treated specially:
>>
>> ~/madman/Rpacks/gcrma/R$ grep ",1]" *
>> gcrma.R: index2=which(!is.na(anc[,1]))
>> gcrma.engine2.R: index.affinities <- which(!is.na(pm.affinities[,1]))
>
> Those two "[,1] " in the code is actually "without loss of generality"
> reference. Affinities are missing if the probe sequence information is
> missing, and if it does it would be for all arrays. Thus by finding out
> which ones are missing in any array suffice -- and since there will always
> be at least one array, we did [,1] instead of [,10] or any other integer.

A safer strategy would be to identify it across *all* arrays (using OR
logic), in case the is.na() is not the same for each array (which
could happen under certain cases).

/Henrik

>
> I will test it and see if I can reproduce your problem and see where the
> discrepancy could arise.
>
> Best,
> Jean
>
>
>
>> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto:
>>>
>>> Dear all,
>>>
>>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained
>>> in the gcrma package.
>>>
>>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of
>>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I
>>> rename both files in a way, which changes the lexicographical order,
>>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II).
>>>
>>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for
>>> some probe sets I get different values for A.cel (in CASE I) than for
>>> BA.cel (in CASE II), although both files contain the same data. Of
>>> course, for B.cel and AB.cel the same phenomenon becomes obvious.
>>>
>>> How can this be possible? To me, it seems as if GCRMA normalizes the
>>> data with respect to the lexicographical order, but as far as I know,
>>> GCRMA treats each array independently.
>>>
>>> This effect is not reproducible using call.exprs() with argument
>>> algorithm="rma" or "mas5". But is reproducible for other arrays like
>>> HGU95A, for example.
>>>
>>> Please, can anybody try to explain me, if this effect is a bug or a
>>> feature (and why)?
>>>
>>> Best wishes,
>>> Markus
>>>
>>>
>>>
>>> #################################################################
>>>
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>> other attached packages:
>>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4]
>>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0
>>> Biobase_2.10.0
>>> loaded via a namespace (and not attached):
>>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2
>>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0
>>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6
>>>
>>> Also tried on other machine with similar packages
>>>
>>> sessionInfo()
>>> R version 2.11.1 Patched (2010-09-16 r52943)
>>> Platform: i686-pc-linux-gnu (32-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
>>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1
>>> [5] Biobase_2.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2
>>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17
>>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1
>>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6
>>>
>>>
>>> #################################################################
>>>
>>> Code example
>>>
>>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel
>>> CEL <- ReadAffy()
>>> bag.1 <- bg.adjust.gcrma(CEL)
>>> ebag.1 <- exprs(bag.1)
>>>
>>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel
>>> CEL <- ReadAffy()
>>> bag.2 <- bg.adjust.gcrma(CEL)
>>> ebag.2 <- exprs(bag.2)
>>>
>>> par(mfrow=c(2,1))
>>> # differences should be zero
>>> plot(ebag.1[,1]-ebag.2[,2])
>>> plot(ebag.1[,2]-ebag.2[,1])
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> ------------------------------------
> Zhijin (Jean) Wu
> Assistant Professor of Biostatistics
> Brown University, Box G-S121
> Providence, RI  02912
>
> Tel: 401 863 1230
> Fax: 401 863 9182
> http://www.stat.brown.edu/zwu
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>