[BioC] Discrepancy on results from gcrma function and justGCRMA

James W. MacDonald jmacdon at med.umich.edu
Thu Feb 4 14:56:38 CET 2010


Hi Jenny,

It wasn't posted to the list because Jin got tired of waiting for a 
response from the list and emailed me directly, so the correspondence 
was just between the two of us.

It was my oversight to keep the conversation off-list. Below is the 
transcript of our conversation, for the archives (and apologies for 
keeping this 'secret').

Hi Jin,

I just checked in a modification to justGCRMA that results in identical 
output as compared to gcrma. There have been quite a few modifications 
to gcrma since I originally wrote justGCRMA, and it is no longer 
possible for justGCRMA to do everything that one can do with gcrma.

Specifically, in the past gcrma used the MM probes on a chip to estimate 
background, but this has been changed so the default is to use internal 
estimates that come with the package. The functionality to use MM probes 
(or a subset thereof) still exists, but it is too much work to modify 
justGCRMA to use the MM probes, not to mention the headache of trying to 
maintain the code. So justGCRMA will only use the internal NSB estimates 
that come with the package, and will give identical results to gcrma as 
long as you use the internal NSB estimates for both functions.

The new version of gcrma is 2.18.1, and should be available for 
installation within the next couple of days.

Best,

Jim


 >>> >>> Jin <jinxing1986 at gmail.com> wrote:
 > > Hello Dr. MacDonald,
 > >
 > > I was wondering if you could help me with a problem. I've ran my
 > > dataset through GCRMA and justGCRMA but I've gotten different
 > > numerical results from both. I was under the impression that these two
 > > functions were identical except justGCRMA was more memory efficient
 > > because it did not have to create an AffyBatch object. I used the
 > > default parameters for both meaning I did not include any parameters
 > > at all when calling gcrma() and justgcrma().
 > >
 > > Are they suppose to give identical numerical results?
 > >
 > > For example, below are some sample results. The first row is GCRMA and
 > > the second row is justGCRMA. The columns are my 8 arrays. JustGCRMA
 > > values are consistently higher than GCRMA values whereas I would have
 > > expected similar numbers matching the top and bottom rows.
 > >
 > > 3.58	3.80	3.83	4.52	3.91	5.17	3.78	6.45
 > > 4.79	4.73	5.21	5.68	5.44	5.88	5.11	7.76
 > >
 > >
 > >
 > > Thanks,
 > > Jin
 > > University of California, San Diego


Jenny Drnevich wrote:
> Thanks Jim,
> 
> Thanks for the info. I found the question that was posted last last year 
> (and responded to it to start the thread again), but I still can't find 
> your response on BioC. Perhaps it accidently didn't get sent back to the 
> list?
> 
> :)
> Jenny
> 
> At 04:31 PM 2/3/2010, James W. MacDonald wrote:
>> Hi Jenny,
>>
>> Jenny Drnevich wrote:
>>> Hi everyone,
>>> I just spent several hours tracking down this problem, as I noticed 
>>> this same discrepancy between the results of justGCRMA() and gcrma() 
>>> called on the same data. There was also another post about this on 
>>> Nov 4, 2008, but I couldn't find where either of them had ever been 
>>> answered.
>>
>> I think I missed that one, but the question was asked again and 
>> answered by me late last year.
>>
>>> HOWEVER, it appears that the discrepancy has been fixed in R 2.10.1 
>>> (gcrma 2.18.1), but it was in R 2.10.0 (gcrma 2.18.0) (examples and 
>>> sessionInfos below).
>>> While I'm glad it has been "fixed", where would this have been 
>>> documented? I didn't think this type of change would have been 
>>> included in a minor version upgrade. What was the explanation for the 
>>> original discrepancy?
>>
>> The explanation is that justGCRMA() is a function that I wrote several 
>> years ago. In the intervening period the defaults for gcrma() were 
>> changed without making the corresponding changes to justGCRMA().
>>
>> I made the (unfounded) assumption that the maintainer of the gcrma 
>> package (Jean Wu) was keeping justGCRMA() consistent with gcrma(), 
>> which as you have seen is not true. Since this was an oversight by 
>> both me and Jean, it wasn't documented anywhere.
>>
>> I have since made the two functions consistent when running under the 
>> default arguments. However, there are some arguments for gcrma() that 
>> are not available for justGCRMA(), so they are not consistent in all 
>> cases.
>>
>> Best,
>>
>> Jim
>>
>>
>>> Thanks,
>>> Jenny
>>> #The examples below are not reproducible because you need .CEL files 
>>> - run them on any that you have.
>>>  > library(gcrma)
>>> Loading required package: affy
>>> Loading required package: Biobase
>>> Welcome to Bioconductor
>>>   Vignettes contain introductory material. To view, type
>>>   'openVignette()'. To cite Bioconductor, see
>>>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>  > setwd("K:/Bulla/CELfiles")
>>>  > raw <- ReadAffy()
>>>  > raw
>>> AffyBatch object
>>> size of arrays=834x834 features (10 kb)
>>> cdf=Rat230_2 (31099 affyids)
>>> number of samples=6
>>> number of genes=31099
>>> annotation=rat2302
>>> notes=
>>>  >
>>>  > gcrma.1 <- gcrma(raw)
>>> Adjusting for optical effect......Done.
>>> Computing affinitiesLoading required package: AnnotationDbi
>>> .Done.
>>> Adjusting for non-specific binding......Done.
>>> Normalizing
>>> Calculating Expression
>>>  >
>>>  > gcrma.2 <- justGCRMA()
>>> Computing affinities.Done.
>>> Adjusting for optical effect.......Done.
>>> Adjusting for non-specific binding......Done.
>>> Normalizing
>>> Calculating Expression
>>>  >
>>>  > all.equal(exprs(gcrma.1),exprs(gcrma.2))
>>> [1] "Mean relative difference: 0.03514035"
>>>  >
>>>  > sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
>>> States.1252
>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> other attached packages:
>>> [1] rat2302probe_2.5.0  AnnotationDbi_1.8.1 rat2302cdf_2.5.0
>>> gcrma_2.18.0
>>> [5] affy_1.24.2         Biobase_2.6.0
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.14.0        Biostrings_2.14.3 DBI_0.2-4
>>> IRanges_1.4.4
>>> [5] preprocessCore_1.8.0 RSQLite_0.7-3        splines_2.10.0
>>> tools_2.10.0
>>>  >
>>> #Now switch R versions but run same code....
>>>  > library(gcrma)
>>> Loading required package: affy
>>> Loading required package: Biobase
>>> Welcome to Bioconductor
>>>   Vignettes contain introductory material. To view, type
>>>   'openVignette()'. To cite Bioconductor, see
>>>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>  > setwd("K:/Bulla/CELfiles")
>>>  > raw <- ReadAffy()
>>>  > raw
>>> AffyBatch object
>>> size of arrays=834x834 features (10 kb)
>>> cdf=Rat230_2 (31099 affyids)
>>> number of samples=6
>>> number of genes=31099
>>> annotation=rat2302
>>> notes=
>>>  >
>>>  > gcrma.1 <- gcrma(raw)
>>> Adjusting for optical effect......Done.
>>> Computing affinitiesLoading required package: AnnotationDbi
>>> .Done.
>>> Adjusting for non-specific binding......Done.
>>> Normalizing
>>> Calculating Expression
>>>  >
>>>  > gcrma.2 <- justGCRMA()
>>> Computing affinities.Done.
>>> Adjusting for optical effect.......Done.
>>> Adjusting for non-specific binding......Done.
>>> Normalizing
>>> Calculating Expression
>>>  >
>>>  > all.equal(exprs(gcrma.1),exprs(gcrma.2))
>>> [1] TRUE
>>>  >
>>>  > sessionInfo()
>>> R version 2.10.1 (2009-12-14)
>>> i386-pc-mingw32
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
>>> States.1252
>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> other attached packages:
>>> [1] rat2302probe_2.5.0  AnnotationDbi_1.8.1 rat2302cdf_2.5.0
>>> gcrma_2.18.1
>>> [5] affy_1.24.2         Biobase_2.6.1
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.14.0        Biostrings_2.14.10 DBI_0.2-5
>>> IRanges_1.4.9
>>> [5] preprocessCore_1.8.0 RSQLite_0.8-0        splines_2.10.1
>>> tools_2.10.1
>>>  >
>>>
>>>
>>> At 04:31 PM 10/28/2009, Jin wrote:
>>>> Jerry <norn2k at ...> writes: > > Hi, > Â  > I'm currently using gcrma 
>>>> package in R to summarize probeset intensities from CEL files. 
>>>> Surprisingly, > IÂ found that the results generated from gcrma 
>>>> function and justGCRMA function are quite different. In > general, 
>>>> expression values from gcrma function are lower and the boxplots 
>>>> from gcrma are quite > asymmetric (no tails on the bottom). I 
>>>> attached some plots below for your information. This confused > me 
>>>> as I thought that the two functions implemented similar 
>>>> algorithms.  >   > Thank you so much for your help! >   > Best, > 
>>>> Jianjun > > PS: The package I used is gcrma 2.12.1. I also observed 
>>>> similar results on 2.14 version. > >       Hello, I'm running into 
>>>> the same problem with gcrma and justGCRMA. They are not giving the 
>>>> same numeric results. Is this an intended feature? I was under the 
>>>> impression that these two methods were identical with the exception 
>>>> that justGCRMA was more memory efficient because it didn't have to 
>>>> create an AffyBatch object. Can anyone shed some light on this 
>>>> subject? Thanks, Jin _______________________________________________ 
>>>>  Bioconductor mailing list Bioconductor at stat.math.ethz.ch 
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the 
>>>> archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> Jenny Drnevich, Ph.D.
>>> Functional Genomics Bioinformatics Specialist
>>> W.M. Keck Center for Comparative and Functional Genomics
>>> Roy J. Carver Biotechnology Center
>>> University of Illinois, Urbana-Champaign
>>> 330 ERML
>>> 1201 W. Gregory Dr.
>>> Urbana, IL 61801
>>> USA
>>> ph: 217-244-7355
>>> fax: 217-265-5066
>>> e-mail: drnevich at illinois.edu
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and should 
>> not be used for urgent or sensitive issues
> 
> Jenny Drnevich, Ph.D.
> 
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
> 
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
> 
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at illinois.edu

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 



More information about the Bioconductor mailing list