[BioC] GCRMA: feature request

Tue Jan 18 21:39:39 CET 2011

Dear Guido,

just one word of caution: even when you apply transposed median polish,
there might be minute amounts of correlation left in small sample sizes
when you use gcRMA background correction. gcRMA was what actually
triggered the studies of Lim et al.
(http://bioinformatics.oxfordjournals.org/content/23/13/i282.long),
which we then extended. (Have a look at Additional file 7 in
http://www.biomedcentral.com/1471-2105/11/553 for the
permutations of bg/norm/sum.)

But having the option to choose would definitely be great.

Best Wishes,
Björn

Hooiveld, Guido wrote:
> Dear Jean,
>
> Please allow me to put forward a feature request for the GCRMA package:
> as you may have noticed in the literature and on the BioC mailing list, there has been discussion on the use of the median polish algorithm (in RMA) for summarizing signals of probes into a single probeset value in relation to correlation artefacts. See e.g.:
> http://www.biomedcentral.com/1471-2105/11/553
> and
> http://thread.gmane.org/gmane.science.biology.informatics.conductor/32255/focus=32259
>
> Moreover, in the above-mentioned BioC thread it is advocated to use a robust regression M-estimation procedure, e.g. as available in the packages 'affyPLM' / 'preprocessCore', instead of applying median polish to the transposed data matrix (aka tRMA), as was suggested by the authors of the aforementioned paper.
>
> In the intro of the fRMA paper it is also stated that more statistically rigorous procedures such as M-estimation techniques could be used for summarization, and this is one of the reasons fRMA by default uses affyPLM's default M-estimator (Huber) for summarization instead of median polish.
> http://dx.doi.org/10.1093/biostatistics/kxp059
>
> Since AFAIK GCRMA is equal to RMA, except of course for the background correction, I wondered whether it would be possible to build into GCRMA an option that lets the user select a robust M-estimation procedure (e.g. affyPLM's default one) over the (GCRMA's default) median polish algorithm to summarize the probe data into probeset values. Thus something like:
> x.norm <- gcrma(affy.data, sum="median.polish") [default] or x.norm <- gcrma(affy.data, sum="affyPLM").
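
As a rough illustration of what such a choice would change, here is a minimal base-R sketch on simulated data for a single probeset. It is not gcrma or affyPLM code: the toy matrix `pm`, the effect sizes and all variable names are made up, `medpolish()` from stats stands in for the RMA summarization, and `MASS::rlm()` with its default Huber psi stands in for a robust M-estimator.

```r
library(MASS)  # rlm(); MASS is a recommended package shipped with R

set.seed(1)
n_probes  <- 11
n_samples <- 4

# Simulated log-scale intensities: additive probe + sample effects plus noise.
# Rows are probes, columns are samples (all values are invented).
probe_eff  <- rnorm(n_probes, sd = 1)
sample_eff <- c(7, 8, 9, 10)
pm <- outer(probe_eff, sample_eff, "+") +
  matrix(rnorm(n_probes * n_samples, sd = 0.1), n_probes, n_samples)
dimnames(pm) <- list(paste0("p", seq_len(n_probes)),
                     paste0("s", seq_len(n_samples)))

# 1. Median polish on the probes x samples matrix (RMA-style):
#    expression estimate = overall effect + column (sample) effect.
mp <- medpolish(pm, trace.iter = FALSE)
rma_like <- mp$overall + mp$col

# 2. Median polish on the transposed matrix (tRMA-style):
#    samples are now rows, so the sample effects are the row effects.
tmp <- medpolish(t(pm), trace.iter = FALSE)
trma_like <- tmp$overall + tmp$row

# 3. Robust regression (Huber M-estimator) on the same additive model,
#    with probe effects constrained to sum to zero.
long <- data.frame(
  y      = as.vector(pm),  # column-major: all probes of s1, then s2, ...
  probe  = factor(rep(rownames(pm), times = n_samples)),
  sample = factor(rep(colnames(pm), each = n_probes))
)
fit <- rlm(y ~ 0 + sample + probe, data = long,
           contrasts = list(probe = "contr.sum"), maxit = 50)
plm_like <- coef(fit)[paste0("sample", colnames(pm))]

print(rbind(rma_like, trma_like, plm_like))
```

The median polish and rlm values may differ by a constant offset, because the probe effects are centered differently (median zero versus sum to zero); the between-sample differences are the quantities to compare.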
>
> I would appreciate your opinion on this.
>
> Regards,
> Guido
>
>
> In addition, I would like to remind you about another issue with GCRMA that my colleague Philip brought forward last December, which you may have missed (I copied his email below):
> -----------------------------------------------------------------------------
> I noticed the following problem when using gcRMA. The gcRMA library tries to install probe packages. This is fine, except in cases when a probe package is already available (and local versions vs. repository versions do not necessarily match)! This behaviour is triggered within the function getProbePackage:
>
> function (probepackage, lib = .libPaths()[1], verbose = TRUE)
> {
>     options(show.error.messages = FALSE)
>     attempt <- try(do.call(library, list(probepackage, lib.loc = lib)))
>     options(show.error.messages = TRUE)
>
> ...
> }
>
> .libPaths() is in this particular example:
>   
>> .libPaths()
>>     
> [1] "/local/home/guidoh/R/x86_64-unknown-linux-gnu-library/2.12"
> [2] "/geninf/prog64/R/R-2.12.0/lib64/R/library"
>
> As you can see, .libPaths()[1] points to the local R directory of the user, whereas the R installation directory is in .libPaths()[2]. Hence, we have the complication that gcRMA installs (the wrong) probe libraries (from BioC) even though they are already available to the user! The issue with this is that in some cases we use custom, tailored libraries that are not identical to those in the repositories. Hence, we may run into unexpected problems! As a matter of fact, I would prefer to be able to disable the automatic installation of probe packages (in gcRMA) in the first place (an option that is enabled by default, but can be disabled by the user)!
>
> Anyway, there is no reason to limit yourself to the first library. As an example, the following command will work without any problem:
>   
>> attempt <- try(do.call(library, list("nugohs1a520180hsentrezgcdf", lib.loc = .libPaths())))
>> attempt
>>     
>  [1] "gcrma"                      "nugohs1a520180hsentrezgcdf"
>  [3] "affy"                       "Biobase"
>  [5] "stats"                      "graphics"
>  [7] "grDevices"                  "utils"
>  [9] "datasets"                   "methods"
> [11] "base"
>
> That way, at least, your function will really go through all R library directories to check whether or not a library is installed!
> So I kindly ask for the following modifications:
> 1. An option in gcRMA to simply disable the automated installation of missing libraries [I need to control what happens! :)]
> 2. To simply use .libPaths() instead of .libPaths()[1] to really search through all R installation directories.
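
The two requested changes could be sketched roughly as follows. This is not gcrma's getProbePackage(): the helper name `load_probe_package` and the arguments `auto_install` and `installer` are hypothetical, and the sketch leaves installation off by default, whereas the request above keeps it enabled by default. How a missing package would actually be installed is left to a caller-supplied function.

```r
# Hypothetical helper (an assumption, not part of gcrma): attach a probe
# package if it is installed in ANY of the given library directories, and
# never install anything unless the caller explicitly opts in.
load_probe_package <- function(probepackage, lib = .libPaths(),
                               auto_install = FALSE, installer = NULL,
                               verbose = TRUE) {
  is_installed <- function() {
    # find.package() searches every directory in 'lib' without attaching
    length(find.package(probepackage, lib.loc = lib, quiet = TRUE)) > 0
  }

  found <- is_installed()
  if (!found && auto_install && is.function(installer)) {
    # 'installer' is a caller-supplied function(pkg, lib); which repository
    # it uses is deliberately the user's decision
    installer(probepackage, lib[1])
    found <- is_installed()
  }

  if (!found) {
    if (verbose) {
      message("Probe package '", probepackage, "' not found in: ",
              paste(lib, collapse = ", "))
    }
    return(invisible(FALSE))
  }

  # try(silent = TRUE) avoids toggling options(show.error.messages)
  attempt <- try(library(probepackage, lib.loc = lib,
                         character.only = TRUE),
                 silent = TRUE)
  invisible(!inherits(attempt, "try-error"))
}
```

Called as load_probe_package("nugohs1a520180hsentrezgcdf"), this would attach a custom package from whichever library directory holds it, and return FALSE without installing anything when it is missing.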
>
> Please let me know whether or not you agree. Making these 2 modifications is not very hard, so I can contribute them to you if you are interested.
>
> Regards,
> Philip
>
>

-- 
---------------------------------------------
Björn Usadel
Max Planck Institute for Molecular Plant Physiology
Am Mühlenberg 1
14476 Potsdam, Germany
Tel:0331 5678153
www.tinyurl.com/IntegrativeCarbonBiology
www.gabipd.org