[BioC] Bioconductor Digest, Vol 119, Issue 1

Thu Jan 3 11:01:20 CET 2013

Dear Leo,
          Nice to hear from you.
I guess our approaches are complementary rather than competitive.
The code in http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/R/TCBB-2007-11-0161_noCEL.tar
does not try to replace RMA but provides a means of locating
dodgy areas on microarrays: spatial errors, i.e. less trustworthy probe values.

Also robust averages (i.e. with outlier removal) of tens of thousands of CEL
files can be calculated on a fairly standard PC using R by not trying to keep
them all simultaneously in R data structures. We have used the resulting
"average array" to quantile normalise anything from a single GeneChip to
10000+ arrays.
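
To make the idea concrete, here is a minimal sketch. It is not the R code in
the tar file: it is written in Python purely for illustration, and the names
robust_average, quantile_normalise and read_arrays, as well as the k-SD
outlier rule, are assumptions of the sketch. It streams arrays one at a time
in two passes, so memory use grows with the number of probes rather than the
number of arrays, and then maps one array onto the resulting reference.

```python
import math
from typing import Callable, Iterable, List


def robust_average(read_arrays: Callable[[], Iterable[List[float]]],
                   n_probes: int, k: float = 3.0) -> List[float]:
    """Per-probe mean that skips values more than k SDs from the plain mean.

    read_arrays is a hypothetical callable that returns a fresh iterator
    yielding one array (list of probe intensities) at a time, e.g. by
    reading one CEL file per step. Only per-probe accumulators are kept.
    """
    # Pass 1: running mean and sum of squared deviations (Welford's method).
    count = 0
    mean = [0.0] * n_probes
    m2 = [0.0] * n_probes
    for arr in read_arrays():
        count += 1
        for i, x in enumerate(arr):
            delta = x - mean[i]
            mean[i] += delta / count
            m2[i] += delta * (x - mean[i])
    sd = [math.sqrt(v / (count - 1)) if count > 1 else 0.0 for v in m2]

    # Pass 2: re-average each probe, ignoring values outside mean +/- k*sd.
    total = [0.0] * n_probes
    kept = [0] * n_probes
    for arr in read_arrays():
        for i, x in enumerate(arr):
            if abs(x - mean[i]) <= k * sd[i]:
                total[i] += x
                kept[i] += 1
    # Fall back to the plain mean if every value of a probe was rejected.
    return [total[i] / kept[i] if kept[i] else mean[i]
            for i in range(n_probes)]


def quantile_normalise(values: List[float],
                       reference: List[float]) -> List[float]:
    """Give each probe the reference value that has the same rank.

    Ties are broken by probe order, a simplification of the sketch.
    """
    if len(values) != len(reference):
        raise ValueError("array and reference must have the same length")
    ref_sorted = sorted(reference)
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = ref_sorted[rank]
    return out
```

Because the reference is computed once, each further array can be normalised
on its own by calling quantile_normalise(array, reference), without
reloading the others.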

Much of our subsequent analysis has been using individual probes
(see http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet/ )
but the original idea was to feed the normalised probe data into RMA
and similar probeset-based algorithms.

Bill

On 1/3/13, Leo Lahti <leo.lahti at iki.fi> wrote:
> Dear Bill - thanks for the interesting work related to scalable
> microarray preprocessing.
>
> We have recently submitted a closely related manuscript for review.
> Similar to your work, the proposed Online-RPA algorithm reads CEL
> files in batches to update the hyperparameters of a probabilistic
> probe-level model. This yields a fully scalable algorithm (linear time
> wrt. sample size) which systematically outperforms the standard RMA in
> various benchmarking tests, is readily applicable to all Affymetrix
> and other short oligo arrays (in contrast to fRMA), and has been used
> to preprocess data sets with tens of thousands of arrays.  The
> preprint is available through arXiv (arxiv.org/abs/1212.5932 "A fully
> scalable online-preprocessing algorithm for short oligonucleotide
> microarray atlases.").
>
> The implementation (function rpa.online) is available through
> the Bioconductor RPA package:
> http://www.bioconductor.org/packages/devel/bioc/html/RPA.html
>
> Would be interesting to compare the two approaches experimentally.
>
> best regards,
> Leo Lahti, Finland
> http://www.iki.fi/Leo.Lahti
>
>> Date: Mon, 31 Dec 2012 11:20:18 -0800 (PST)
>> From: "wlangdon [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, w.langdon at cs.ucl.ac.uk
>> Cc: affy Maintainer <rafa at jhu.edu>
>> Subject: [BioC] normalise many cel files TCBB-2007-11-0161_noCEL.tar
>> Message-ID: <20121231192018.BDD99133105 at mamba.fhcrc.org>
>>
>>
>> Today I wrote to Rafael Irizarry about this and he suggested I post my
>> message here.
>>
>> Some time back I wrote some R code to normalise from
>> one to several tens of thousands of Affymetrix CEL files on a
>> (Linux) PC.
>>
>> The advantage is that it does not keep all CEL files in memory
>> all the time, so the usual memory limits which restrict the
>> number of CEL files do not apply.
>> http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/R/TCBB-2007-11-0161_noCEL.tar
>>
>> The R-code also reports spatial defects:
>> A Survey of Spatial Defects in Homo Sapiens Affymetrix GeneChips,
>> W. B. Langdon and G. J. G. Upton and R. da Silva Camargo and A. P.
>> Harrison, IEEE/ACM Transactions on Computational Biology and
>> Bioinformatics, 7(4) 647-653 oct-dec 2010. PubMed 21030732
>>
>> If we could incorporate this into your Bioconductor affy package
>> that would be great.
>>
>> Bill
>>
>>         Dr. W. B. Langdon,
>>         Department of Computer Science,
>>         University College London
>>         Gower Street, London WC1E 6BT, UK
>>         http://www.cs.ucl.ac.uk/staff/W.Langdon/
>