[BioC] affyPLM and exon array question

Wed May 2 02:50:24 CEST 2007

The slowdown you are observing is due to just a few probesets on the
array. These probesets contain many thousands of probes. In the current
implementation, when you use the command that you specified (fitting the
default model), fitPLM uses a procedure optimized for probesets with
relatively few probes across many arrays, and so it is pretty quick most
of the time (my experience is that it is not completely unacceptable even
up to about 1000 probes across a large number of arrays, at least on my
machine).

E.g., both of the following contain the same number of data points:

Case I: 11 probes and 1000 arrays
Case II: 1000 probes and 11 arrays

but case I will be a lot quicker than case II in the current
implementation. 

Demonstration code

> library(affyPLM)

### note to any developers out there, the following is UNSUPPORTED
### and subject to change. DO NOT USE.
> rlm.default.rma.model <- function(y,PsiCode=0,PsiK=1.345){
+   .Call("R_rlm_rma_default_model",y,PsiCode,PsiK,PACKAGE="affyPLM")
+ }

#Case I
> y <- matrix(rnorm(11*1000),11,1000)
> system.time(test <- rlm.default.rma.model(y))
[1] 0.735 0.032 0.788 0.000 0.000

#Case II
> y <- matrix(rnorm(11*1000),1000,11)
> system.time(test <- rlm.default.rma.model(y))
[1] 19.776  0.508 21.730  0.000  0.000

As for workarounds, I am pretty sure that these extremely large
probesets are control probesets of some kind that could be safely
ignored. It is possible to pass fitPLM a vector of probeset names
specifying the subset to fit.
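
A rough sketch of that workaround follows. The helper small.probesets()
and the 1000-probe cutoff are made up for illustration, and the toy list
stands in for the per-probeset probe indices of a real CDF. The commented
lines assume that indexProbes() from affy works with your custom CDF and
that the names are passed through the subset argument of fitPLM; treat
both as untested assumptions for exon arrays.

```r
# Hypothetical helper: return the names of probesets that have at most
# max.probes probes. probe.index is a named list with one entry per
# probeset, each holding that probeset's probe indices (vector or matrix).
small.probesets <- function(probe.index, max.probes = 1000) {
  n.probes <- vapply(probe.index, function(idx) NROW(idx), integer(1))
  names(n.probes)[n.probes <= max.probes]
}

# Toy stand-in for a CDF: two ordinary probesets and one very large one
toy.index <- list(ps_a = 1:4, ps_b = 5:15, ps_control = 16:5015)
keep <- small.probesets(toy.index)
print(keep)

# With a real AffyBatch `ab` (not run here; assumes indexProbes() works
# with the custom CDF):
# keep <- small.probesets(indexProbes(ab, which = "pm"))
# fit  <- fitPLM(ab, subset = keep)
```

It is worth looking at the names that get dropped before fitting, to
confirm they really are control probesets.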

Best,

Ben

On Tue, 2007-05-01 at 12:36 -0700, Allen Day wrote:
> I suspect so, although I haven't tried running rma() directly.
> Just.rma() works fine, and fitPLM is able to RMA normalize internally.
> 
> I was able to move this a little further along by patching the mm()
> function to return empty list in the case of a dimensionless pset
> variable.  Apparently it is usually a two-column matrix with pm in
> psets[,1] and mm in psets[,2].  Here's the patch.
> http://paste.turbogears.org/paste/1253/plain
> 
> This allows me to successfully background correct and normalize with
> RMA through wrapper function fitPLM from the affyPLM library.  It's
> taking forever though, even running with minimal options.  Here's my
> call:
> 
> fitPLM(ab, output.param=list(residuals=FALSE,weights=FALSE,resid.SE=FALSE),verbosity.level=10);
> 
> Any advice?
> 
> -Allen
> 
> On 5/1/07, Crispin Miller <CMiller at picr.man.ac.uk> wrote:
> > Hi Allen,
> > Does rma() work with your cdf?
> >
> > We've also produced one that works OK with rma() (see the 'exonmap'
> > package vignette for more details, including how to get it).  Don't know
> > if that helps?
> >
> > Crispin
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: bioconductor-bounces at stat.math.ethz.ch
> > > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Allen Day
> > > Sent: 01 May 2007 01:32
> > > To: bioconductor at stat.math.ethz.ch
> > > Subject: [BioC] affyPLM and exon array question
> > >
> > > Hi,
> > >
> > > I've been trying to get NUSE, RLE, and RMA values for
> > > HuEx-1_0-st-v2 (Human "all exon") Affymetrix arrays.
> > >
> > > So far I have successfully read the arrays into an affybatch object.
> > > This required creating the CDF environment, which I have
> > > already done with makecdfenv.  I'll be submitting that for
> > > inclusion shortly, but that's another topic.
> > >
> > > After creating the AffyBatch, I try to use affyPLM to do an
> > > RMA model fit.  R = 2.4.1, affyPLM = 1.12.0, affy = 1.12.2.
> > > That's where there's trouble, and it appears to be caused by
> > > the lack of mismatch probes on the array.  Here's code
> > > illustrating the problem:
> > >
> > > > > library( 'affy' );
> > > > > library( 'affyPLM' );
> > > > > ab = read.affybatch( filenames='/home/allenday/cel/0001.CEL' );
> > > > > ab; # works, output omitted
> > > > > pm( ab ); # works, output omitted
> > > > > mm( ab ); # fails!
> > > > Error in FUN(X[[1411190]], ...) : subscript out of bounds
> > > > > plm = fitPLM( ab ); # same failure in fitPLM, caused by a call to mm() on variable ab
> > > > Error in FUN(X[[1411190]], ...) : subscript out of bounds
> > >
> > > I'm only proficient enough in R and C to track this down --
> > > > I don't know R or Bioconductor well enough to know how to
> > > fix it.  If I can get this going I will submit a new package
> > > that provides just.nuse() and just.rle() functions.  Can
> > > someone give me a pointer for how to make this work?
> > >
> > > Thanks.
> > >
> > > -Allen
> > >
> >