[BioC] can't get summary.stat after "invariantset" normalizat ion

Rafael A. Irizarry rafa@jhu.edu
Mon, 21 Oct 2002 10:02:13 -0400 (EDT)


i have not looked very closely at the invarianteset code but here are
two things to keep in mind:

1- our implementation of li.wong is not that stable. 
   its our best guess at dchip from the papers. its not clear from the 
   papers what to do when one has  many outliers. 
2- in their papers li and wong recommend that one should have at least 10 
   arrays to use their method.

rafael

On Mon, 21 Oct 2002, Man, Michael wrote:

> I got the same result when I tried the "Dilution" data.   -m
> 
> > tmp <- normalize.Plob.invariantset(Dilution)  # use the first CHIP as
> reference
> > system.time(
> +   latin.LiWong <- express(tmp, normalize=FALSE, summary.stat=li.wong) #
> didn't work
> + )
> Background correcting
> Preparing Data
> Computing expression. This may take a while.
> Error in while (change > delta & iter < maxit) { : 
>         missing value where logical needed
> Timing stopped at: 2.88 0.38 3.26 0 0 
> 
> > system.time( latin.avdf <-
> express(tmp,normalize=FALSE,bg=subtractmm,summary.stat=avdiff) )
> latiBackground correcting
> n.avdf <- express(tmp,normalize=F,bg=subtractmm,
>                       summary.stat=function(x) apply(x,2,mean,trim=.2))
> 
> Preparing Data
> Computing expression. This may take a while.
> Error in var(x, na.rm = na.rm) : missing observations in cov/cor
> Timing stopped at: 0.09 0 0.09 0 0 
> 
> > latin.avdf <- express(tmp,normalize=F,bg=subtractmm,
> +                       summary.stat=function(x) apply(x,2,mean,trim=.2))
> Background correcting
> Preparing Data
> Computing expression. This may take a while.
> Error in psort(x, partial) : index 13 outside bounds
> 
> 
> -----Original Message-----
> From: Laurent Gautier [mailto:laurent@cbs.dtu.dk]
> Sent: Sunday, October 20, 2002 1:36 PM
> To: Man, Michael
> Cc: 'Laurent Gautier'; 'bioconductor@stat.math.ethz.ch'
> Subject: Re: [BioC] can't get summary.stat after "invariantset"
> normalizat ion
> 
> 
> 
> Without having your particular dataset, one can only conjecture.
> 
> 
> L.
> 
> 
> 
> On Fri, Oct 18, 2002 at 04:21:48PM -0400, Man, Michael wrote:
> > Thanks for the help.  The problem is still there.  Here are the results
> ...
> > -m
> > 
> > > tmp
> > Plob object
> > CDF used=HG_U95A.CDF
> > number of chips=8
> > number of genes=12626
> > number of probe pairs=201807
> > annotation=
> > description=
> > notes=normalized by invariant set
> > size of chip=640x640
> > chip names=
> > 1532m99hpp_av04
> > 1532n99hpp_av04
> > 1532o99hpp_av04
> > 1532p99hpp_av04
> > 1532q99hpp_av04
> > 1532r99hpp_av04
> > 1532s99hpp_av04
> > 1532t99hpp_av04r
> > 
> > 
> > > ##### patch suggested by Laurent Gautier 10/17/2002 #####
> > > bg.parameters  <- function (pm, mm, n.pts = 2^14)
> > + {
> > +     max.density <- function(x, n.pts) {
> >   +         aux <- density(x, kernel = "epanechnikov", n = n.pts,
> > na.rm=TRUE)
> > +         aux$x[order(-aux$y)[1]]
> >   +     }
> >   +     mmbg <- max.density(mm, n.pts)
> >   +     pmbg <- max.density(pm, n.pts)
> >   +     bg.data <- mm[mm < mmbg]
> >   +     bg.data <- bg.data - mmbg
> >   +     bgsd <- sqrt(sum(bg.data^2)/(length(bg.data) - 1)) * sqrt(2)/0.85
> >   +     sig.data <- pm[pm > pmbg]
> >     +     sig.data <- sig.data - pmbg
> >    +     alpha <- 1/mean(sig.data)
> >    +     mubg <- mmbg
> >   +     list(alpha = alpha, mu = mubg, sigma = bgsd)
> > }
> > #+ }
> > ## end > ##### end of the patch #####
> > 
> > 
> > > system.time(
> > +   latin.LiWong <- express(tmp, normalize=FALSE, summary.stat=li.wong) #
> > didn't work
> > + )
> > Background correcting
> > Preparing Data
> > Computing expression. This may take a while.
> > Error in while (change > delta & iter < maxit) { : 
> >         missing value where logical needed
> > Timing stopped at: 31.31 0.62 31.94 0 0 
> > 
> > 
> > > system.time( latin.avdf <-
> > express(tmp,normalize=FALSE,bg=subtractmm,summary.stat=avdiff) )
> > Background correcting
> > Preparing Data
> > Computing expression. This may take a while.
> > Error in var(x, na.rm = na.rm) : missing observations in cov/cor
> > Timing stopped at: 13.78 0.55 14.33 0 0 
> > 
> > 
> > > latin.avdf <- express(tmp,normalize=F,bg=subtractmm,
> > summary.stat=function(x) apply(x,2,mean,trim=.2))
> > Background correcting
> > Preparing Data
> > Computing expression. This may take a while.
> > Error in psort(x, partial) : index 13 outside bounds
> > > traceback()
> > 9: sort(x, partial = unique(c(lo, hi)))
> > 8: mean.default(newX[, i], ...)
> > 7: FUN(newX[, i], ...)
> > 6: apply(x, 2, mean, trim = 0.2)
> > 5: FUN(X[[1280]], ...)
> > 4: lapply(as.list(X), FUN, ...)
> > 3: sapply(data.lst, summary.stat, ...)
> > 2: t(sapply(data.lst, summary.stat, ...))
> > 1: express(tmp, normalize = F, bg = subtractmm, summary.stat = function(x)
> > apply(x, 
> >        2, mean, trim = 0.2))
> > 
> > 
> > > library(help=affy)
> > affy            Methods for Affymetrix Oligonucleotide Arrays
> > 
> > Description:
> > 
> > Package: affy
> > Title: Methods for Affymetrix Oligonucleotide Arrays
> > Version: 1.0
> > Author: Rafael A. Irizarry <rafa@jhu.edu>, Laurent Gautier
> >         <laurent@cbs.dtu.dk>, and Leslie M. Cope <cope@mts.jhu.edu>
> > Description: The package contains functions for exploratory
> >         oligonucleotide array analysis.
> > Maintainer: Rafael A. Irizarry <rafa@jhu.edu>
> > Depends: R (>= 1.5.0), modreg, Biobase, eda, tkWidgets
> > Bundle: Bioconductor
> > Date: 2002/5/1
> > BundleDescription: Contains essential Bioconductor packages.
> > URL: http://www.bioconductor.org/
> > License: LGPL
> > Built: R 1.6.0; i686-pc-linux-gnu; Wed Oct 16 17:03:17 EDT 2002
> > 
> > 
> > 
> > -----Original Message-----
> > From: Laurent Gautier [mailto:laurent@cbs.dtu.dk]
> > Sent: Thursday, October 17, 2002 5:34 PM
> > To: Man, Michael
> > Cc: 'Laurent Gautier'; 'bioconductor@stat.math.ethz.ch'
> > Subject: Re: [BioC] can't get summary.stat after "invariantset"
> > normalizat ion
> > 
> > 
> > Well... seems I've been boasting too much (and I was secretely hoping
> > it was the devel version...). I do not have run affy-release on 1.6.0,
> > I can only suggests reasons.
> > 
> > > On Thu, Oct 17, 2002 at 11:24:24AM -0400, Man, Michael wrote:
> > > > Any idea?  BTW, it works fine when I used unnormalized data.  -michael
> > 
> > This is the funny part. I wonder why it does 
> > > 
> > > 
> > > Several even... which version of R and of the package are you using ?
> > > 
> > > 
> > > L.
> > > 
> > > 
> > > > 
> > > > ###############################
> > > > > tmp <- normalize.Plob.invariantset(latin)  # use the first CHIP as
> > > > reference
> > 
> > The following do it too (preferred way):
> > 
> > tmp <- normalize(latin, method="invariantset")
> > 
> > 
> > > > > latin.LiWong <- express(tmp, normalize=F, bg=subtractmm,
> > > > summary.stat=li.wong)
> > > > Background correcting
> > > > Preparing Data
> > > > Computing expression. This may take a while.
> > > > Error in while (change > delta & iter < maxit) { : 
> > > >         missing value where logical needed
> > 
> > Would you have variables called 'T' or 'F' around ?
> > (the code relied (too much) on (T == TRUE) and (F == FALSE) but this
> > was fixed about a week ago I believe). 
> > 
> > > > In addition: Warning message: 
> > > > No convergence in inner loop after 50 in outerler tieration 4 
> > > >  in: fit.li.wong(t(data.matrix), remove.outliers,
> normal.array.quantile,
> > 
> > > > 
> > 
> > This only a warning.
> > 
> > > > 
> > > > > latin.LiWong <- express(tmp, normalize=F, summary.stat=li.wong)
> > > > Background correcting
> > > > Error in density(x, kernel = "epanechnikov", n = n.pts) : 
> > > >         x contains missing values
> > > >
> > 
> > In this case the default 'bg' is rma. You might get 'masked' probes from
> > your CEL files (to avoid taking them into consideration, try to set the
> > flags related to masked and outlying probes to FALSE (see man related
> > man page)). I thought it was fixed... but may be only in affy-devel.
> > Try to patch bg.parameters with the following:
> > 
> > bg.parameters <- function(pm, mm, n.pts=2^14){
> > 
> >   max.density <- function(x,n.pts){
> >     aux <- density(x, kernel="epanechnikov", n=n.pts, na.rm=TRUE) 
> >     aux$x[order(-aux$y)[1]] 
> >   }
> >     mmbg <- max.density(mm,n.pts)
> >     pmbg <- max.density(pm,n.pts)
> > 		   
> >     bg.data <- mm[mm < mmbg]
> >     bg.data <- bg.data-mmbg
> >     bgsd <- sqrt(sum(bg.data^2)/(length(bg.data)-1))*sqrt(2)/.85
> > 			  
> >     sig.data <- pm[pm > pmbg]
> >     sig.data <- sig.data-pmbg
> > 			       
> >     alpha <- 1/mean(sig.data)
> >     mubg <- mmbg
> >     list(alpha=alpha,mu=mubg,sigma=bgsd)  
> > }
> > 				     
> > 
> > > > 
> > > > > latin.avdf <-
> > express(tmp,normalize=F,bg=subtractmm,summary.stat=avdiff)
> > > > Background correcting
> > > > Preparing Data
> > > > Computing expression. This may take a while.
> > > > Error in var(x, na.rm = na.rm) : missing observations in cov/cor
> > > > 
> > 
> > A patch was contributed for that and I was convinced I applied it to
> > both the released and devel version (I cannot find track of it in CVS ?!
> > I am (much) confused now...)
> > The reason is like above: presence of NAs.
> > 
> > > > 
> > > > > latin.avdf <- express(tmp,normalize=F,bg=subtractmm,
> > > > +                       summary.stat=function(x)
> > apply(x,2,mean,trim=.2))
> > > > Background correcting
> > > > Preparing Data
> > > > Computing expression. This may take a while.
> > > > Error in psort(x, partial) : index 13 outside bounds
> > 
> > 
> > What does give 'traceback()' on this one ?
> > 
> > 
> > 
> > 
> > L.
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > http://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> http://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>