[BioC] can't get summary.stat after "invariantset" normalizat ion

Man, Michael Michael.Man@pfizer.com
Fri, 18 Oct 2002 16:21:48 -0400


Thanks for the help.  The problem is still there.  Here are the results ...
-m

> tmp
Plob object
CDF used=HG_U95A.CDF
number of chips=8
number of genes=12626
number of probe pairs=201807
annotation=
description=
notes=normalized by invariant set
size of chip=640x640
chip names=
1532m99hpp_av04
1532n99hpp_av04
1532o99hpp_av04
1532p99hpp_av04
1532q99hpp_av04
1532r99hpp_av04
1532s99hpp_av04
1532t99hpp_av04r


> ##### patch suggested by Laurent Gautier 10/17/2002 #####
> bg.parameters  <- function (pm, mm, n.pts = 2^14)
+ {
+     max.density <- function(x, n.pts) {
  +         aux <- density(x, kernel = "epanechnikov", n = n.pts,
na.rm=TRUE)
+         aux$x[order(-aux$y)[1]]
  +     }
  +     mmbg <- max.density(mm, n.pts)
  +     pmbg <- max.density(pm, n.pts)
  +     bg.data <- mm[mm < mmbg]
  +     bg.data <- bg.data - mmbg
  +     bgsd <- sqrt(sum(bg.data^2)/(length(bg.data) - 1)) * sqrt(2)/0.85
  +     sig.data <- pm[pm > pmbg]
    +     sig.data <- sig.data - pmbg
   +     alpha <- 1/mean(sig.data)
   +     mubg <- mmbg
  +     list(alpha = alpha, mu = mubg, sigma = bgsd)
}
#+ }
## end > ##### end of the patch #####


> system.time(
+   latin.LiWong <- express(tmp, normalize=FALSE, summary.stat=li.wong) #
didn't work
+ )
Background correcting
Preparing Data
Computing expression. This may take a while.
Error in while (change > delta & iter < maxit) { : 
        missing value where logical needed
Timing stopped at: 31.31 0.62 31.94 0 0 


> system.time( latin.avdf <-
express(tmp,normalize=FALSE,bg=subtractmm,summary.stat=avdiff) )
Background correcting
Preparing Data
Computing expression. This may take a while.
Error in var(x, na.rm = na.rm) : missing observations in cov/cor
Timing stopped at: 13.78 0.55 14.33 0 0 


> latin.avdf <- express(tmp,normalize=F,bg=subtractmm,
summary.stat=function(x) apply(x,2,mean,trim=.2))
Background correcting
Preparing Data
Computing expression. This may take a while.
Error in psort(x, partial) : index 13 outside bounds
> traceback()
9: sort(x, partial = unique(c(lo, hi)))
8: mean.default(newX[, i], ...)
7: FUN(newX[, i], ...)
6: apply(x, 2, mean, trim = 0.2)
5: FUN(X[[1280]], ...)
4: lapply(as.list(X), FUN, ...)
3: sapply(data.lst, summary.stat, ...)
2: t(sapply(data.lst, summary.stat, ...))
1: express(tmp, normalize = F, bg = subtractmm, summary.stat = function(x)
apply(x, 
       2, mean, trim = 0.2))


> library(help=affy)
affy            Methods for Affymetrix Oligonucleotide Arrays

Description:

Package: affy
Title: Methods for Affymetrix Oligonucleotide Arrays
Version: 1.0
Author: Rafael A. Irizarry <rafa@jhu.edu>, Laurent Gautier
        <laurent@cbs.dtu.dk>, and Leslie M. Cope <cope@mts.jhu.edu>
Description: The package contains functions for exploratory
        oligonucleotide array analysis.
Maintainer: Rafael A. Irizarry <rafa@jhu.edu>
Depends: R (>= 1.5.0), modreg, Biobase, eda, tkWidgets
Bundle: Bioconductor
Date: 2002/5/1
BundleDescription: Contains essential Bioconductor packages.
URL: http://www.bioconductor.org/
License: LGPL
Built: R 1.6.0; i686-pc-linux-gnu; Wed Oct 16 17:03:17 EDT 2002



-----Original Message-----
From: Laurent Gautier [mailto:laurent@cbs.dtu.dk]
Sent: Thursday, October 17, 2002 5:34 PM
To: Man, Michael
Cc: 'Laurent Gautier'; 'bioconductor@stat.math.ethz.ch'
Subject: Re: [BioC] can't get summary.stat after "invariantset"
normalizat ion


Well... seems I've been boasting too much (and I was secretely hoping
it was the devel version...). I do not have run affy-release on 1.6.0,
I can only suggests reasons.

> On Thu, Oct 17, 2002 at 11:24:24AM -0400, Man, Michael wrote:
> > Any idea?  BTW, it works fine when I used unnormalized data.  -michael

This is the funny part. I wonder why it does 
> 
> 
> Several even... which version of R and of the package are you using ?
> 
> 
> L.
> 
> 
> > 
> > ###############################
> > > tmp <- normalize.Plob.invariantset(latin)  # use the first CHIP as
> > reference

The following do it too (preferred way):

tmp <- normalize(latin, method="invariantset")


> > > latin.LiWong <- express(tmp, normalize=F, bg=subtractmm,
> > summary.stat=li.wong)
> > Background correcting
> > Preparing Data
> > Computing expression. This may take a while.
> > Error in while (change > delta & iter < maxit) { : 
> >         missing value where logical needed

Would you have variables called 'T' or 'F' around ?
(the code relied (too much) on (T == TRUE) and (F == FALSE) but this
was fixed about a week ago I believe). 

> > In addition: Warning message: 
> > No convergence in inner loop after 50 in outerler tieration 4 
> >  in: fit.li.wong(t(data.matrix), remove.outliers, normal.array.quantile,

> > 

This only a warning.

> > 
> > > latin.LiWong <- express(tmp, normalize=F, summary.stat=li.wong)
> > Background correcting
> > Error in density(x, kernel = "epanechnikov", n = n.pts) : 
> >         x contains missing values
> >

In this case the default 'bg' is rma. You might get 'masked' probes from
your CEL files (to avoid taking them into consideration, try to set the
flags related to masked and outlying probes to FALSE (see man related
man page)). I thought it was fixed... but may be only in affy-devel.
Try to patch bg.parameters with the following:

bg.parameters <- function(pm, mm, n.pts=2^14){

  max.density <- function(x,n.pts){
    aux <- density(x, kernel="epanechnikov", n=n.pts, na.rm=TRUE) 
    aux$x[order(-aux$y)[1]] 
  }
    mmbg <- max.density(mm,n.pts)
    pmbg <- max.density(pm,n.pts)
		   
    bg.data <- mm[mm < mmbg]
    bg.data <- bg.data-mmbg
    bgsd <- sqrt(sum(bg.data^2)/(length(bg.data)-1))*sqrt(2)/.85
			  
    sig.data <- pm[pm > pmbg]
    sig.data <- sig.data-pmbg
			       
    alpha <- 1/mean(sig.data)
    mubg <- mmbg
    list(alpha=alpha,mu=mubg,sigma=bgsd)  
}
				     

> > 
> > > latin.avdf <-
express(tmp,normalize=F,bg=subtractmm,summary.stat=avdiff)
> > Background correcting
> > Preparing Data
> > Computing expression. This may take a while.
> > Error in var(x, na.rm = na.rm) : missing observations in cov/cor
> > 

A patch was contributed for that and I was convinced I applied it to
both the released and devel version (I cannot find track of it in CVS ?!
I am (much) confused now...)
The reason is like above: presence of NAs.

> > 
> > > latin.avdf <- express(tmp,normalize=F,bg=subtractmm,
> > +                       summary.stat=function(x)
apply(x,2,mean,trim=.2))
> > Background correcting
> > Preparing Data
> > Computing expression. This may take a while.
> > Error in psort(x, partial) : index 13 outside bounds


What does give 'traceback()' on this one ?




L.