[BioC] Re: NA's

Robert Gentleman rgentlem@jimmy.harvard.edu
Wed, 27 Mar 2002 19:59:03 -0500


On Thu, Mar 28, 2002 at 11:41:28AM +1100, Gordon Smyth wrote:
> Dear Jean and Sandrine,
> 
> I've never liked the idea of setting log-ratios to NA when one of the 
> foregrounds is less than the corresponding background, because this throws 
> away the information that that channel is very low for that spot.
> 
> One long-term solution would be to treat such values as left censored, 
> i.e., to mark them as being "below threshold of measurement". All 
> subsequent analysis of the values would have to accept a censored data format.
> 
> A shorter-term solution would be to subtract only a fraction of the 
> background estimates, to keep the background corrected measurements all 
> positive. I am hoping to get a chance to look into ways of doing this 
> without being too ad hoc.
> 
> I haven't seen anything useful in the literature on this topic yet.
> 
> Best wishes
> Gordon

  Those are good points (I've moved this to Bioconductor as well).
  I believe one might also want to do this with some values that are
  positive. For example, with oligo arrays (and, I expect, cDNA arrays)
  most people don't seem to regard low values as reliable, so the
  left-censoring (or Winsorising) point could be set at 20 or 50
  (notice that there are no units attached).
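  A minimal R sketch of the thresholding idea follows. It assumes the
  background-corrected intensities are in a plain numeric vector, and the
  helper `log.censor` is hypothetical (it is not part of sma). Values below
  the threshold are replaced by the threshold before taking logs (i.e.
  Winsorised), and a logical flag records which spots were affected. Later
  analysis could then treat those spots as left-censored instead of
  discarding them as NA.

```r
# Hypothetical helper (not part of sma): left-censor intensities at a
# chosen threshold before taking logs. Values below the threshold,
# including zero and negative background-corrected values, are replaced by
# the threshold, and a logical vector records which spots were censored.
log.censor <- function(x, threshold = 20, base = 2) {
  censored <- !is.na(x) & x < threshold   # TRUE = "below threshold"
  list(value = log(pmax(x, threshold), base = base),  # NA stays NA
       censored = censored)
}

# Toy background-corrected intensities; the threshold of 20 is arbitrary
x <- c(-35, 0, 12, 20, 150, 4000)
res <- log.censor(x, threshold = 20)
print(res$value)
print(res$censored)
```

  As noted above, the threshold has no units and would have to be chosen
  separately for each platform and scanner.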

  One of the main problems with this approach is that the analytic
  tools used to filter genes/ESTs must then account for the censoring,
  and I believe there are few such tools available.

  However, I agree that this may be a more attractive solution in the
  long term.
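  Gordon's shorter-term suggestion, subtracting only a fraction of the
  background, can also be sketched in R. The helper names `bg.fraction`
  and `partial.correct`, and the rule for choosing the fraction (just under
  the smallest foreground/background ratio, capped at 1), are hypothetical.
  The sketch assumes strictly positive foreground and background
  intensities, and it is exactly the kind of ad hoc rule Gordon hopes to
  improve on.

```r
# Hypothetical rule: pick the largest fraction (at most 1) that keeps every
# corrected value positive, shrunk slightly so the smallest one is not zero.
# Assumes fg > 0 and bg > 0 for every spot.
bg.fraction <- function(fg, bg, shrink = 0.99) {
  min(1, shrink * min(fg / bg, na.rm = TRUE))
}

# Subtract that fraction of the background from the foreground.
partial.correct <- function(fg, bg, fraction = bg.fraction(fg, bg)) {
  fg - fraction * bg
}

# Toy red/green foreground (f) and background (b) intensities; the second
# red spot and the third green spot have foreground below background
Rf <- c(500, 80, 1200); Rb <- c(100, 100, 150)
Gf <- c(300, 400, 90);  Gb <- c(110, 120, 100)

# Each channel gets its own fraction; all corrected values are positive,
# so the log-ratios are finite instead of NA
M <- log2(partial.correct(Rf, Rb) / partial.correct(Gf, Gb))
print(M)
```

  In this toy example the low spots end up with very small corrected
  values (about 0.8 and 0.9), so their log-ratios depend heavily on the
  arbitrary `shrink` factor.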

   r


> 
> At 12:02 PM 27/03/2002 -0800, Yee Hwa Yang wrote:
> >Hi All,
> >
> >Sandrine and I are working on some cDNA data where we find there are lots
> >of negative values which in turn produce NA's after log transform.  These
> >negative values arise because foreground intensities are smaller than the
> >background intensities (from image analysis output).
> >
> >For sma, we had created a series of functions (log.na, sum.na, mean.na,
> >...) to handle NA values. For example, we have
> >
> >  log.na
> >function (x, ...)
> >{
> >     log(ifelse(x > 0, x, NA), ...)
> >}
> >
> >Does anyone have any suggestions about dealing with NA issues in
> >general for cDNA array data?
> >
> >Thank you,
> >Jean & Sandrine
> 
> ---------------------------------------------------------------------------------------
> Dr Gordon K Smyth, Senior Research Scientist, Bioinformatics,
> Walter and Eliza Hall Institute of Medical Research,
> Post Office, Royal Melbourne Hospital, Vic 3050
> Tel: (03) 9345 2326, Fax (03) 9347 0852,
> Email: smyth@wehi.edu.au, www: http://www.statsci.org
> 
> _______________________________________________
> Biocore mailing list
> Biocore@stat.math.ethz.ch
> http://www.stat.math.ethz.ch/mailman/listinfo/biocore

-- 
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax:   (617)  632-2444                   |
| Department of Biostatistics      office: M1B28                            |
| Harvard School of Public Health  email: rgentlem@jimmy.dfci.harvard.edu   |
+---------------------------------------------------------------------------+