[BioC] {Spam?} Re: RMA on a batch with NA values

Henrik Bengtsson hb at stat.berkeley.edu
Wed Aug 13 01:48:14 CEST 2008


Hi.

On Mon, Aug 11, 2008 at 8:00 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
> On Aug 11, 2008, at 9:50 AM, Tarca, Adi wrote:
>
>> Thanks Jim,
>>
>> It looks like rma has an argument "subset" that allows me to choose the
>> probesets that I want to consider.
>> Even discarding some probes out of the considered probesets can work as
>> well if no background correction is required, i.e.
>>
>> esetb<-rma(abatch,subset=psok, background=FALSE)
>>
>> works fine after inserting NAs.
>>
>> I just find it strange that the background correction step requires that
>> all probe intensities be valid numbers and not contain any NAs.
>
> Why? You will never have NA's in CEL files.

FYI, I'm not sure if you followed this thread from the start, but Adi
wants to filter out some probes and tried to set them to NA, which I
think is a natural approach.

Independent of CEL files: You can have missing values in your probe
signals because of earlier pre-processing steps.  For instance, if a
set of pre-processing methods first translates the data such that
non-positive values (including zero; not impossible) are produced and
then takes the logarithm (not uncommon), you will end up with NaN
values for the negative signals (and -Inf for the zeros), and the NaNs
remain even after you unlog the data.  It may even occur because the
image analysis software decides to call a probe missing.
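
A minimal sketch of the log/unlog case in base R. The signal values
are made up for illustration; the only point is what happens to the
negative value and the zero.

```r
# Hypothetical probe signals after a translation step that can
# overshoot: one exact zero and one negative value.
signal <- c(120.5, 0, -3.2, 48.0)

# log2() of a negative number is NaN (R also emits a warning);
# log2(0) is -Inf.
logged <- suppressWarnings(log2(signal))

# Going back to the intensity scale does not recover the negative value:
# 2^NaN is still NaN, while 2^(-Inf) gives 0 again.
unlogged <- 2^logged

print(rbind(signal, logged, unlogged))
```

So a downstream step that cannot cope with NaN will fail on such data
even though the original CEL file contained only valid numbers.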

About CEL files: To the best of my understanding, if the Affymetrix
image analysis software calls a probe missing, then it flags it by
listing it in the 'Outlier' field [which contains (x,y) entries] of the
CEL file.  I'm not sure what values they set for (intensity, stddvs,
pixel), e.g. they might store zeros or the original estimates that
were discarded as missing values.  I don't think there is an official
Affymetrix specification of what values are allowed in these fields,
except that they have to be (float, float, short).  In other words,
they/we are probably allowed to store float (IEEE) NAs there as well
as negative values.

In aroma.affymetrix we store pre-processed probe-level data in CEL
files, and I know that NAs can be both stored and retrieved without
coercion as NAs using the Fusion SDK (via our 'affxparser' package).

To summarize, I don't think we should assume that probe-level data,
raw or processed, contains only non-negative values; it may also
contain negative and missing values.  In many cases it is easy to
adjust the estimator algorithm to ignore missing values.
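
As a rough illustration of an estimator that ignores missing values,
here is a sketch using median polish from base R (stats::medpolish) on
simulated data.  The data, object names and effect sizes are made up
for the example, and this is not the code that rma() runs internally;
it only shows that a row-plus-column fit of this kind goes through
when the NAs are skipped.

```r
# Simulated log2 probe signals: 5 probes (rows) x 4 arrays (columns).
set.seed(1)
probe_effect <- c(-1, -0.5, 0, 0.5, 1)
array_effect <- c(8, 9, 10, 11)
y <- outer(probe_effect, array_effect, "+") + rnorm(20, sd = 0.05)

# Knock out a single value on one array and one entire probe.
y[2, 3] <- NA
y[5, ] <- NA

# Median polish fits y[i, j] = overall + row[i] + col[j] + residual.
# With na.rm = TRUE the row and column medians skip the missing cells.
fit <- medpolish(y, na.rm = TRUE, trace.iter = FALSE)

# Array-level summaries on the log2 scale (overall + column effects).
chip_effects <- fit$overall + fit$col
print(round(chip_effects, 2))

# The probe that was removed entirely gets no row effect.
print(fit$row)
```

The array-level estimates stay finite, and the fully filtered probe
simply drops out of the fit with an NA row effect.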

Finally, an FYI for this thread: The algorithm(s) fitting the
log-additive model in the 'preprocessCore' package take weights as well
(since last year or so).  With these weights you can give zero weight
to data points with missing values.  This can be done for a particular
intensity value on one array (e.g. if NAs are introduced by a
preprocessing method), or across all values for a particular probe
(e.g. filtering out a probe with poor properties).  Quite useful.
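
To make the zero-weight idea concrete, here is a sketch that swaps in
ordinary weighted least squares via lm() from base R; it is not the
preprocessCore fitting routine, and all data and names below are made
up.  It covers both cases: one bad value on one array, and one probe
that is down-weighted on every array.

```r
# Simulated log2 probe signals in long format: 5 probes x 4 arrays.
set.seed(2)
d <- expand.grid(probe = factor(1:5), chip = factor(1:4))
probe_effect <- c(-1, -0.5, 0, 0.5, 1)
chip_effect <- c(8, 9, 10, 11)
d$y <- probe_effect[d$probe] + chip_effect[d$chip] +
  rnorm(nrow(d), sd = 0.05)

# Case 1: a single missing value on one array.
bad_cell <- d$probe == 2 & d$chip == 3
d$y[bad_cell] <- NA

# Case 2: a probe with poor properties; give it garbage values on
# every array to show that they have no influence on the fit.
bad_probe <- d$probe == 5
d$y[bad_probe] <- c(0, 50, -20, 30)

# Zero weight for everything we want the fit to ignore.
w <- rep(1, nrow(d))
w[bad_cell | bad_probe] <- 0

# Log-additive model: y = chip effect + probe effect + error.
fit <- lm(y ~ 0 + chip + probe, data = d, weights = w)

chip_coef <- coef(fit)[paste0("chip", 1:4)]
print(round(chip_coef, 2))
```

Because probe 5 has zero weight everywhere, its coefficient is not
estimable and comes back as NA, while the array effects are estimated
from the remaining probes only.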

Cheers

Henrik

>
> Kasper
>
>
>> Regards,
>> Adi
>>
>>
>>
>> Adi Laurentiu TARCA, Ph.D.
>> Research Associate, Dept. of Computer Science &
>> Services in Support of the NIH Perinatology Research Branch,
>> Manager of the Bioinformatics Core, Karmanos Cancer Institute,
>> Wayne State University,
>> 3990 John R., Office 4809,
>> Detroit, Michigan 48201
>> Tel: 1-313-5775305
>> Cell: 1-313-4043116
>> http://vortex.cs.wayne.edu/tarca/
>>
>> -----Original Message-----
>> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
>> Sent: Monday, August 11, 2008 9:05 AM
>> To: Tarca, Adi
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: {Spam?} Re: [BioC] RMA on a batch with NA values
>>
>> Hi Adi,
>>
>> Tarca, Adi wrote:
>>>
>>> Hi all,
>>> I am trying to run RMA using only a limited number of probes. I have
>>> the x and y coordinates of those probes that I want to use.
>>> The approach I have tried was to replace with NA all probe intensities
>>> that I want to discard, but the RGUI gives an error and tries to end
>>> the session. The console is still running but it does not return any
>>> results.
>>> Here is the code I use:
>>>
>>> #####
>>> abatch<-ReadAffy(filenames=fnss,celfile.path="./CELL")
>>> indok<-xy2indices(x=okx,y=oky,abatch=abatch)
>>> ints<-intensity(abatch)
>>> ints[-indok,]<-NA
>>> intensity(abatch)<-ints
>>> esetb<-rma(abatch)
>>>
>>> Any ideas?
>>
>> You won't be able to just set things equal to NA in order to remove them
>> from consideration.
>>
>> There is some code that Ariel Chernomoretz posted back in '05 that might
>> still work to remove particular probes from a probeset:
>>
>> http://article.gmane.org/gmane.science.biology.informatics.conductor/4258/match=,
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> Thanks,
>>> Adi L. Tarca
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Hildebrandt Lab
>> 8220D MSRB III
>> 1150 W. Medical Center Drive
>> Ann Arbor MI 48109-0646
>> 734-936-8662
>>
>


