Hi Ariel and Jenny,

Thanks very much for the quick reply! I have been experimenting with the
functions since then, and they really seem to be what I'm looking for. A small
test with your test script, Ariel, returned what I hoped for. However, one
aspect confuses me. I wrote a little function that computes the lists of
probes and probesets to be excluded in my case, and then applied the
removeProbes function to these lists. As I write this message my computer is
still calculating, and it started four hours ago! Is it normal for the
function to be so time-consuming, or is there a chance that I have simply done
something wrong? I suppose the problem is on my side, but I can't imagine what
it could be, because I followed the "testscript" steps. Since reading the CEL
files usually takes just a few minutes (maybe five at most), I wouldn't have
thought that removing probesets would take as long as it currently does. I
must confess my lists are really long. Still, the problem is very important
to me, so it would be great to hear about your experience with the function's
performance on your systems. Here is a little summary of my setup:

Chiptype: HG_U133_Plus2
num. samples: 2
num. Probes in listOutProbes: ~500,000
num. Sets in listOutProbeSets: ~30,000
System: P4, 3 GHz, 500 MB RAM
R version: RGui 2.2.0 under Windows XP

Mainly I am surprised that removing probes from the structure takes so much
longer than reading in the CEL files and building the structure. Has either
of you ever run a test that tried to remove all probesets from an
environment? Or can you at least judge whether the current running time in my
case is OK or looks suspicious?
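
One way to judge the running time without waiting for the full job would be
to time the removal on small slices of the lists and watch how the time grows.
A minimal sketch in base R: slow_remove is only a made-up stand-in that
deletes one probe per pass, because how removeProbes works internally is not
shown in this thread, and the probe numbers are invented.

```r
# Made-up stand-in for a removal routine; NOT the real removeProbes.
# It deletes one probe per pass, so each pass scans the whole vector.
slow_remove <- function(all_probes, out_probes) {
  for (p in out_probes) {
    all_probes <- all_probes[all_probes != p]
  }
  all_probes
}

all_probes <- seq_len(20000)
for (n in c(500, 1000, 2000)) {
  out <- all_probes[seq_len(n)]
  elapsed <- system.time(res <- slow_remove(all_probes, out))["elapsed"]
  cat(n, "probes removed in", elapsed, "s\n")
}

# The same result in one vectorised step, for comparison.
fast <- setdiff(all_probes, all_probes[seq_len(2000)])
identical(res, fast)
```

If the time per removed probe stays roughly constant on such slices, the time
for the full lists can be estimated by scaling up; if it grows with the slice
size, the full run may take far longer than a simple extrapolation suggests.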

In any case, just to make it clear again: thanks very much. The functions
really seem to be a great help! :-)


Hi Benjamin,

Some time ago I wrote a couple of functions that modify the cdf environments
in order to remove bad probes from the affy analysis.
You could check:


hope that helps.
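
For readers without those functions at hand, the idea of editing a cdf
environment can be sketched in base R. In the affy cdf packages each probeset
name maps to a matrix of probe indices with columns "pm" and "mm". The toy
environment, the probeset names and the helper drop_probes below are invented
for illustration and are not the functions mentioned above.

```r
# Toy stand-in for a cdf environment: probeset name -> matrix of probe
# indices with columns "pm" and "mm" (all names and numbers are invented).
toy_cdf <- new.env()
assign("set_A", cbind(pm = c(11, 12, 13, 14), mm = c(21, 22, 23, 24)),
       envir = toy_cdf)
assign("set_B", cbind(pm = c(15, 16, 17), mm = c(25, 26, 27)),
       envir = toy_cdf)

# Hypothetical helper: drop the rows whose pm index is listed in bad_pm and
# remove a probeset entirely if no probes are left.
drop_probes <- function(cdf_env, bad_pm) {
  for (ps in ls(cdf_env)) {
    m <- get(ps, envir = cdf_env)
    keep <- !(m[, "pm"] %in% bad_pm)
    if (any(keep)) {
      assign(ps, m[keep, , drop = FALSE], envir = cdf_env)
    } else {
      rm(list = ps, envir = cdf_env)
    }
  }
  invisible(cdf_env)
}

drop_probes(toy_cdf, bad_pm = c(12, 15, 16, 17))
ls(toy_cdf)                      # only "set_A" remains
get("set_A", envir = toy_cdf)    # three rows left: pm 11, 13, 14
```

Because the probes are gone from the index matrices rather than set to NA,
any summary computed from such an environment only ever sees the remaining
probes.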


On December 1, 2005 11:06 am, Benjamin Otto wrote:
> Hi,
> I have been searching for nearly two days for a solution to the following
> problem without finding a satisfactory answer:
> I am working on the analysis of a HG-U133_Plus2 chip. Can I mask single
> oligos within certain probesets so that the expression, p and fold values
> are calculated from the remaining oligos?
> Here is a better description of the problem and its background. We are
> running a cross-species experiment in which RNA from tupaia was hybridized
> to the human chip. This resulted in fairly low expression signals. If we
> set aside all the other putative problems in analysing the result, it
> seems reasonable to say that in many cases only some of the oligos in a
> probeset will have hybridized satisfactorily. So the idea is to mask some
> of the oligos by some criterion and calculate the results from subsets of
> the probesets only. The problem is: if I set even a single oligo to NA,
> the value for the corresponding probeset is not calculated but set to NA.
> Most of the threads I found on the masking problem deal with an automated
> or corrected form of masking, but there seems to be no information
> available about our case. Has anyone done something like this before? I'm
> sure some manual programming will be needed. But the major question is:
> does anybody see a way to mask single oligos at a top level, for example
> by fixing the AffyBatch structure? Or do I have to change every single
> function to treat NA values correctly?
> thanks for your help,
> Benjamin
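
The NA behaviour described in the quoted message is easy to reproduce in base
R. In this sketch a plain mean stands in for the probeset summary (an
assumption for illustration; the affy summary methods are more involved) and
the intensities are invented. An NA in the input propagates to the result,
while dropping the masked probe before summarising gives a value.

```r
# Invented log-intensities for the probes of one probeset.
probes <- c(7.1, 6.8, 7.4, 2.0)

masked <- probes
masked[4] <- NA                          # mask the poorly hybridized probe

na_result      <- mean(masked)                 # NA: the missing value propagates
kept_result    <- mean(masked, na.rm = TRUE)   # mean of the remaining three probes
removed_result <- mean(probes[-4])             # same value, probe removed up front

print(c(na_result, kept_result, removed_result))
```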
