[BioC] option rm.mask=TRUE in ReadAffy

Wed Nov 5 01:14:50 MET 2003

Thank you for your comment, Laurent.

Yes, your way is a smart way to remove in-file MASKS probes.

By the way, '.MSK' files are provided by Affymetrix or created by the
indivisual user which contains probe cells to be masked in the analysis
step. Old design chip like Hu6800 has a lot of masking probes to remove
cross-hybridization data. Detail description can be found in MAS5 user's
guide Appendix-G.

Regards,

Kawai

> This is a nice way to solve the problem.
>
> I did not suspect many were suffering from that.
> In my particular case I skip the bg.correct step quite often
> (and this appears to be the trouble. Most of the
> other preprocessing steps seem to behave (see my previous
> mail in this thread). Note that Ben Bosltad
> recalled (earlier in this very same thread)
> that the bgcorrect step tries to "correct" the signal
> for the probes. Masking probes away removes a part of the need
> for background correction (in my humble opinion, for the
> Affymetrix chips, and for many of these I chips I had; this
> might not be good be everyone)).
>
> The masked (and outliers) found in the CEL should probably be stored
> in the annotation/description for easier retrieval (but at AFAIR,
> this has been shortly in the devel-version some time ago (or was almost
there
> ;)),
> public request will hopefully bring it back that).
>
> I might have an other way to solve the problem:
> (I have never seen '.MSK' files. The 'masked probes' I know of
> are included in the CEL files. The CEL files also contain
> a section 'outliers').
>
>
> ## have the file names in filenames. For example :
> filenames <- list.files("where/I/want", "CEL", full.names=TRUE)
>
> abatch <- ReadAffy()
>
> ## compressed CEL files ?
> compress <- FALSE
>
> ids.list <- vector("list", length=length(filenames))
>
> for (i in seq(along=filenames)) {
>   file <- filenames[i]
>   masked.xy <- .Call("getIndexExtraFromCEL", as.character(file),
>                      as.character("MASKS"),
>                      as.integer(compress))
>   masked.i <- xy2indices(masked.xy[,1], masked.xy[, 2], abatch=abatch)
>   outliers.xy <- .Call("getIndexExtraFromCEL", as.character(file),
>                         as.character("OUTLIERS"),
>                         as.integer(compress))
>   outliers.i <- xy2indices(outliers.xy[, 1], outliers.xy[, 2],
abatch=abatch)
>   ids.list[[i]] <- rbind(masked.i, outliers.i)
> }
>
> ## then whenever ones judges the time has come to mask the outliers:
>
> for (i in seq(along=filenames)) {
>   intensity(abatch)[ids[[i]], i] <- NA
> }
>
>
>
>
> Best,
>
>
>
> L.
>
>
>
>
> On Fri, Oct 31, 2003 at 07:08:03PM +0900, t-kawai at hhc.eisai.co.jp wrote:
> > I have the same problem with expresso() as weiss.
> >
> > To solve this problem, I executed a series of procedures step by step.
> >
> > Background correction, I think, needs all of the CEL data wether those
are
> > to be masked or not.  From the normalization step, I removed masking
data
> > using Affy's original mask file.
> >
> > For example,
> >
> > ## read CEL files
> > dat0 <- ReadAffy();
> >
> > ## Normalization step
> > dat <- bg.correct.mas(dat0);
> >
> > ## read mask file (in this case, 1803 probes to be masked)
> > msk <- scan("Hu6800_ClassA.MSK", skip=2, list("", ""));
> >
> > ## set ids (cell index list to be masked)
> > ## (in this case, 59540 cells will be masked)
> > for (i in 1:length(msk[[1]])) {
> >         nam <- msk[[1]][i];
> >         txt <- gsub("-", ":", msk[[2]][i]);
> >         lst <- eval(parse(text=paste("c(", txt, ")")));
> >
> >         if (i == 1) {
> >                 ids <- pmindex(dat, nam)[[1]][lst];
> >         } else {
> >                 ids <- c(ids, pmindex(dat, nam)[[1]][lst]);
> >         }
> >         ids <- c(ids, mmindex(dat, nam)[[1]][lst]);
> > }
> >
> > ## set NAs to cells to be maksed
> > intensity(dat)[ids, ] <- NA;
> >
> > ## Normalization step
> > dat1 <- normalize.AffyBatch.qspline(dat);
> >
> > ## Probe correction & summary step
> > dat2 <- computeExprSet(dat1, pmcorrect="mas", summary.method="liwong");
> > write.exprs(dat2, file="result.lst");
> >
> >
> >
> > The file "Hu6800_ClassA.MSK" looks like
> > Hu6800
> > [Call]
> > A28102_at       17,18,19,20
> > AB000381_s_at   7
> > AC000064_cds2_at        18,19,20
> > ...
> > M81830_at       1-20
> > M83181_at       1-20
> > ...
> >
> >
> > Above script works rightly?  Please give me a comment...
> >
> > Kawai
> >
> > _______________________________________
> >
> > Takatoshi Kawai, Ph.D.
> >
> > Senior Sientist, Bioinformatics
> > Laboratoy of Seeds Finding Technology
> > Eisai Co., Ltd.
> > 5-1-3 Tokodai, Tsukuba-shi,
> > Ibaraki 300-2635, Japan
> >
> > TEL: +81-29-847-7192
> > FAX: +81-29-847-7614
> > e-mail: t-kawai at hhc.eisai.co.jp
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
> --
> --------------------------------------------------------------
> Laurent Gautier   CBS, Building 208, DTU
> PhD. Student   DK-2800 Lyngby,Denmark
> tel: +45 45 25 24 89  http://www.cbs.dtu.dk/laurent
>

_______________________________________

Takatoshi Kawai, Ph.D.

Senior Sientist, Bioinformatics
Laboratoy of Seeds Finding Technology
Eisai Co., Ltd.
5-1-3 Tokodai, Tsukuba-shi,
Ibaraki 300-2635, Japan

TEL: +81-29-847-7192
FAX: +81-29-847-7614
e-mail: t-kawai at hhc.eisai.co.jp