[BioC] [SPAM???] Re: Masking single Oligos in mas5

Benjamin Otto b.otto at uke.uni-hamburg.de
Thu Dec 15 13:32:09 CET 2005


Hi Ariel,

I have been experimenting with the new function "RemoveProbes2"
unfortunately without success. However that might be due to the fact that
I'm not sure how to save the environment. For the saving steps I followed
the vignette of Laurent Gautier under
"www.bioconductor.org/repository/devel/vignette/ngenomeschips.pdf".
To be precise I used the following commands

> cdfpackagename
[1] "hgu133plus2cdf"
> probepackagename
[1] "hgu133plus2probe"
> plcdf <- wrapCdfEnvAffy(hgu133plus2cdf,1164,1164,"hgu133plus2cdf")
> envplcdf <- as(plcdf,"environment")
> x at cdfName <- envplcdf
> save(x,plcdf,envplcdf,file="HGU133P2x200.rda")
>

which seemed to work fine, but I can't tell for sure. Now when I called the
new function it pretends the file doesn't contain any cdf information !?

> cdfpackagename
[1] "hgu133plus2cdf"
> probepackagename
[1] "hgu133plus2probe"
>
RestoreCdfEnv(clean.cdf="HGU133P2x200.rda",cdfpackagename,probepackagename)
Warning: package 'hgu133plus2cdf' is in use and will not be installed
Error in getCdfInfo(object) : Could not obtain CDF environment, problems
encountered:
Specified environment does not contain hgu133plus2
HGU133P2x200.rda
Data for package affy did not contain hgu133plus2cdf
HGU133P2x200.rda

I'm aware that most probably the environment doesn't get stored correctly in
the file at the first place. Yet, I didn't find any convinient alternative
for doing so.

Regards,
Benjamin

P.S.: And a merry 1st, 2nd and 3rd Advent




> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel
> Chernomoretz
> Sent: 06 December 2005 19:55
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5
>
>
> Hi Benjamin
>
> mmm...try replacing
> the line :
>
> $ tmp <- as.vector(ans[[i]][, i.probes])
>
> with
>
> $ tmp <- as.vector(as.matrix(ans[[i]])[, i.probes])
>
>
>
>
> here is why i think that may work...
>
> > l<-list(1,as.matrix(1))
> > l[[1]][,1]
> Error in l[[1]][, 1] : incorrect number of dimensions
> > l[[2]][,1]
> [1] 1
> > as.matrix(l[[1]])[,1]
> [1] 1
>
>
> regards
> ariel./
>
>
>
> On December 6, 2005 12:54 pm, Benjamin Otto wrote:
> > Hi Ariel,
> >
> > well I wish you a loooot of luck then for the pre-postdoc time.
> I hope you
> > like your new job. :-)
> >
> > I think I have localized the problem for the function crash although I
> > don't quite understand it yet. I was just writing this text when your
> > answer reached me. Thanks!
> > Back to the crash: I will just give two examples for better explanation.
> >
> > #Example 1: Include all probes of the probeset 117_at
> > # with exception of the first probe
> > # and don't delete the set itself
> > $> listOutProbes <- c("117_at2","117_at3","117_at4",
> > $+ "117_at5","117_at6","117_at7","117_at8","117_at9",
> > $+ "117_at10","117_at11","117_at12","117_at13",
> > $+ "117_at14","117_at15","117_at16")
> >
> > $> sets <- NULL
> > $> ResetEnvir(cdfpackagename,probepackagename)
> > $> RemoveProbes(probes,sets,cdfpackagename,probepackagename)
> >
> >
> > #Example 2: Exapmle 1 again but ...
> > # ... delete the set from the list
> > $> listOutProbes <- c("117_at2","117_at3","117_at4",
> > $+ "117_at5","117_at6","117_at7","117_at8","117_at9",
> > $+ "117_at10","117_at11","117_at12","117_at13",
> > $+ "117_at14","117_at15","117_at16")
> >
> > $> sets <- ("117_at")
> > $> ..
> > $> ..
> >
> >
> > ONLY the construction in example number 1!!! returns the error message:
> >
> > $ Error in ans[[i]][, i.probes] : incorrect number of dimensions
> >
> > and that when it reaches the command:
> >
> > $ pmIndex <- unlist(indexProbes(newAB,"pm"))
> >
> > in function "RemoveProbes". Note that the command
> "indexProbes(newAB,"pm")"
> > is causing the problem. That's why I snooped into the code of the method
> > "indexProbes" and voila ... the problem source is finally the command:
> >
> > $ tmp <- as.vector(ans[[i]][, i.probes])
> >
> > where "ans" was created by
> >
> > $> envir <- getCdfInfo(newAB)
> > $> genenames <- ls(envir)
> > $> ans <- mget(genenames, envir, ifnotfound = NA)
> >
> >
> > NOW: If ans[[i]] contains only ONE value for each of "pm" and "mm" then
> >
> > $> ans[[i]][,"pm"],
> > $> ans[[i]][,"mm"]
> > or ...
> > $> ans[[i]][,c(1,2)]
> >
> > causes R to return the error message. I don't know whether the following
> > observation has to do with that or is rather neglectable:
> >
> > $ # for the probeset with only one probe left I get:
> > $ Browse[1]> ans[[i]][1]
> > $     pm
> > $1051223
> >
> > $ # for a probeset with more probes left I get:
> > $ Browse[1]> ans[[i]][1]
> > $ [1] 1051223
> >
> >
> > FINALLY: Sounds as if the bells should ring in ecstasy in my
> head telling
> > me what the problem really is, unfortunately I don't understand it yet.
> > Maybe someone has an idea whether this is some kind of expected problem
> > which could be corrected by the way "ans" is created. I think I
> will have
> > to experiment a little bit more...
> >
> > regards,
> > Benjamin
> >
> > > -----Original Message-----
> > > From: bioconductor-bounces at stat.math.ethz.ch
> > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel
> > > Chernomoretz
> > > Sent: 06 December 2005 17:10
> > > To: bioconductor at stat.math.ethz.ch
> > > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5
> > >
> > >
> > >
> > >
> > > Hi Benjamin
> > >
> > > I am sorry you have problems with the code.
> > >
> > > Regarding the computation time, yes. It is slow even with
> small/mid sized
> > > lists, so in your case I would expect a loooot of processing time
> > > (more RAM
> > > would  help also ;-) ).  I guess that to improve this, the
> > > function should be
> > > completely rewritten using perhaps a different approach.
> > >
> > > However, once  you succed in generating a modified cdf
> > > environment, and if you
> > > are planning to compute expressions many times having the same probes
> > > removed, you should keep (save) the modified cdf environment as
> > > an R object.
> > > I wrote a function (RemoveProbes2, included at the end) that takes the
> > > clean.cdf.environment as an input argument. So everything should
> > > be faster
> > > for subsequent calculations.
> > >
> > >
> > > Unfortunately I am not a lot at the office these days. My postdoc
> > > is about to
> > > end here, and I am moving in a couple of weeks, so things are a
> > > little bit
> > > caotic this days! However I will check your code asap.
> > >
> > > quick observations/questions:
> > > 1) are you removing entire probesets specifiyng the whole set of
> > > probes via
> > > the rmd$Probes list AND the rmd$Set?
> > > 2) if you plan to use presence calls, be cautious. After the
> > > probe removal
> > > step you end up with probeset with different number of probes so
> > > be carefull
> > > with  the  significance levels of the Wilcoxon'rank test
> > >
> > >
> > >
> > > Regards,
> > > Ariel./
> > >
> > >
> > >
> > >
> > > ################################################
> > > #       clean.cdf: a modified cdf environment. Should be generated
> > > #                         using RemoveProbes and saving the modified
> > > #                         environment afterwards
> > > #       cdfpackagename:   (e.g. "hgu95av2cdf")
> > > #       probepackagename: (e.g. "hgu95av2probe")
> > > #       destructive: unimplemented option, see NOTE
> > > RemoveProbes2<-function(clean.cdfenv    =NULL,
> > >
> > > cdfpackagename,probepackagename,destructive=TRUE){
> > >
> > >  #default probe dataset values
> > >  probe.env.orig <- get(probepackagename)
> > >
> > >
> > >
> > >  if(is.null(clean.cdfenv)){
> > >   cat("A cdfenv should be provided\n")\
> > >   return();
> > >  }else{ #clean.cdf provided. much, much faster!
> > >   ipos<-grep(cdfpackagename,search())
> > >   rm(list=c(cdfpackagename),pos=ipos)
> > >   assign(cdfpackagename,clean.cdfenv,pos=ipos)
> > >  }
> > >
> > >
> > >  # setting the PROBE env accordingly (idea from gcrma
> > > compute.affinities.R)
> > >  tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
> > >  newAB   <- new("AffyBatch",cdfName=cleancdf)
> > >  pmIndex <-  unlist(indexProbes(newAB,"pm"))
> > >  subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex)
> > >  rm(newAB)
> > >  iNA     <- which(is.na(subIndex))
> > >
> > >  if(length(iNA)>0){
> > >   ipos<-grep(probepackagename,search())
> > >   assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
> > >  }
> > > }
> > >
> > > ###############################################
> > >
> > > On December 6, 2005 04:07 am, Benjamin Otto wrote:
> > > > Hi folks,
> > > >
> > > > sorry for taking your time again. I let the computer finish the
> > > > "removeProbes"-calculation over night, so I don't know how long he
> > > > took...to finally display the following error message:
> > > >
> > > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions"
> > > >
> > > > You can imagine how desperately I was looking for a shotgun
> > >
> > > that moment. To
> > >
> > > > provide maybe more information I forgot to submit in my last thread
> > > > here come the commands and extracts of my lists / structures I used:
> > > >
> > > > $# libraries used
> > > > $> library(simpleaffy)
> > > > $> library(affydata)
> > > > $> require(gcrma)
> > > > $
> > > > $# preliminary commands
> > > > $> setwd ("....")
> > > > $> x <- read.affy()
> > > > $> x at cdfName <- "HGU133Plus2"
> > > > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE)
> > > > $> cdfpackagename <- paste(cleancdf,"cdf",sep="")
> > > > $> probepackagename <- paste(cleancdf,"probe",sep="")
> > > > $> ResetEnvir(cdfpackagename,probepackagename)
> > > > $
> > > > $# my function for indentification of unwanted oligos
> > > > $# I will append the function code below
> > > > $>  rmd <- get.foul.oligos(x, ratio=2, diff=100)
> > > > $
> > > > $# the initial rma/mas calculation without changes
> > > > $> d.rma <- rma(x)
> > > > $> d.gcrma <- gcrma(x)
> > > > $> d.mas <- mas5(x)
> > > > $> d.mas.call <- mas5calls(x)
> > > > $
> > > > $# The command for invoking the removeProbes function
> > > > $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename)
> > > >
> > > >
> > > > The probe list "rmd$Probes" contains 565825 entries, the set
> > >
> > > list "rmd$Set"
> > >
> > > > 32791 entries. A little structure check:
> > > >
> > > > $> rmd$Set[1:5]
> > > > $[1] "1007_s_at" "1053_at"   "1255_g_at" "1405_i_at" "1438_at"
> > > > $> rmd$Probes[1:5]
> > > > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4"
> "1007_s_at5"
> > > >
> > > > Looks fine to me, but is it? What could cause the error
> message here ?
> > > >
> > > > Here comes the code of my function:
> > > >
> > > >
> > > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) {
> > > >
> > > > 	remove <- list()
> > > > 	remove$Probes <- list()
> > > > 	remove$Set <- list()
> > > > 	len.plist <- 0
> > > > 	len.slist <- 0
> > > >
> > > >
> > > > 	ids <- geneNames(x)
> > > > 	num.ids <- length(ids)
> > > > 	num.samples <- length(pm(x)[1,])
> > > > 	msk.table <- matrix(NA,num.ids,2)
> > > > 	rm.string <- NA
> > > >
> > > > 	for (i in 1:num.ids) {
> > > > 		id <- ids[i]
> > > > 		oligos.pm <- indexProbes(x, which="pm",
> genenames=id)[[id]]
> > > > 		oligos.mm <- indexProbes(x, which="mm",
> genenames=id)[[id]]
> > > > 		num.oligos <- length(oligos.pm)
> > > > 		num.rm <- 0
> > > > 		for (j in 1:num.oligos) {
> > > > 			cat ("i:",i," j:",j,"\n")
> > > > 			rm.flag <- TRUE
> > > > 			for (k in 1:num.samples) {
> > > > 				val.pm <-
> intensity(x)[oligos.pm[j],k]
> > > > 				val.mm <-
> intensity(x)[oligos.mm[j],k]
> > > > 				if (val.pm/val.mm > ratio &
> > >
> > > val.pm-val.mm > diff) {
> > >
> > > > 					rm.flag <- FALSE
> > > > 				}
> > > > 			}
> > > > 			if (rm.flag == TRUE) {
> > > > 				num.rm <- num.rm + 1
> > > > 				oligo.name <- paste(id,j,sep="")
> > > > 				if (len.plist == 1) {
> > > > 					remove$Probes <-
> > >
> > > c(remove$Probes,oligo.name)
> > >
> > > > 				}
> > > > 				else { remove$Probes <-
> c(oligo.name) }
> > > > 				len.plist <- 1
> > > >
> > > > 				if (is.na(rm.string)) {
> rm.string <- j }
> > > > 				else { rm.string <-
> > >
> > > paste(rm.string,j,sep=",") }
> > >
> > > > 			}
> > > > 		}
> > > > 		if (num.rm == num.oligos) {
> > > > 			if (len.slist == 1) {
> > > > 				remove$Set <- c(remove$Set,id)
> > > > 			}
> > > > 			else { remove$Set <- c(id) }
> > > > 			len.slist <- 1
> > > >
> > > > 			rm.string <- "all"
> > > > 		}
> > > > 	}
> > > > 	return(remove)
> > > > }
> > > >
> > > > > -----Original Message-----
> > > > > From: bioconductor-bounces at stat.math.ethz.ch
> > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf
> Of Benjamin
> > > > > Otto
> > > > > Sent: 05 December 2005 17:14
> > > > > To: bioconductor at stat.math.ethz.ch
> > > > > Subject: Re: [BioC] Masking single Oligos in mas5
> > > > >
> > > > >
> > > > > Hi Ariel and Jenny,
> > > > >
> > > > > thanks very much for the quick reply!!! I have been experimenting
> > > > > since with
> > > > > the functions and it really seems to be what I'm looking
> for. A small
> > > > > test with your test_script Ariel returned what I hoped
> for. However
> > > > > there is one
> > > > > aspect confusing me. After writing a little function myself,
> > > > > which computes
> > > > > the probe and probeset list which should be excluded in my
> > >
> > > case I applied
> > >
> > > > > the removeProbes function on these lists. While I'm writing this
> > > > > message my
> > > > > computer is still calculating and he has started four
> hours ago! Is
> > > > > it normal for the function to be so time consuming or is
> there some
> > > > > chance that
> > > > > I just have done something wrong? I suppose the problem is on my
> > > > > side but I
> > > > > just can't imagine what it could be because I followed the
> > >
> > > "testscript"
> > >
> > > > > steps. But as the reading of the CEL files usually needs just a
> > > > > few minutes
> > > > > (maybe five if time consuming) I wouldn't have thought
> the removing
> > > > > of probesets would take as long as it currently does. I
> must confess
> > > > > my lists are really long. Still the problem is of big importance
> > >
> > > for me and
> > >
> > > > > so it would be great to get an idea of your experience with
> > >
> > > the function
> > >
> > > > > perfomance on your systems. Here is a little summary of the
> > > > > conditions in my
> > > > > case:
> > > > >
> > > > > Chiptype: HG_U133_Plus2
> > > > > num. samples: 2
> > > > > num. Probes in listOutProbes: ~ 500.000
> > > > > num. Sets in listOutProbeSets: ~ 30.000
> > > > > System: P4 - 3 GHZ, 500 MB RAM,
> > > > > R-version: RGUI 2.2.0 under WindowsXP.
> > > > >
> > > > > Mainly I have been wondering that the removing of probes from the
> > > > > structure
> > > > > takes so much longer than reading in the CEL files and
> building the
> > > > > structure. Has anyone of you maybe started a test before
> > >
> > > trying to remove
> > >
> > > > > all probesets from an environment? Or can you at least judge
> > >
> > > wether the
> > >
> > > > > current running time in my case is ok or looks suspicious?
> > > > >
> > > > > In any case just to make it clear again, thanks very much in any
> > > > > case. The functions really seem to be of very, very much
> help!!! :-)
> > > > >
> > > > > Sincerly,
> > > > > Benjamin
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: bioconductor-bounces at stat.math.ethz.ch
> > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel
> > > > > Chernomoretz
> > > > > Sent: 01 December 2005 17:32
> > > > > To: bioconductor at stat.math.ethz.ch
> > > > > Subject: Re: [BioC] Masking single Oligos in mas5
> > > > >
> > > > >
> > > > > Hi Benjamin,
> > > > >
> > > > > Some time ago I wrote a couple of functions that modified
> > > > > the cdf environments in order to remove bad probes from the
> > >
> > > affy analysis
> > >
> > > > > You could check:
> > > > >
> > > > > http://files.protsuggest.org/biocond/html/7350.html
> > > > > http://files.protsuggest.org/biocond/html/7366.html
> > > > > http://files.protsuggest.org/biocond/html/7367.html
> > > > >
> > > > >
> > > > > hope that helps.
> > > > >
> > > > > ariel./
> > > > >
> > > > > On December 1, 2005 11:06 am, Benjamin Otto wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I have been searching for nearly two days for a solution to the
> > > > >
> > > > > following
> > > > >
> > > > > > problem without finding satisfactory answers:
> > > > > >
> > > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask
> > > > > > for certain probesets single Oligos such that the
> expression, p and
> > > > > > fold
> > > > >
> > > > > change
> > > > >
> > > > > > values are calculated based on the remaining oligos?
> > > > > >
> > > > > > A better description of my problem and the background. We
> > >
> > > are handling
> > >
> > > > > > a cross-species experiment having hybradized rna from
> tupaia on the
> > > > > > human chip. This resulted in fairly low expression signals.
> > >
> > > If we just
> > >
> > > > > > forget about all the other putative problems in analysing
> > >
> > > the result I
> > >
> > > > > > think it seems reasonable to say, that in many cases only
> > >
> > > some of the
> > >
> > > > > > probeset oligos will have hybridized satisfyingly. So
> the idea is
> > > > > > masking some of the oligos by some criteria and calculate
> > >
> > > the results
> > >
> > > > > > only based upon subsets of the probesets. The problem is:
> > >
> > > If I set even
> > >
> > > > > > only one single oligo to NA, the values calculated for the
> > > > > > corresponding
> > > > >
> > > > > probeset won't be
> > > > >
> > > > > > calculated but set to NA. Most of the threads I found concerning
> > > > > > the masking problem handle the question of an autpmated or
> > >
> > > corrected form
> > >
> > > > > > of masking. But there seems to be no available information about
> > > > > > our case.
> > > > >
> > > > > Has
> > > > >
> > > > > > anyone done something like that before? I'm sure there will
> > >
> > > have to be
> > >
> > > > > some
> > > > >
> > > > > > manual programing. But the major question is: Does anybody see a
> > > > > > possibility to mask the single oligos on a top level like fixing
> > > > > > the affybatch structure? Or do I have to change every single
> > > > >
> > > > > function to treat
> > > > >
> > > > > > NA values in the correct form?
> > > > > >
> > > > > > thanks for your help,
> > > > > > Benjamin
> > > > > >
> > > > > > _______________________________________________
> > > > > > Bioconductor mailing list
> > > > > > Bioconductor at stat.math.ethz.ch
> > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > >
> > > > > --
> > > > > Ariel Chernomoretz, Ph.D.
> > > > > Centre de recherche du CHUL
> > > > > 2705 Blv Laurier, bloc T-367
> > > > > Sainte-Foy, Qc
> > > > > G1V 4G2
> > > > > (418)-525-4444 ext 46339
> > > > >
> > > > > _______________________________________________
> > > > > Bioconductor mailing list
> > > > > Bioconductor at stat.math.ethz.ch
> > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > >
> > > > > _______________________________________________
> > > > > Bioconductor mailing list
> > > > > Bioconductor at stat.math.ethz.ch
> > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at stat.math.ethz.ch
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >
> > > --
> > > Ariel Chernomoretz, Ph.D.
> > > Centre de recherche du CHUL
> > > 2705 Blv Laurier, bloc T-367
> > > Sainte-Foy, Qc
> > > G1V 4G2
> > > (418)-525-4444 ext 46339
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> --
> Ariel Chernomoretz, Ph.D.
> Centre de recherche du CHUL
> 2705 Blv Laurier, bloc T-367
> Sainte-Foy, Qc
> G1V 4G2
> (418)-525-4444 ext 46339
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>



More information about the Bioconductor mailing list