[BioC] [SPAM???] Re: Masking single Oligos in mas5

Tue Dec 6 19:54:58 CET 2005

Hi Benjamin

mmm...try replacing
the line :

$ tmp <- as.vector(ans[[i]][, i.probes])

with

$ tmp <- as.vector(as.matrix(ans[[i]])[, i.probes])

here is why i think that may work...

> l<-list(1,as.matrix(1))
> l[[1]][,1]
Error in l[[1]][, 1] : incorrect number of dimensions
> l[[2]][,1]
[1] 1
> as.matrix(l[[1]])[,1]
[1] 1

regards
ariel./

On December 6, 2005 12:54 pm, Benjamin Otto wrote:
> Hi Ariel,
>
> well I wish you a loooot of luck then for the pre-postdoc time. I hope you
> like your new job. :-)
>
> I think I have localized the problem for the function crash although I
> don't quite understand it yet. I was just writing this text when your
> answer reached me. Thanks!
> Back to the crash: I will just give two examples for better explanation.
>
> #Example 1: Include all probes of the probeset 117_at
> # with exception of the first probe
> # and don't delete the set itself
> $> listOutProbes <- c("117_at2","117_at3","117_at4",
> $+ "117_at5","117_at6","117_at7","117_at8","117_at9",
> $+ "117_at10","117_at11","117_at12","117_at13",
> $+ "117_at14","117_at15","117_at16")
>
> $> sets <- NULL
> $> ResetEnvir(cdfpackagename,probepackagename)
> $> RemoveProbes(probes,sets,cdfpackagename,probepackagename)
>
>
> #Example 2: Exapmle 1 again but ...
> # ... delete the set from the list
> $> listOutProbes <- c("117_at2","117_at3","117_at4",
> $+ "117_at5","117_at6","117_at7","117_at8","117_at9",
> $+ "117_at10","117_at11","117_at12","117_at13",
> $+ "117_at14","117_at15","117_at16")
>
> $> sets <- ("117_at")
> $> ..
> $> ..
>
>
> ONLY the construction in example number 1!!! returns the error message:
>
> $ Error in ans[[i]][, i.probes] : incorrect number of dimensions
>
> and that when it reaches the command:
>
> $ pmIndex <- unlist(indexProbes(newAB,"pm"))
>
> in function "RemoveProbes". Note that the command "indexProbes(newAB,"pm")"
> is causing the problem. That's why I snooped into the code of the method
> "indexProbes" and voila ... the problem source is finally the command:
>
> $ tmp <- as.vector(ans[[i]][, i.probes])
>
> where "ans" was created by
>
> $> envir <- getCdfInfo(newAB)
> $> genenames <- ls(envir)
> $> ans <- mget(genenames, envir, ifnotfound = NA)
>
>
> NOW: If ans[[i]] contains only ONE value for each of "pm" and "mm" then
>
> $> ans[[i]][,"pm"],
> $> ans[[i]][,"mm"]
> or ...
> $> ans[[i]][,c(1,2)]
>
> causes R to return the error message. I don't know whether the following
> observation has to do with that or is rather neglectable:
>
> $ # for the probeset with only one probe left I get:
> $ Browse[1]> ans[[i]][1]
> $     pm
> $1051223
>
> $ # for a probeset with more probes left I get:
> $ Browse[1]> ans[[i]][1]
> $ [1] 1051223
>
>
> FINALLY: Sounds as if the bells should ring in ecstasy in my head telling
> me what the problem really is, unfortunately I don't understand it yet.
> Maybe someone has an idea whether this is some kind of expected problem
> which could be corrected by the way "ans" is created. I think I will have
> to experiment a little bit more...
>
> regards,
> Benjamin
>
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel
> > Chernomoretz
> > Sent: 06 December 2005 17:10
> > To: bioconductor at stat.math.ethz.ch
> > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5
> >
> >
> >
> >
> > Hi Benjamin
> >
> > I am sorry you have problems with the code.
> >
> > Regarding the computation time, yes. It is slow even with small/mid sized
> > lists, so in your case I would expect a loooot of processing time
> > (more RAM
> > would  help also ;-) ).  I guess that to improve this, the
> > function should be
> > completely rewritten using perhaps a different approach.
> >
> > However, once  you succed in generating a modified cdf
> > environment, and if you
> > are planning to compute expressions many times having the same probes
> > removed, you should keep (save) the modified cdf environment as
> > an R object.
> > I wrote a function (RemoveProbes2, included at the end) that takes the
> > clean.cdf.environment as an input argument. So everything should
> > be faster
> > for subsequent calculations.
> >
> >
> > Unfortunately I am not a lot at the office these days. My postdoc
> > is about to
> > end here, and I am moving in a couple of weeks, so things are a
> > little bit
> > caotic this days! However I will check your code asap.
> >
> > quick observations/questions:
> > 1) are you removing entire probesets specifiyng the whole set of
> > probes via
> > the rmd$Probes list AND the rmd$Set?
> > 2) if you plan to use presence calls, be cautious. After the
> > probe removal
> > step you end up with probeset with different number of probes so
> > be carefull
> > with  the  significance levels of the Wilcoxon'rank test
> >
> >
> >
> > Regards,
> > Ariel./
> >
> >
> >
> >
> > ################################################
> > #       clean.cdf: a modified cdf environment. Should be generated
> > #                         using RemoveProbes and saving the modified
> > #                         environment afterwards
> > #       cdfpackagename:   (e.g. "hgu95av2cdf")
> > #       probepackagename: (e.g. "hgu95av2probe")
> > #       destructive: unimplemented option, see NOTE
> > RemoveProbes2<-function(clean.cdfenv    =NULL,
> >                        
> > cdfpackagename,probepackagename,destructive=TRUE){
> >
> >  #default probe dataset values
> >  probe.env.orig <- get(probepackagename)
> >
> >
> >
> >  if(is.null(clean.cdfenv)){
> >   cat("A cdfenv should be provided\n")\
> >   return();
> >  }else{ #clean.cdf provided. much, much faster!
> >   ipos<-grep(cdfpackagename,search())
> >   rm(list=c(cdfpackagename),pos=ipos)
> >   assign(cdfpackagename,clean.cdfenv,pos=ipos)
> >  }
> >
> >
> >  # setting the PROBE env accordingly (idea from gcrma
> > compute.affinities.R)
> >  tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
> >  newAB   <- new("AffyBatch",cdfName=cleancdf)
> >  pmIndex <-  unlist(indexProbes(newAB,"pm"))
> >  subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex)
> >  rm(newAB)
> >  iNA     <- which(is.na(subIndex))
> >
> >  if(length(iNA)>0){
> >   ipos<-grep(probepackagename,search())
> >   assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
> >  }
> > }
> >
> > ###############################################
> >
> > On December 6, 2005 04:07 am, Benjamin Otto wrote:
> > > Hi folks,
> > >
> > > sorry for taking your time again. I let the computer finish the
> > > "removeProbes"-calculation over night, so I don't know how long he
> > > took...to finally display the following error message:
> > >
> > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions"
> > >
> > > You can imagine how desperately I was looking for a shotgun
> >
> > that moment. To
> >
> > > provide maybe more information I forgot to submit in my last thread
> > > here come the commands and extracts of my lists / structures I used:
> > >
> > > $# libraries used
> > > $> library(simpleaffy)
> > > $> library(affydata)
> > > $> require(gcrma)
> > > $
> > > $# preliminary commands
> > > $> setwd ("....")
> > > $> x <- read.affy()
> > > $> x at cdfName <- "HGU133Plus2"
> > > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE)
> > > $> cdfpackagename <- paste(cleancdf,"cdf",sep="")
> > > $> probepackagename <- paste(cleancdf,"probe",sep="")
> > > $> ResetEnvir(cdfpackagename,probepackagename)
> > > $
> > > $# my function for indentification of unwanted oligos
> > > $# I will append the function code below
> > > $>  rmd <- get.foul.oligos(x, ratio=2, diff=100)
> > > $
> > > $# the initial rma/mas calculation without changes
> > > $> d.rma <- rma(x)
> > > $> d.gcrma <- gcrma(x)
> > > $> d.mas <- mas5(x)
> > > $> d.mas.call <- mas5calls(x)
> > > $
> > > $# The command for invoking the removeProbes function
> > > $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename)
> > >
> > >
> > > The probe list "rmd$Probes" contains 565825 entries, the set
> >
> > list "rmd$Set"
> >
> > > 32791 entries. A little structure check:
> > >
> > > $> rmd$Set[1:5]
> > > $[1] "1007_s_at" "1053_at"   "1255_g_at" "1405_i_at" "1438_at"
> > > $> rmd$Probes[1:5]
> > > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" "1007_s_at5"
> > >
> > > Looks fine to me, but is it? What could cause the error message here ?
> > >
> > > Here comes the code of my function:
> > >
> > >
> > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) {
> > >
> > > 	remove <- list()
> > > 	remove$Probes <- list()
> > > 	remove$Set <- list()
> > > 	len.plist <- 0
> > > 	len.slist <- 0
> > >
> > >
> > > 	ids <- geneNames(x)
> > > 	num.ids <- length(ids)
> > > 	num.samples <- length(pm(x)[1,])
> > > 	msk.table <- matrix(NA,num.ids,2)
> > > 	rm.string <- NA
> > >
> > > 	for (i in 1:num.ids) {
> > > 		id <- ids[i]
> > > 		oligos.pm <- indexProbes(x, which="pm", genenames=id)[[id]]
> > > 		oligos.mm <- indexProbes(x, which="mm", genenames=id)[[id]]
> > > 		num.oligos <- length(oligos.pm)
> > > 		num.rm <- 0
> > > 		for (j in 1:num.oligos) {
> > > 			cat ("i:",i," j:",j,"\n")
> > > 			rm.flag <- TRUE
> > > 			for (k in 1:num.samples) {
> > > 				val.pm <- intensity(x)[oligos.pm[j],k]
> > > 				val.mm <- intensity(x)[oligos.mm[j],k]
> > > 				if (val.pm/val.mm > ratio &
> >
> > val.pm-val.mm > diff) {
> >
> > > 					rm.flag <- FALSE
> > > 				}
> > > 			}
> > > 			if (rm.flag == TRUE) {
> > > 				num.rm <- num.rm + 1
> > > 				oligo.name <- paste(id,j,sep="")
> > > 				if (len.plist == 1) {
> > > 					remove$Probes <-
> >
> > c(remove$Probes,oligo.name)
> >
> > > 				}
> > > 				else { remove$Probes <- c(oligo.name) }
> > > 				len.plist <- 1
> > >
> > > 				if (is.na(rm.string)) { rm.string <- j }
> > > 				else { rm.string <-
> >
> > paste(rm.string,j,sep=",") }
> >
> > > 			}
> > > 		}
> > > 		if (num.rm == num.oligos) {
> > > 			if (len.slist == 1) {
> > > 				remove$Set <- c(remove$Set,id)
> > > 			}
> > > 			else { remove$Set <- c(id) }
> > > 			len.slist <- 1
> > >
> > > 			rm.string <- "all"
> > > 		}
> > > 	}
> > > 	return(remove)
> > > }
> > >
> > > > -----Original Message-----
> > > > From: bioconductor-bounces at stat.math.ethz.ch
> > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin
> > > > Otto
> > > > Sent: 05 December 2005 17:14
> > > > To: bioconductor at stat.math.ethz.ch
> > > > Subject: Re: [BioC] Masking single Oligos in mas5
> > > >
> > > >
> > > > Hi Ariel and Jenny,
> > > >
> > > > thanks very much for the quick reply!!! I have been experimenting
> > > > since with
> > > > the functions and it really seems to be what I'm looking for. A small
> > > > test with your test_script Ariel returned what I hoped for. However
> > > > there is one
> > > > aspect confusing me. After writing a little function myself,
> > > > which computes
> > > > the probe and probeset list which should be excluded in my
> >
> > case I applied
> >
> > > > the removeProbes function on these lists. While I'm writing this
> > > > message my
> > > > computer is still calculating and he has started four hours ago! Is
> > > > it normal for the function to be so time consuming or is there some
> > > > chance that
> > > > I just have done something wrong? I suppose the problem is on my
> > > > side but I
> > > > just can't imagine what it could be because I followed the
> >
> > "testscript"
> >
> > > > steps. But as the reading of the CEL files usually needs just a
> > > > few minutes
> > > > (maybe five if time consuming) I wouldn't have thought the removing
> > > > of probesets would take as long as it currently does. I must confess
> > > > my lists are really long. Still the problem is of big importance
> >
> > for me and
> >
> > > > so it would be great to get an idea of your experience with
> >
> > the function
> >
> > > > perfomance on your systems. Here is a little summary of the
> > > > conditions in my
> > > > case:
> > > >
> > > > Chiptype: HG_U133_Plus2
> > > > num. samples: 2
> > > > num. Probes in listOutProbes: ~ 500.000
> > > > num. Sets in listOutProbeSets: ~ 30.000
> > > > System: P4 - 3 GHZ, 500 MB RAM,
> > > > R-version: RGUI 2.2.0 under WindowsXP.
> > > >
> > > > Mainly I have been wondering that the removing of probes from the
> > > > structure
> > > > takes so much longer than reading in the CEL files and building the
> > > > structure. Has anyone of you maybe started a test before
> >
> > trying to remove
> >
> > > > all probesets from an environment? Or can you at least judge
> >
> > wether the
> >
> > > > current running time in my case is ok or looks suspicious?
> > > >
> > > > In any case just to make it clear again, thanks very much in any
> > > > case. The functions really seem to be of very, very much help!!! :-)
> > > >
> > > > Sincerly,
> > > > Benjamin
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: bioconductor-bounces at stat.math.ethz.ch
> > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel
> > > > Chernomoretz
> > > > Sent: 01 December 2005 17:32
> > > > To: bioconductor at stat.math.ethz.ch
> > > > Subject: Re: [BioC] Masking single Oligos in mas5
> > > >
> > > >
> > > > Hi Benjamin,
> > > >
> > > > Some time ago I wrote a couple of functions that modified
> > > > the cdf environments in order to remove bad probes from the
> >
> > affy analysis
> >
> > > > You could check:
> > > >
> > > > http://files.protsuggest.org/biocond/html/7350.html
> > > > http://files.protsuggest.org/biocond/html/7366.html
> > > > http://files.protsuggest.org/biocond/html/7367.html
> > > >
> > > >
> > > > hope that helps.
> > > >
> > > > ariel./
> > > >
> > > > On December 1, 2005 11:06 am, Benjamin Otto wrote:
> > > > > Hi,
> > > > >
> > > > > I have been searching for nearly two days for a solution to the
> > > >
> > > > following
> > > >
> > > > > problem without finding satisfactory answers:
> > > > >
> > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask
> > > > > for certain probesets single Oligos such that the expression, p and
> > > > > fold
> > > >
> > > > change
> > > >
> > > > > values are calculated based on the remaining oligos?
> > > > >
> > > > > A better description of my problem and the background. We
> >
> > are handling
> >
> > > > > a cross-species experiment having hybradized rna from tupaia on the
> > > > > human chip. This resulted in fairly low expression signals.
> >
> > If we just
> >
> > > > > forget about all the other putative problems in analysing
> >
> > the result I
> >
> > > > > think it seems reasonable to say, that in many cases only
> >
> > some of the
> >
> > > > > probeset oligos will have hybridized satisfyingly. So the idea is
> > > > > masking some of the oligos by some criteria and calculate
> >
> > the results
> >
> > > > > only based upon subsets of the probesets. The problem is:
> >
> > If I set even
> >
> > > > > only one single oligo to NA, the values calculated for the
> > > > > corresponding
> > > >
> > > > probeset won't be
> > > >
> > > > > calculated but set to NA. Most of the threads I found concerning
> > > > > the masking problem handle the question of an autpmated or
> >
> > corrected form
> >
> > > > > of masking. But there seems to be no available information about
> > > > > our case.
> > > >
> > > > Has
> > > >
> > > > > anyone done something like that before? I'm sure there will
> >
> > have to be
> >
> > > > some
> > > >
> > > > > manual programing. But the major question is: Does anybody see a
> > > > > possibility to mask the single oligos on a top level like fixing
> > > > > the affybatch structure? Or do I have to change every single
> > > >
> > > > function to treat
> > > >
> > > > > NA values in the correct form?
> > > > >
> > > > > thanks for your help,
> > > > > Benjamin
> > > > >
> > > > > _______________________________________________
> > > > > Bioconductor mailing list
> > > > > Bioconductor at stat.math.ethz.ch
> > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >
> > > > --
> > > > Ariel Chernomoretz, Ph.D.
> > > > Centre de recherche du CHUL
> > > > 2705 Blv Laurier, bloc T-367
> > > > Sainte-Foy, Qc
> > > > G1V 4G2
> > > > (418)-525-4444 ext 46339
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at stat.math.ethz.ch
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at stat.math.ethz.ch
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> > --
> > Ariel Chernomoretz, Ph.D.
> > Centre de recherche du CHUL
> > 2705 Blv Laurier, bloc T-367
> > Sainte-Foy, Qc
> > G1V 4G2
> > (418)-525-4444 ext 46339
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

-- 
Ariel Chernomoretz, Ph.D.
Centre de recherche du CHUL
2705 Blv Laurier, bloc T-367
Sainte-Foy, Qc
G1V 4G2
(418)-525-4444 ext 46339