[BioC] Masking single Oligos in mas5

Benjamin Otto b.otto at uke.uni-hamburg.de
Tue Dec 6 10:07:53 CET 2005


Hi folks,

sorry for taking your time again. I let the computer finish the
"removeProbes"-calculation over night, so I don't know how long he took...to
finally display the following error message:

"Error in ans[[i]][, i.probes] : incorrect number of dimensions"

You can imagine how desperately I was looking for a shotgun that moment. To
provide maybe more information I forgot to submit in my last thread here
come the commands and extracts of my lists / structures I used:

$# libraries used
$> library(simpleaffy)
$> library(affydata)
$> require(gcrma)
$
$# preliminary commands
$> setwd ("....")
$> x <- read.affy()
$> x at cdfName <- "HGU133Plus2"
$> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE)
$> cdfpackagename <- paste(cleancdf,"cdf",sep="")
$> probepackagename <- paste(cleancdf,"probe",sep="")
$> ResetEnvir(cdfpackagename,probepackagename)
$
$# my function for indentification of unwanted oligos
$# I will append the function code below
$>  rmd <- get.foul.oligos(x, ratio=2, diff=100)
$
$# the initial rma/mas calculation without changes
$> d.rma <- rma(x)
$> d.gcrma <- gcrma(x)
$> d.mas <- mas5(x)
$> d.mas.call <- mas5calls(x)
$
$# The command for invoking the removeProbes function
$> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename)


The probe list "rmd$Probes" contains 565825 entries, the set list "rmd$Set"
32791 entries. A little structure check:

$> rmd$Set[1:5]
$[1] "1007_s_at" "1053_at"   "1255_g_at" "1405_i_at" "1438_at"
$> rmd$Probes[1:5]
$[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" "1007_s_at5"

Looks fine to me, but is it? What could cause the error message here ?

Here comes the code of my function:


get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) {

	remove <- list()
	remove$Probes <- list()
	remove$Set <- list()
	len.plist <- 0
	len.slist <- 0


	ids <- geneNames(x)
	num.ids <- length(ids)
	num.samples <- length(pm(x)[1,])
	msk.table <- matrix(NA,num.ids,2)
	rm.string <- NA

	for (i in 1:num.ids) {
		id <- ids[i]
		oligos.pm <- indexProbes(x, which="pm", genenames=id)[[id]]
		oligos.mm <- indexProbes(x, which="mm", genenames=id)[[id]]
		num.oligos <- length(oligos.pm)
		num.rm <- 0
		for (j in 1:num.oligos) {
			cat ("i:",i," j:",j,"\n")
			rm.flag <- TRUE
			for (k in 1:num.samples) {
				val.pm <- intensity(x)[oligos.pm[j],k]
				val.mm <- intensity(x)[oligos.mm[j],k]
				if (val.pm/val.mm > ratio & val.pm-val.mm > diff) {
					rm.flag <- FALSE
				}
			}
			if (rm.flag == TRUE) {
				num.rm <- num.rm + 1
				oligo.name <- paste(id,j,sep="")
				if (len.plist == 1) {
					remove$Probes <- c(remove$Probes,oligo.name)
				}
				else { remove$Probes <- c(oligo.name) }
				len.plist <- 1

				if (is.na(rm.string)) { rm.string <- j }
				else { rm.string <- paste(rm.string,j,sep=",") }
			}
		}
		if (num.rm == num.oligos) {
			if (len.slist == 1) {
				remove$Set <- c(remove$Set,id)
			}
			else { remove$Set <- c(id) }
			len.slist <- 1

			rm.string <- "all"
		}
	}
	return(remove)
}






> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin
> Otto
> Sent: 05 December 2005 17:14
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Masking single Oligos in mas5
>
>
> Hi Ariel and Jenny,
>
> thanks very much for the quick reply!!! I have been experimenting
> since with
> the functions and it really seems to be what I'm looking for. A small test
> with your test_script Ariel returned what I hoped for. However
> there is one
> aspect confusing me. After writing a little function myself,
> which computes
> the probe and probeset list which should be excluded in my case I applied
> the removeProbes function on these lists. While I'm writing this
> message my
> computer is still calculating and he has started four hours ago! Is it
> normal for the function to be so time consuming or is there some
> chance that
> I just have done something wrong? I suppose the problem is on my
> side but I
> just can't imagine what it could be because I followed the "testscript"
> steps. But as the reading of the CEL files usually needs just a
> few minutes
> (maybe five if time consuming) I wouldn't have thought the removing of
> probesets would take as long as it currently does. I must confess my lists
> are really long. Still the problem is of big importance for me and so it
> would be great to get an idea of your experience with the function
> perfomance on your systems. Here is a little summary of the
> conditions in my
> case:
>
> Chiptype: HG_U133_Plus2
> num. samples: 2
> num. Probes in listOutProbes: ~ 500.000
> num. Sets in listOutProbeSets: ~ 30.000
> System: P4 - 3 GHZ, 500 MB RAM,
> R-version: RGUI 2.2.0 under WindowsXP.
>
> Mainly I have been wondering that the removing of probes from the
> structure
> takes so much longer than reading in the CEL files and building the
> structure. Has anyone of you maybe started a test before trying to remove
> all probesets from an environment? Or can you at least judge wether the
> current running time in my case is ok or looks suspicious?
>
> In any case just to make it clear again, thanks very much in any case. The
> functions really seem to be of very, very much help!!! :-)
>
> Sincerly,
> Benjamin
>
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel
> Chernomoretz
> Sent: 01 December 2005 17:32
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Masking single Oligos in mas5
>
>
> Hi Benjamin,
>
> Some time ago I wrote a couple of functions that modified
> the cdf environments in order to remove bad probes from the affy analysis
>
> You could check:
>
> http://files.protsuggest.org/biocond/html/7350.html
> http://files.protsuggest.org/biocond/html/7366.html
> http://files.protsuggest.org/biocond/html/7367.html
>
>
> hope that helps.
>
> ariel./
>
>
> On December 1, 2005 11:06 am, Benjamin Otto wrote:
> > Hi,
> >
> > I have been searching for nearly two days for a solution to the
> following
> > problem without finding satisfactory answers:
> >
> > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for
> > certain probesets single Oligos such that the expression, p and fold
> change
> > values are calculated based on the remaining oligos?
> >
> > A better description of my problem and the background. We are handling a
> > cross-species experiment having hybradized rna from tupaia on the human
> > chip. This resulted in fairly low expression signals. If we just forget
> > about all the other putative problems in analysing the result I think it
> > seems reasonable to say, that in many cases only some of the probeset
> > oligos will have hybridized satisfyingly. So the idea is masking some of
> > the oligos by some criteria and calculate the results only based upon
> > subsets of the probesets. The problem is: If I set even only one single
> > oligo to NA, the values calculated for the corresponding
> probeset won't be
> > calculated but set to NA. Most of the threads I found concerning the
> > masking problem handle the question of an autpmated or corrected form of
> > masking. But there seems to be no available information about our case.
> Has
> > anyone done something like that before? I'm sure there will have to be
> some
> > manual programing. But the major question is: Does anybody see a
> > possibility to mask the single oligos on a top level like fixing the
> > affybatch structure? Or do I have to change every single
> function to treat
> > NA values in the correct form?
> >
> > thanks for your help,
> > Benjamin
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> --
> Ariel Chernomoretz, Ph.D.
> Centre de recherche du CHUL
> 2705 Blv Laurier, bloc T-367
> Sainte-Foy, Qc
> G1V 4G2
> (418)-525-4444 ext 46339
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>



More information about the Bioconductor mailing list