[BioC] raw probe set expression data and masking questions

Jenny Drnevich drnevich at uiuc.edu
Fri Sep 8 17:01:44 CEST 2006


Hi Donald,

At 07:42 AM 9/8/2006, Schwartz, Donald wrote:
>Anyone know a simple method to harvest the raw probe set level expression 
>data (not individual PM probe or PM-MM probe pair level data), which will 
>be summarized by average difference, and then export that into excel with 
>the Affy ID or "probe set name" as row headers and columns representing 
>samples?

I don't understand what you mean by 'raw probe set level expression'? The 
only raw values are the individual PM and MM probe values. There are 
various algorithms to combine them into a probe set level expression value, 
but by definition these are 'processed values', not 'raw' values.

>  Second question, I have a small affy custom array. I want to select 
> certain probe sets for analysis prior to any low-level processing (i.e., 
> prior to normalization and background correction, etc.). In other words, 
> I want to mask certain probe sets before normalization and background 
> correction. Following masking, the probe sets will be ignored for the 
> entire analysis. The difficulty is that there is no consistent naming 
> feature across all probe sets so the only way to identify probe sets to 
> mask or to analyze is with a character vector. In addition, I need the 
> flexibility to mask different sets of probe sets and then process 
> (normalize, bg.correct, etc) the data various ways. I'll use: 
> expresso(affybatch, widget=TRUE) to do this post masking.
>
>Thanks for your suggestions.
>Don

Below is some code to remove either individual probes or entire probesets. 
It was originally written by Ariel Chernomoretz and posted to the mailing 
list, and I have subsequently modified it slightly. I've tried to give 
explanations throughout - perhaps at some point I'll have time to make it 
into a small package. There's a chance it might not work with a custom affy 
array, as it assumes you have a cdf and a probe package for your chip named 
in the usual way: 'chipnamecdf' and 'chipnameprobe'. If you only want to 
remove entire probe sets, it might work even if you don't have a probe 
package for your custom array.

Good luck,
Jenny


### The first part is just creating two ojects (ResetEnvir and 
RemoveProbes) originally
### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove 
individual
### probes and/or entire probesets. Just highlight everything from here until
### you see STOP and paste it to R all at once

ResetEnvir<-function(cleancdf){
  cdfpackagename   <- paste(cleancdf,"cdf",sep="")
  probepackagename <- paste(cleancdf,"probe",sep="")
  ll<-search()
  cdfpackagepos <- grep(cdfpackagename,ll)
  if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
  ll<-search()
  probepackagepos <- grep(probepackagename,ll)
  if(length(probepackagepos)>0) detach(pos=probepackagepos)
  require(cdfpackagename,character.only=T)
  require(probepackagename,character.only=T)
  require(affy)
}

RemoveProbes<-function(listOutProbes=NULL,
                        listOutProbeSets=NULL,

cleancdf,destructive=TRUE){


  #default probe dataset values
  cdfpackagename   <- paste(cleancdf,"cdf",sep="")
  probepackagename <- paste(cleancdf,"probe",sep="")
  require(cdfpackagename,character.only = TRUE)
  require(probepackagename,character.only = TRUE)
  probe.env.orig <- get(probepackagename)


  if(!is.null(listOutProbes)){
   # taking probes out from CDF env
   probes<- unlist(lapply(listOutProbes,function(x){
                            a<-strsplit(x,"at")
                            aux1<-paste(a[[1]][1],"at",sep="")
                            aux2<-as.integer(a[[1]][2])
                            c(aux1,aux2)
                           }))
   n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
   n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
   probes<-data.frame(I(n1),n2)
   probes[,1]<-as.character(probes[,1])
   probes[,2]<-as.integer(probes[,2])
   pset<-unique(probes[,1])
   for(i in seq(along=pset)){
    ii  <-grep(pset[i],probes[,1])
    iout<-probes[ii,2]
    a<-get(pset[i],env=get(cdfpackagename))
    a<-a[-iout,]
    assign(pset[i],a,env=get(cdfpackagename))
   }
  }


  # taking probesets out from CDF env
  if(!is.null(listOutProbeSets)){
   rm(list=listOutProbeSets,envir=get(cdfpackagename))
  }


  # setting the PROBE env accordingly (idea from gcrma compute.affinities.R)
  tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
  newAB   <- new("AffyBatch",cdfName=cleancdf)
  pmIndex <-  unlist(indexProbes(newAB,"pm"))
  subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex)
  rm(newAB)
  iNA     <- which(is.na(subIndex))


  if(length(iNA)>0){
   ipos<-grep(probepackagename,search())
   assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
  }
}

### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO 
OBJECTS
### (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()

# All you need now is your affybatch object, and a character vector of 
probe set names
# and/or another vector of individual probes that you want to remove. If 
your affybatch
# object is called 'rawdata' and the vector of probesets is 'maskedprobes', 
follow
# these steps:

cleancdf <- cleancdfname(rawdata at cdfName,addcdf=FALSE)

# Make sure you are starting with the original cdf with all the probes and 
probesets.

ResetEnvir(cleancdf)

# Double-check to make sure all probesets are present in your affybatch by 
typing in
# the name of your affybatch and looking at the output.

rawdata

# To remove some probe sets (but not individual probes in this example), use:
RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf)

# The cdf file will be temporarily modified to mask the indicated probesets 
& probes,
# which you can check by typing in the name of your affybatch again and 
seeing that
# the number of probesets have decreased. The masking can be undone by 
using ResetEnvir
# as above, or by quitting the session. However, any Expression Set objects 
created
# when the cdf is modified will have the masked probesets removed 
permanently because
# they do not refer to the cdf like an affybatch object does.


Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu



More information about the Bioconductor mailing list