[BioC] Question on PROcess package

Thu Dec 10 21:36:15 CET 2009

Hi Wade, 

Thank you for the code.My question on the code is:

On the part, you create empty matrix, the matrix dimension is chosen  based on the dimension of the first spectrum.  

        ftemp <- read.files(file.path(fldr,paste(SpecNames), fsep ="\\")[1])
        ftemp2 <- ftemp[ftemp[, 1] > lowcut & ftemp[, 1] < highcut, ]
        bseoffM<-matrix(data=0.0123456,ncol=n,nrow=dim(ftemp2)[1])

 what if the dimention of other spectrum are different, which is the case for our problem? On the next part of the program, when it fills  the matrix elements with baseline corrected values, I receive this error

"number of items to replace is not a multiple of replacement length"

How did you approach this issue?

Thanks, 

Farida

Hi Farida,
I used to use the PROcess package extensively, but I haven't much for the past 2 years. I ran into the same problem that you did, so I wrote a modified version of the rmBaseline function that fixes that, and does some other things that you may find handy later on. The parts that should interest you most are the highcut and lowcut options. 

The normed and rawout options are not used that often.

I last ran this code under R 2.6.1, so I am not sure if it will work without a few tweaks.

Good luck,
Wade

 rmBaseline2<-function(fldr, bseoffrda = NULL, breaks = 200, qntl = 0, method = "loess", 
                        lowcut=0, highcut=195000,   bw = 0.1, rawout=FALSE,normed=FALSE,
                        SpecNames = list.files(fldr, pattern = "\\.*csv\\.*")) 
{

##################################################
## modified BATCH function for baseline subtraction
################################################## 

 # Modified version of rmBaseline function in PROcess package.
 # This version allows you to specify the mass range to consider 
 # for baseline removal via the inputs lowcut and highcut.
 # This was written to accounts for minor differences in the spectra length
 # due to the laser firing for slightly different lengths of time.
 #
 # Use rawout=TRUE if you want all of the spectra read in and
 # stored in a matrix without actually baseline subtracting.
 # (This is useful for taking advantage of plotting routines 
 # that were originally written for spectra after they had been baseline subtracted.)
 #
 # The use of normed=T is more rare. It was written as part of an exploratory analysis
 # I did to see if it made a difference if you normalized, then baseline subtracted
 # rather than the traditional process of baseline subtracting and then normalizing
 # My analysis showed that it made no appreciable difference, so I am sticking
 # with the status quo.

 SpecNames.abbrev<-unlist(strsplit(SpecNames,split = " [0-9]{3} "))[seq(2,2*length(SpecNames),2)]

  if(normed==FALSE){
        fs <- SpecNames
        n <- length(fs)
        #peek at dimensions to create empty matrix
        ftemp <- read.files(file.path(fldr,paste(SpecNames), fsep ="\\")[1])
        ftemp2 <- ftemp[ftemp[, 1] > lowcut & ftemp[, 1] < highcut, ]
        bseoffM<-matrix(data=0.0123456,ncol=n,nrow=dim(ftemp2)[1])

            for (j in 1:n) {
                f1 <- read.files(file.path(fldr,paste(SpecNames), fsep ="\\")[j])
                fcut <- f1[f1[, 1] > lowcut & f1[, 1] < highcut, ]
                    if(rawout==FALSE){bseoffM[,j] <- bslnoff(fcut, breaks = breaks, qntl = qntl, 
                    method = method, bw = bw)[,2]                                        }
                    if(rawout==TRUE){bseoffM[,j]<-fcut[,2]}
                    if (j==1){rownames(bseoffM) <- signif(bslnoff(fcut, breaks = breaks, qntl = qntl, 				method = method, bw = bw)[,1],6)
                    }
                            } 

        colnames(bseoffM) <- SpecNames                       
                    }

    if(normed==TRUE){
        fs <- fldr
        n <- ncol(fs)
            for (j in 1:n) {
                f1 <- cbind(as.numeric(rownames(fs)),fs[,j])
                fcut <- f1[f1[, 1] > lowcut & f1[, 1] < highcut, ]
                    bseoff <- bslnoff(fcut, breaks = breaks, qntl = qntl, 
                    method = method, bw = bw)
                    if (j > 1) 
                        bseoffM <- cbind(bseoffM, bseoff[, 2])
                    else bseoffM <- bseoff[, 2]
                            } 
        dimnames(bseoffM) <- list(signif(bseoff[, 1], 6), SpecNames=colnames(fldr))                               
                    }   

        if (!is.null(bseoffrda)) 
            save(list = bseoffM, file = bseoffrda)
        bseoffM 

}

 ##EXAMPLE
 #   rmBaseline2(fldr=seldipath(basedir="W:\\Master6\\Raw Specta",chiptype="IMAC",inten="high")
 #   ,breaks = 2 
 #   ,qntl = 0 
 #   ,method = "approx" 
 #   ,bw = 0.1,
 #   highcut=50000
 #   )

-----Original Message-----
From: Farida Mostajabi [mailto:f0most01 at louisville.edu] 
Sent: Monday, November 30, 2009 2:48 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Question on PROcess package

To whom it may concern,  

I am a student from University of Louisville, USA. I am currently doing some MALDI-TOF  MS data analysis research
with PROcess package. 

I am trying to use the batch functionality of the package to do pre processing  on 286 spectra. The m/z values 
are not exactly the same throughout the spectra,   which I think it is an assumption in PROcess package.

I used the code below to do  baseline correction for one spectrum  at a time 

B.fs <- list.files(my.B.files, pattern = "\\.*csv\\.*", full.names = TRUE)
nb.file <- length(B.fs)

foo<-lapply(seq(nb.file), function(i) read.files(B.fs[i] ))
f0<-lapply(seq(nb.file), function(i) foo[[i]][foo[[i]][,1]>0,])
basecorr<-lapply(seq(nb.file), function(i)  bslnoff(f0[[i]], method = "loess", bw = 0.1))

I could not use "rmBaseline" function since the row-names of the returning matrix are  the m/z values,  which in my case, are not  identical. 

Would you please give some suggestions on this issue? 

Best Regards, 

Farida