[BioC] Question on PROcess package
Farida Mostajabi
f0most01 at louisville.edu
Thu Dec 10 21:36:15 CET 2009
Hi Wade,
Thank you for the code.My question on the code is:
On the part, you create empty matrix, the matrix dimension is chosen based on the dimension of the first spectrum.
ftemp <- read.files(file.path(fldr,paste(SpecNames), fsep ="\\")[1])
ftemp2 <- ftemp[ftemp[, 1] > lowcut & ftemp[, 1] < highcut, ]
bseoffM<-matrix(data=0.0123456,ncol=n,nrow=dim(ftemp2)[1])
what if the dimention of other spectrum are different, which is the case for our problem? On the next part of the program, when it fills the matrix elements with baseline corrected values, I receive this error
"number of items to replace is not a multiple of replacement length"
How did you approach this issue?
Thanks,
Farida
Hi Farida,
I used to use the PROcess package extensively, but I haven't much for the past 2 years. I ran into the same problem that you did, so I wrote a modified version of the rmBaseline function that fixes that, and does some other things that you may find handy later on. The parts that should interest you most are the highcut and lowcut options.
The normed and rawout options are not used that often.
I last ran this code under R 2.6.1, so I am not sure if it will work without a few tweaks.
Good luck,
Wade
rmBaseline2<-function(fldr, bseoffrda = NULL, breaks = 200, qntl = 0, method = "loess",
lowcut=0, highcut=195000, bw = 0.1, rawout=FALSE,normed=FALSE,
SpecNames = list.files(fldr, pattern = "\\.*csv\\.*"))
{
##################################################
## modified BATCH function for baseline subtraction
##################################################
# Modified version of rmBaseline function in PROcess package.
# This version allows you to specify the mass range to consider
# for baseline removal via the inputs lowcut and highcut.
# This was written to accounts for minor differences in the spectra length
# due to the laser firing for slightly different lengths of time.
#
# Use rawout=TRUE if you want all of the spectra read in and
# stored in a matrix without actually baseline subtracting.
# (This is useful for taking advantage of plotting routines
# that were originally written for spectra after they had been baseline subtracted.)
#
# The use of normed=T is more rare. It was written as part of an exploratory analysis
# I did to see if it made a difference if you normalized, then baseline subtracted
# rather than the traditional process of baseline subtracting and then normalizing
# My analysis showed that it made no appreciable difference, so I am sticking
# with the status quo.
SpecNames.abbrev<-unlist(strsplit(SpecNames,split = " [0-9]{3} "))[seq(2,2*length(SpecNames),2)]
if(normed==FALSE){
fs <- SpecNames
n <- length(fs)
#peek at dimensions to create empty matrix
ftemp <- read.files(file.path(fldr,paste(SpecNames), fsep ="\\")[1])
ftemp2 <- ftemp[ftemp[, 1] > lowcut & ftemp[, 1] < highcut, ]
bseoffM<-matrix(data=0.0123456,ncol=n,nrow=dim(ftemp2)[1])
for (j in 1:n) {
f1 <- read.files(file.path(fldr,paste(SpecNames), fsep ="\\")[j])
fcut <- f1[f1[, 1] > lowcut & f1[, 1] < highcut, ]
if(rawout==FALSE){bseoffM[,j] <- bslnoff(fcut, breaks = breaks, qntl = qntl,
method = method, bw = bw)[,2] }
if(rawout==TRUE){bseoffM[,j]<-fcut[,2]}
if (j==1){rownames(bseoffM) <- signif(bslnoff(fcut, breaks = breaks, qntl = qntl, method = method, bw = bw)[,1],6)
}
}
colnames(bseoffM) <- SpecNames
}
if(normed==TRUE){
fs <- fldr
n <- ncol(fs)
for (j in 1:n) {
f1 <- cbind(as.numeric(rownames(fs)),fs[,j])
fcut <- f1[f1[, 1] > lowcut & f1[, 1] < highcut, ]
bseoff <- bslnoff(fcut, breaks = breaks, qntl = qntl,
method = method, bw = bw)
if (j > 1)
bseoffM <- cbind(bseoffM, bseoff[, 2])
else bseoffM <- bseoff[, 2]
}
dimnames(bseoffM) <- list(signif(bseoff[, 1], 6), SpecNames=colnames(fldr))
}
if (!is.null(bseoffrda))
save(list = bseoffM, file = bseoffrda)
bseoffM
}
##EXAMPLE
# rmBaseline2(fldr=seldipath(basedir="W:\\Master6\\Raw Specta",chiptype="IMAC",inten="high")
# ,breaks = 2
# ,qntl = 0
# ,method = "approx"
# ,bw = 0.1,
# highcut=50000
# )
-----Original Message-----
From: Farida Mostajabi [mailto:f0most01 at louisville.edu]
Sent: Monday, November 30, 2009 2:48 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Question on PROcess package
To whom it may concern,
I am a student from University of Louisville, USA. I am currently doing some MALDI-TOF MS data analysis research
with PROcess package.
I am trying to use the batch functionality of the package to do pre processing on 286 spectra. The m/z values
are not exactly the same throughout the spectra, which I think it is an assumption in PROcess package.
I used the code below to do baseline correction for one spectrum at a time
B.fs <- list.files(my.B.files, pattern = "\\.*csv\\.*", full.names = TRUE)
nb.file <- length(B.fs)
foo<-lapply(seq(nb.file), function(i) read.files(B.fs[i] ))
f0<-lapply(seq(nb.file), function(i) foo[[i]][foo[[i]][,1]>0,])
basecorr<-lapply(seq(nb.file), function(i) bslnoff(f0[[i]], method = "loess", bw = 0.1))
I could not use "rmBaseline" function since the row-names of the returning matrix are the m/z values, which in my case, are not identical.
Would you please give some suggestions on this issue?
Best Regards,
Farida
More information about the Bioconductor
mailing list