[BioC] What is the best way to get scores for matches from matchPWM()?

Lucas Carey lucas.carey at gmail.com
Wed Jan 20 17:40:24 CET 2010


Hi All,
I'm wondering what the best way is to get the score for every match
from matchPWM() in Biostrings.
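For readers finding this thread later: Biostrings releases newer than this 2010 post give matchPWM() a with.score argument. With with.score=TRUE, each hit's score is stored in a metadata column that mcols(hits)$score returns, so no rescoring step is needed. The sketch below is minimal and uses a made-up toy matrix and subject. Check ?matchPWM in your installed version before relying on it.

```r
library(Biostrings)

## Hypothetical toy PWM: weight 1 for the consensus base A-C-G-T
## at each of the 4 positions, 0 otherwise.
pwm <- rbind(A = c(1, 0, 0, 0),
             C = c(0, 1, 0, 0),
             G = c(0, 0, 1, 0),
             T = c(0, 0, 0, 1))
subject <- DNAString("TTACGTAAACGATT")

## with.score = TRUE keeps each hit's score alongside its position.
hits <- matchPWM(pwm, subject, min.score = 3, with.score = TRUE)
data.frame(start = start(hits),
           seq   = as.character(hits),
           score = mcols(hits)$score)
```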

Right now, to score all matches to pwm in genome, I do this:

library(Biostrings)

# Assumes pwm (a 4-row A/C/G/T matrix), genome (a list or DNAStringSet
# of chromosomes), Nchr and cutoff are already defined.

#Find PWM hits for fwd & reverse complement of PWM for all chromosomes in genome
rc_pwm <- reverseComplement(pwm)
mmf <- lapply(1:Nchr, function(chr) matchPWM(pwm, genome[[chr]], min.score=cutoff))
mmr <- lapply(1:Nchr, function(chr) matchPWM(rc_pwm, genome[[chr]], min.score=cutoff))
mmm <- c(mmf, mmr)

#Extract the sequences. RevComp the reverse-strand hits.
fwd_seqs <- unlist(lapply(mmf, as.character))
rev_seqs <- unlist(lapply(mmr, as.character))
Sequences <- c(fwd_seqs, as.character(reverseComplement(DNAStringSet(rev_seqs))))

#Convert to DNAStringSet in order to score. This is quite slow:
#every hit is rescored with its own R-level call.
lcl_set <- DNAStringSet(Sequences)
Scores <- sapply(lcl_set, PWMscoreStartingAt, pwm=pwm)
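The reverse-complement step can probably be skipped entirely. The derivation below makes two assumptions. First, a window's PWM score is the sum of the per-position weights. Second, reverseComplement() on a PWM reverses the column order and swaps the A/T and C/G rows. Under those assumptions, scoring the reverse-complemented PWM on the forward strand gives the same number as scoring the original PWM on the reverse-complemented window. Here M is the 4 x L weight matrix, x the chromosome, i the hit start, and \bar{b} the complement of base b.

```latex
S(M, x, i) = \sum_{j=1}^{L} M_{x_{i+j-1},\, j},
\qquad
\tilde{M}_{b,\, j} = M_{\bar{b},\, L+1-j}

S(\tilde{M}, x, i)
  = \sum_{j=1}^{L} M_{\overline{x_{i+j-1}},\, L+1-j}
  = \sum_{k=1}^{L} M_{\overline{x_{i+L-k}},\, k}
  = S(M, y, 1),
\qquad y_k = \overline{x_{i+L-k}}
```

The middle step substitutes k = L+1-j. The resulting y is exactly the reverse complement of the window x_i ... x_{i+L-1}.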

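This is incredibly inefficient. Every hit is turned into a character string, the reverse-strand hits are reverse-complemented, and then each one is rescored with its own PWMscoreStartingAt() call. What is the best way to do this?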
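One possible approach needs no with.score argument and builds no strings. PWMscoreStartingAt() accepts a vector of starting positions, so all of one chromosome's hits can be scored in a single call using start() of the views that matchPWM() returns. Reverse-strand hits can be scored directly with reverseComplement(pwm) on the forward strand, because that equals scoring pwm on the reverse-complemented window. The genome, pwm and cutoff below are hypothetical toy stand-ins for the poster's objects, and score_chr() is a made-up helper name.

```r
library(Biostrings)

## Hypothetical toy data standing in for the poster's objects.
genome <- DNAStringSet(c(chr1 = "TTGATTACAATCAGGATTC",
                         chr2 = "AATCGGATTTTAATCG"))
pwm <- rbind(A = c(0, 2, 0, 0),
             C = c(0, 0, 0, 0),
             G = c(2, 0, 0, 0),
             T = c(0, 0, 2, 2))    # consensus GATT
cutoff <- 6
rc_pwm <- reverseComplement(pwm)   # consensus AATC on the + strand

## Score every hit on one chromosome with one vectorized call per strand.
score_chr <- function(chr_name) {
  chr <- genome[[chr_name]]
  fwd_hits <- matchPWM(pwm, chr, min.score = cutoff)
  rev_hits <- matchPWM(rc_pwm, chr, min.score = cutoff)
  n_fwd <- length(fwd_hits)
  n_rev <- length(rev_hits)
  data.frame(
    chr    = rep(chr_name, n_fwd + n_rev),
    start  = c(start(fwd_hits), start(rev_hits)),
    strand = rep(c("+", "-"), c(n_fwd, n_rev)),
    ## rc_pwm scored on the + strand equals pwm scored on the
    ## reverse-complemented window, so no string reversal is needed.
    score  = c(PWMscoreStartingAt(pwm, chr, start(fwd_hits)),
               PWMscoreStartingAt(rc_pwm, chr, start(rev_hits))),
    stringsAsFactors = FALSE
  )
}

all_hits <- do.call(rbind, lapply(names(genome), score_chr))
print(all_hits)
```

The per-chromosome loop stays at the R level, but each chromosome now costs two matchPWM() calls and two PWMscoreStartingAt() calls, no matter how many hits it has.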

thanks

-Lucas


