[BioC] Biostrings - vcountPattern optimization
Erik Wright
eswright at wisc.edu
Thu Jul 22 17:54:28 CEST 2010
Hello,
Lately I have been working on counting sequence fragments in larger sets of sequences. I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases. Currently I am using the following method to count the number of "hits":
#### start ####
library(Biostrings)
fragments <- DNAStringSet(c("ACTG","AAAA"))
sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
for (i in 1:length(fragments)) {
counts <- vcountPattern(fragments[[i]],
sequence_set,
max.mismatch=1)
hits <- length(which(counts > 0))
print(hits)
}
#### end ####
This method is taking a long time to complete, so I am wondering if I am doing this in the most efficient manner? Does anyone have a suggestion for how I can accomplish the same task more efficiently?
Thanks!,
Erik
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.16.0 IRanges_1.6.0
loaded via a namespace (and not attached):
[1] Biobase_2.8.0
More information about the Bioconductor
mailing list