[BioC] advice on Biostrings

Rafael A Irizarry ririzarr at jhsph.edu
Tue Feb 21 22:19:20 CET 2006


Hi, I'm using Biostrings to count base content as well as the content of
pairs of bases (GC/CG dinucleotides). I'm using the following snippet of code:


library(Biostrings)

### pmseq is a vector of character strings (not all of the same nchar).
tmp <- sapply(pmseq, function(x) {
  y <- DNAString(x)
  c(alphabetFrequency(y)[2:5],  ## count A,T,G,C
    length(matchDNAPattern("GC", y)) + length(matchDNAPattern("CG", y)))  ## count GC or CG
})

It is painfully slow. strsplit and grep were much faster for the first
part (counting bases), but using grep for the second part (counting GC
or CG pairs) was not straightforward.

Any suggestions?
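A minimal base-R sketch of that route, assuming the sequences contain only A/C/G/T (in either case) and that every start position of "GC" or "CG" should be counted, so "GCG" contributes 2. That should agree with the sum of the two matchDNAPattern counts above, since neither pattern can overlap itself. It avoids building a DNAString per probe: single bases are counted by how much each string shrinks when that base is deleted with gsub(), and dinucleotides by a zero-width Perl lookahead in gregexpr(), which reports every start position, including overlapping ones. countBasesAndGC is a hypothetical helper name, and no timings are claimed here.

```r
## Hypothetical helper (not a Biostrings function): base-R counts of
## A, C, G, T and of GC/CG dinucleotides, vectorized over a character vector.
countBasesAndGC <- function(seqs) {
  seqs <- toupper(seqs)  # assumption: lower-case bases should count too
  counts <- matrix(0L, nrow = length(seqs), ncol = 5,
                   dimnames = list(NULL, c("A", "C", "G", "T", "GC_CG")))
  ## single bases: length lost after deleting every copy of the base
  for (b in c("A", "C", "G", "T")) {
    counts[, b] <- nchar(seqs) - nchar(gsub(b, "", seqs, fixed = TRUE))
  }
  ## dinucleotides: the lookahead matches (with zero width) at each position
  ## where "GC" or "CG" starts; gregexpr() returns -1 when nothing matches
  hits <- gregexpr("(?=GC|CG)", seqs, perl = TRUE)
  counts[, "GC_CG"] <- vapply(hits, function(m) sum(m > 0), integer(1))
  counts
}

## usage: countBasesAndGC(pmseq)
```

The result has one row per sequence, whereas the sapply() call above returns one column per sequence; wrap it in t() if the original orientation is wanted.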

-r


