[BioC] fast iterator over DNAString's?

Paul Shannon pshannon at systemsbiology.org
Thu Mar 11 01:30:40 CET 2010


I wish to trim a variable-length sequence from the end of each of many thousands of DNAStrings in a DNAStringSet.

The sequence to be trimmed is any recognizable chunk of a Solexa short-read adapter, which ends up on the end of, for example, 22 nt miRNAs.  The adapter chunk might be found in the middle of a 35-base read, or it might be closer to the end.  In every case, I want to delete every base from the start of the adapter chunk to the end of the read.

I imagine there might be a BString operation equivalent to sed.  Sed could be used like this:

  echo 'CGAAGCGGGATGATCTATCTCGTATGCCGTCTTCT' | sed 's/TCGTATGCCGTC.*$//'      --> CGAAGCGGGATGATCTATC

(where TCGTATGCCGTC is only part of the 21-base adapter, but is probably a long enough portion to be representative)
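To make the desired behaviour concrete across several reads, here is a minimal shell sketch, one read per line. The helper name trim_adapter and the second and third reads are made up for illustration; only the first read and the adapter chunk come from the example above. It shows the behaviour I am after, not how it would be done with BStrings.

```shell
# Adapter fragment from the example above (part of the 21-base adapter).
ADAPTER_CHUNK='TCGTATGCCGTC'

# Hypothetical helper: for each read on stdin, delete everything from the
# first occurrence of the adapter chunk to the end of the line.
# Reads that do not contain the chunk pass through unchanged.
trim_adapter() {
  sed "s/${ADAPTER_CHUNK}.*\$//"
}

# Three 35-base reads: chunk near the middle, chunk near the end, no chunk.
# The second and third reads are invented test data.
printf '%s\n' \
  'CGAAGCGGGATGATCTATCTCGTATGCCGTCTTCT' \
  'ACGTACGTACGTACGTACGTAGTCGTATGCCGTCA' \
  'ACGTACGTACGTACGTACGTACGTACGTACGTACG' | trim_adapter
```

The trimmed reads end up with different lengths, which is why I am asking about a per-string operation rather than a fixed-width trim.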

Any way to do this with BStrings and friends?

Thanks!

 - Paul

