[BioC] a problem of trimLRPatterns still confused me

Wang Peter wng.peter at gmail.com
Fri Nov 30 21:36:27 CET 2012


Thank you very much, Harris. You helped me again.

Now I understand; see below.

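library(Biostrings)  # provides neditEndingAt()

subject = "GGTAACTTTTCTGACACCTCCTGCTTAAAACCCCAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTCACGGTCTGTATTCGTACTGAAAATCAAGATCAAG"

Rpattern = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG"

# Rpattern must exist before this line: one threshold per prefix length,
# allowing 20% mismatches
max.mismatchs <- 0.2*1:nchar(Rpattern)

# edit distance between each suffix of subject (longest first) and the
# Rpattern prefix of the same length
sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
        s = substr(subject, j, nchar(subject))
        p = substr(Rpattern, 1, nchar(subject)-j+1)
        neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
                      with.indels=TRUE)
})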

All distances:
[1] 32 33 33 32 31 32 31 30 29 28 27 26 27 26 25 25 24 23 22 22 21 20 20 20
[25] 20 19 18 17 18 17 17 18 17 16 15 16 15 14 13 12 12 11 10  9  8  7  6  6
[49]  6  6  6  5  4  3  (2)  3  3  3  3  3  2  1  0  1

max.mismatchs
 [1]  0.2  0.4  0.6  0.8  1.0  1.2  1.4  1.6  1.8 (2.0) 2.2  2.4  2.6
[14]  2.8  3.0  3.2  3.4  3.6  3.8  4.0  4.2  4.4  4.6  4.8  5.0  5.2
[27]  5.4  5.6  5.8  6.0  6.2  6.4  6.6  6.8  7.0  7.2  7.4  7.6  7.8
[40]  8.0  8.2  8.4  8.6  8.8  9.0  9.2  9.4  9.6  9.8 10.0 10.2 10.4
[53] 10.6 10.8 11.0 11.2 11.4 11.6 11.8 12.0 12.2 12.4 12.6 12.8

When the function finds a distance <= the corresponding max mismatch
(see the marked (2) and (2.0)), it stops.
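
The stopping rule described above can be replayed on the numbers printed in
this post. This is only a sketch in base R, not the internals of
trimLRPatterns: the names suffix.len, ok and first.hit are made up for
illustration, and it assumes (as the post describes) that the suffixes are
scanned from longest to shortest and that the k-th distance is compared with
the threshold for a prefix of the same length.

```r
# Distances copied from the output above; entry k belongs to the
# suffix of length 64 - k + 1 (longest suffix first).
distances <- c(32, 33, 33, 32, 31, 32, 31, 30, 29, 28, 27, 26, 27, 26, 25, 25,
               24, 23, 22, 22, 21, 20, 20, 20, 20, 19, 18, 17, 18, 17, 17, 18,
               17, 16, 15, 16, 15, 14, 13, 12, 12, 11, 10,  9,  8,  7,  6,  6,
                6,  6,  6,  5,  4,  3,  2,  3,  3,  3,  3,  3,  2,  1,  0,  1)
stopifnot(length(distances) == 64)

# Thresholds as defined in the post: entry i applies to a prefix of length i.
max.mismatchs <- 0.2 * 1:64

# Length of the suffix (and pattern prefix) behind each distance.
suffix.len <- length(distances):1

# TRUE wherever the distance is within the threshold for that length.
ok <- distances <= max.mismatchs[suffix.len]

first.hit <- which(ok)[1]
first.hit              # 55, the entry marked (2)
suffix.len[first.hit]  # 10, whose threshold is the entry marked (2.0)
```

Under these assumptions the first acceptable position is the 10-letter
suffix, which matches the pair of values marked in the two outputs.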

But I think the distance between those two 10 bp k-mers should be 4, not 2:

CAAGATC--AAG
--AGATCGGAAG
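
Both numbers can be reproduced with adist() from base R, which may help
narrow down where the difference comes from. This is a sketch, not a
statement of how neditEndingAt works: the idea that with.indels=TRUE lets the
match start anywhere in the subject, as long as it ends at ending.at, is an
assumption that should be checked against the Biostrings documentation. The
names s10, p10, global and free.start are made up for illustration.

```r
s10 <- "CAAGATCAAG"  # last 10 letters of subject
p10 <- "AGATCGGAAG"  # first 10 letters of Rpattern

# Global edit distance: both 10-mers aligned end to end.
global <- as.vector(adist(s10, p10))
global      # 4 (drop "CA" on the left, insert "GG" in the middle)

# Same end point, but the alignment may start at any position in s10
# (assumed behaviour, see the note above).
starts <- seq_len(nchar(s10))
free.start <- min(sapply(starts, function(i) {
  as.vector(adist(substr(s10, i, nchar(s10)), p10))
}))
free.start  # 2 (only "AGATCAAG" is used, so "CA" costs nothing)
```

If that assumption holds, the leading "CA" is simply left out of the match,
which would account for a distance of 2 rather than 4.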


