[BioC] a question about trimLRPatterns?

Hervé Pagès hpages at fhcrc.org
Tue Oct 30 19:55:43 CET 2012


Hi there,

On 10/30/2012 09:58 AM, wang peter wrote:
> I want to know how this function works.
>
> For example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
>                max.Rmismatch = 1, with.Lindels = TRUE)
>
>
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern =              "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> The function will try to calculate the distances with code like this:
>
> library(Biostrings)  # provides neditEndingAt()
> sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
>          s = substr(subject, j, nchar(subject))
>          p = substr(Rpattern, 1, nchar(subject)-j+1)
>          neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
>                        with.indels=TRUE)
> })
>  [1]  0  2  4  6  8 10 12 14 15 14 13 12 11 10  9  9  8  7  8  7  6  5  6  6  5  4  4  4  3  2  1  0
> [33]  1  1
> When the function finds the first value that satisfies
> max.Rmismatch, it stops.
> In this case, the function will stop at the first position.
>
> If
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern =              "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
> the results are:
>  [1]  2  3  4  6  8 10 12 14 15 14 13 12 11 10  9  9  8  7  8  7  6  5  6  6  5  4  4  4  3  2  1  0
> [33]  1  1
> In this case, the function will stop at:
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
>
>
> So the shortcoming is that trimLRPatterns cannot find the sequence
> shared by subject and Rpattern:
> "GAATAGTACTGTAGGCACCATCAATAGATCGG"

trimLRPatterns is about trimming the subject by finding/removing
the largest possible *prefix* and/or *suffix* in the subject that
looks like the left and right pattern, respectively. It's not a
tool for finding/removing the longest common substring between
the subject and pattern.
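To make the distinction concrete, here is a minimal base-R sketch of the kind of search trimLRPatterns does *not* perform: a classic dynamic-programming longest-common-substring computation. The helper longest_common_substring is hypothetical (it is not part of Biostrings) and uses O(n*m) time and memory, which is fine for short reads but not for long sequences. Applied to the strings from the second example in the question, it returns the 32-letter shared sequence, even though that sequence is not a suffix of the subject.

```r
# Hypothetical helper (NOT part of Biostrings): dynamic-programming
# longest common substring of two character strings.
longest_common_substring <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # L[i + 1, j + 1] = length of the common substring ending at x[i] and y[j]
  L <- matrix(0L, length(x) + 1L, length(y) + 1L)
  best_len <- 0L
  best_end <- 0L  # position in `a` where the best substring ends
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      if (x[i] == y[j]) {
        L[i + 1L, j + 1L] <- L[i, j] + 1L
        if (L[i + 1L, j + 1L] > best_len) {
          best_len <- L[i + 1L, j + 1L]
          best_end <- i
        }
      }
    }
  }
  substr(a, best_end - best_len + 1L, best_end)
}

subject  <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern <- "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
longest_common_substring(subject, Rpattern)
# returns "GAATAGTACTGTAGGCACCATCAATAGATCGG"
```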

Note that, in your case, you would get the result I believe you're
looking for by just using max.Rmismatch=2 instead of max.Rmismatch=1.
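For illustration only, the following base-R sketch models right-end trimming under simplifying assumptions: it counts substitutions only (no indels, so it ignores with.Rindels), and it takes a hypothetical per-length tolerance vector whose i-th element is the number of mismatches allowed when the last i letters of the subject are compared with the first i letters of Rpattern, with negative values forbidding trimming at that length. The helper trim_right_overlap and the vectors mm1/mm2 are not Biostrings code. Trying the longest overlap first is what makes it remove the *largest* matching suffix; with the strings from the second example, tolerating 2 mismatches on the full 34-letter overlap trims that suffix, while tolerating only 1 leaves the subject untouched.

```r
# Hypothetical helper (NOT part of Biostrings): a simplified model of
# right-end trimming, substitutions only, no indels.
# max_mismatch[i] = mismatches tolerated when the last i letters of
# `subject` are compared with the first i letters of `Rpattern`;
# a negative value forbids trimming at overlap length i.
trim_right_overlap <- function(subject, Rpattern, max_mismatch) {
  s <- strsplit(subject, "")[[1]]
  p <- strsplit(Rpattern, "")[[1]]
  n_max <- min(length(s), length(p))
  # Try the longest overlap first, so the first hit removes the
  # largest possible suffix of `subject`
  for (i in rev(seq_len(n_max))) {
    if (max_mismatch[i] < 0) next
    tail_s <- s[(length(s) - i + 1):length(s)]
    head_p <- p[seq_len(i)]
    if (sum(tail_s != head_p) <= max_mismatch[i]) {
      return(substr(subject, 1, length(s) - i))
    }
  }
  subject  # no acceptable overlap: return the subject untouched
}

subject  <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern <- "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

# Tolerate mismatches only for the full-length (34-letter) overlap
mm1 <- c(rep(-1L, nchar(Rpattern) - 1L), 1L)
mm2 <- c(rep(-1L, nchar(Rpattern) - 1L), 2L)
trim_right_overlap(subject, Rpattern, mm1)  # returns subject unchanged
trim_right_overlap(subject, Rpattern, mm2)  # returns "TATAGTAGATATTG"
```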

Cheers,
H.


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list