[BioC] a problem of trimLRPatterns still confused me

Harris A. Jaffee hj at jhu.edu
Sat Dec 1 18:44:16 CET 2012


On Dec 1, 2012, at 12:00 PM, Wang Peter wrote:
> Dear Harris,
> thank you so much for your kind explanation.
> I am so ashamed to disturb you again.
> 
> my understanding is
> 
> when they call the low-level function to calculate the distance between S and P

In this situation, the C function _nedit_for_Proffset() is called, but
the purpose is much more than to calculate the edit distance between S
and P.  As I quoted before from ?`lowlevel-matching`, it is to determine
the minimum edit distance between P and all the suffixes S' of S.

> S= CAAGATC  AAG
> P=   AGATCGGAAG
> 
> 
> it will try
> CAAGATCAAG
> AAGATCAAG
> AGATCAAG
> GATCAAG
> ...
> G
> 
> and get all the edit distances,
> but 2 is the smallest one

Yes, 2 is the minimum described above, occurring for the 8-letter suffix
S' = AGATCAAG of S = CAAGATCAAG.

> so it will take 2 as the distance between S and P

Not the distance between S and P, which you correctly observed in a previous
post was 4, but the distance between the entire pattern P and some suffix S'
of S, unknown to trimLRPatterns.
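
To make the difference between the two distances concrete, here is a minimal
sketch in plain Python. It is not the Biostrings C code; it assumes a
unit-cost Levenshtein distance (substitutions, insertions and deletions each
cost 1), and the helper names edit_distance and suffix_distances are
hypothetical, introduced only for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between a and b (assumed cost model)."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def suffix_distances(subject: str, pattern: str):
    """Return (suffix, distance to pattern) for every non-empty suffix."""
    return [(subject[k:], edit_distance(subject[k:], pattern))
            for k in range(len(subject))]


S = "CAAGATCAAG"
P = "AGATCGGAAG"

full = edit_distance(S, P)                  # all of S against all of P
best_suffix, best = min(suffix_distances(S, P), key=lambda t: t[1])

print(full)                # 4
print(best_suffix, best)   # AGATCAAG 2
```

The first number is the distance between S and P; the second is the minimum
over the suffixes S' of S, which is the quantity discussed here.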

> S'= AGATC  AAG
> P=  AGATCGGAAG
> 
> and then trim the whole S, rather than S'

trimLRPatterns takes the whole of S as its best guess at S'.  In this
case, a little more than necessary is trimmed; in other cases, perhaps a
little less than necessary.
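
As a small illustration of "a little more than necessary" for this example,
the following Python sketch compares trimming all of S with trimming only the
8-letter suffix S' from the discussion above. The variable names are
hypothetical, and S' is taken as given rather than recomputed.

```python
S = "CAAGATCAAG"
S_prime = "AGATCAAG"        # best-matching suffix, taken from the thread

# S' must really be a suffix of S for the comparison to make sense.
assert S.endswith(S_prime)

# Letters removed when all of S is trimmed but only S' needed to be.
extra = S[:len(S) - len(S_prime)]

print(extra, len(extra))    # CA 2
```

Here the two leading letters of S are the part trimmed beyond S'.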

> -- 
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253


