[BioC] a question about trimLRPatterns?

Harris A. Jaffee hj at jhu.edu
Thu Jan 19 22:20:16 CET 2012


To quote from ?trimLRPatterns, for Lpattern here,

          Once the integer vector is constructed using the rules given
          above, when 'with.Lindels' is 'FALSE', 'max.Lmismatch[i]' is
          the number of acceptable mismatches (errors) between the
          suffix 'substring(Lpattern, nLp - i + 1, nLp)' of 'Lpattern'
          and the first 'i' letters of 'subject'.  When 'with.Lindels'
          is 'TRUE', 'max.Lmismatch[i]' represents the allowed "edit
          distance" between that suffix of 'Lpattern' and 'subject',
          starting at position '1' of 'subject' (as in 'matchPattern'
          and 'isMatchingStartingAt').

          For a given element 's' of the 'subject', the initial segment
          (prefix) 'substring(s, 1, j)' of 's' is trimmed if 'j' is the
          largest 'i' for which there is an acceptable match, if any.

If you are asking about the implementation: the sub-patterns, i.e., suffixes of
Lpattern or prefixes of Rpattern, are tested "longest first", using the
relevant max.mismatch vector "from the top, down". So i starts at nLp (or nRp)
and counts down towards 1, and the allowance used for the sub-pattern of
length i is max.Lmismatch[i] (or max.Rmismatch[i]). (Intuitively, you should
think of your max.mismatch vectors as being monotone increasing, perhaps not
strictly.) Testing at the relevant end of the subject stops if and when an
acceptable match is found. The See Also section refers to
?`lowlevel-matching`, where you will find which.isMatchingStartingAt() and
which.isMatchingEndingAt(). These functions are called with
auto.reduce.pattern=TRUE, which allows a single "pattern" and a single "at"
value to be passed together with a *vector* "max.mismatch" value: for each
element of the subject, the pattern actually being tested is automatically
shortened by 1 character at a time, as necessary.
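The "longest first, stop at the first acceptable match" loop can be sketched the same way for the right side, where prefixes of Rpattern are compared with the last i letters of the subject. This is a hypothetical Python illustration assuming with.Rindels=FALSE and i no larger than the subject length; right_trim is an invented name, and this is not the actual which.isMatchingEndingAt() code.

```python
def right_trim(subject, rpattern, max_mm):
    """Hypothetical sketch: remove the longest prefix of rpattern that matches the
    last i letters of subject with at most max_mm[i - 1] mismatches.

    Assumption: plain mismatches only (with.Rindels=FALSE) and i <= len(subject).
    """
    nrp = len(rpattern)
    n = len(subject)
    # "Longest first, from the top down": i = nRp, nRp - 1, ..., 1.
    for i in range(min(nrp, n), 0, -1):
        prefix = rpattern[:i]   # R: substring(Rpattern, 1, i)
        tail = subject[n - i:]  # last i letters of the subject
        mismatches = sum(a != b for a, b in zip(prefix, tail))
        if mismatches <= max_mm[i - 1]:  # R: max.Rmismatch[i]
            return subject[:n - i]  # stop at the first acceptable match
    return subject  # no acceptable match: nothing is trimmed


print(right_trim("ACGTTAGG", "TTGGCCAA", [0, 0, 0, 1, 1, 1, 1, 2]))  # ACGT
```

In the example, lengths 8 down to 5 exceed their allowances, and the length-4 prefix "TTGG" differs from the subject's tail "TAGG" in one position, which max_mm[3] (max.Rmismatch[4] in R's 1-based indexing) permits, so the loop stops there and shorter prefixes are never examined.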

Let me know if I didn't get at your question.

On Jan 19, 2012, at 3:15 PM, wang peter wrote:

> Hello all,
> 
> I want to know how this function processes data.
> 
> For the left match,
> max.Lmismatch is taken as a "rate" and is converted to
> max.Lmismatch=as.integer(1:nLp * rate)
> Then it tries to match the suffix substring(Lpattern, nLp - i + 1, nLp)
> of Lpattern against the first i letters of subject.
> Does i start from 1 or from nLp? And is the corresponding allowed number of
> mismatches max.Lmismatch[i]?
> 
> For the right match,
> max.Rmismatch is taken as a "rate" and is converted to
> max.Rmismatch=as.integer(1:nRp * rate)
> Then it tries to match the suffix substring(Rpattern, nRp - i + 1, nRp)
> of subject against the first i letters of Rpattern.
> Does i start from 1 or from nRp? And is the corresponding allowed number of
> mismatches max.Rmismatch[i]?
> 
> -- 
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253