[Rd] Bug in agrep computing edit distance?

Dickison, Daniel ddickison at carnegielearning.com
Wed Nov 17 16:49:24 CET 2010

I posted this yesterday to r-help, and Ben Bolker suggested reposting it here.

Dickison, Daniel <ddickison <at> carnegielearning.com> writes:

> The documentation for agrep says it uses the Levenshtein edit distance,
> but it seems to get this wrong in certain cases when there is a
> combination of deletions and substitutions.  For example:
> > agrep("abcd", "abcxyz", max.distance=1)
> [1] 1
> That should've been a no-match.  The edit distance between those strings
> is 3 (1 substitution, 2 deletions), but agrep matches with max.distance
> 1.
> I didn't find anything in the bug database, so I was wondering if somehow
> I'm misinterpreting how agrep works.  If not, should I file this in
> Bugzilla?

  Could you re-post this on r-devel?  It definitely sounds like
this is worth following up.  Based on a little bit of playing around,
it's quite clear that I don't understand what's going on.  The examples
show things like


 which makes sense, but




  all give "1 2 3 4" ??

  This makes it clear that I really don't understand what's going on
based on the documentation.  I tried to trace into the C code
(which calls functions from the TRE regexp library), but that didn't
help much ...
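
For reference, the distance claimed in the original example can be checked
with a small dynamic-programming sketch. The helper edit_distance() below is
hypothetical (it is not part of base R and is not the TRE code that agrep
calls). It computes the plain Levenshtein distance between two whole strings
and, optionally, the smallest distance between the pattern and any substring
of the text. The substring variant is included on the assumption that agrep,
like grep, looks for the pattern anywhere within x rather than comparing
whole strings; if that assumption holds, it would account for the match at
max.distance=1.

```r
# Hypothetical helper, not part of base R and not agrep's implementation.
# Unit-cost edit distance between pattern p and text s.
# With substring = TRUE the match may start and end anywhere in s, so the
# result is the smallest distance between p and any substring of s.
edit_distance <- function(p, s, substring = FALSE) {
  pc <- strsplit(p, "")[[1]]
  sc <- strsplit(s, "")[[1]]
  n <- length(sc)
  # Row for the empty pattern prefix: a free start anywhere in s costs 0
  # in substring mode, otherwise j insertions are needed to reach column j.
  prev <- if (substring) rep(0, n + 1) else 0:n
  for (i in seq_along(pc)) {
    cur <- c(i, rep(NA, n))
    for (j in seq_len(n)) {
      cost <- if (pc[i] == sc[j]) 0 else 1
      cur[j + 1] <- min(prev[j + 1] + 1,  # pc[i] has no counterpart in s
                        cur[j] + 1,       # sc[j] has no counterpart in p
                        prev[j] + cost)   # match or substitution
    }
    prev <- cur
  }
  # Substring mode: the match may end at any position in s.
  if (substring) min(prev) else prev[n + 1]
}

edit_distance("abcd", "abcxyz")                    # whole strings: 3
edit_distance("abcd", "abcxyz", substring = TRUE)  # best substring ("abc"): 1
```

So the whole-string distance is indeed 3, while the closest substring,
"abc", is only one edit away from "abcd".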

Daniel  Dickison
Research Programmer
ddickison at carnegielearning.com
Toll Free: (888) 851-7094 x103
FAX: (412) 690-2444

Revolutionary Math Curricula. Revolutionary Results.

Carnegie Learning, Inc. | 437 Grant St. 20th Floor | Pittsburgh, PA 15219
