[R] Bug in agrep computing edit distance?

Dickison, Daniel ddickison at carnegielearning.com
Wed Nov 17 00:47:06 CET 2010


The documentation for agrep says it uses the Levenshtein edit distance,
but it seems to get this wrong in certain cases when there is a
combination of deletions and substitutions.  For example:

> agrep("abcd", "abcxyz", max.distance=1)
[1] 1


I expected no match here.  The Levenshtein edit distance between those two
strings is 3 (1 substitution and 2 deletions to turn "abcxyz" into "abcd"),
yet agrep reports a match for any max.distance >= 1.
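
To double-check the figure of 3 independently of agrep, here is a minimal
sketch of the textbook dynamic-programming Levenshtein distance in plain R.
The function name levenshtein is my own and is not part of base R; the
sketch assumes unit costs for insertions, deletions and substitutions and
compares the two whole strings, which may or may not be what agrep does
internally.

```r
# Hypothetical helper (not part of base R): textbook Levenshtein distance
# between two whole strings, with unit cost for every edit operation.
levenshtein <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n   # delete all i characters of a
  d[1, ] <- 0:m   # insert all j characters of b

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
    }
  }
  d[n + 1, m + 1]
}

levenshtein("abcd", "abcxyz")  # gives 3
```

By this whole-string measure the two strings are 3 edits apart, which is
why I expected max.distance=1 to reject the match.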

I didn't find anything in the bug database, so I was wondering if somehow
I'm misinterpreting how agrep works.  If not, should I file this in
Bugzilla?

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.12.0




Daniel  Dickison
Research Programmer
ddickison at carnegielearning.com
Toll Free: (888) 851-7094 x103
FAX: (412) 690-2444

Revolutionary Math Curricula. Revolutionary Results.

Carnegie Learning, Inc. | 437 Grant St. 20th Floor | Pittsburgh, PA 15219
www.carnegielearning.com


