[Rd] Bug in agrep computing edit distance?

Dickison, Daniel ddickison at carnegielearning.com
Thu Nov 18 16:56:57 CET 2010


A follow-up to my earlier message.  I got R to compile, and the patch below
seems to fix the issue.  (I don't think my previous attachment came through,
so the patch is pasted inline.)

There is still a quirk: insertions at the tail seem to cost 1 extra, and
I'm not sure why.  In the first example below, elements 3 and 5 ("ax1" and
"x12") should match, and in the second, element 5 ("x12") should match, but
they don't unless max.distance=3:

> agrep("x", c("x", "y", "ax1", "abx", "x12", "ax12", "abx1"),
+       max.distance=2)
[1] 1 2 4
> agrep("ax1", c("x", "y", "ax1", "abx", "x12", "ax12", "abx1"),
+       max.distance=2)
[1] 1 3 4 6 7
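
As a cross-check on which elements ought to match, here is a minimal sketch
that computes whole-string edit distances with unit costs.  It assumes an R
version that provides adist() in the utils package, which is not part of the
patch and is used here only as an independent reference; the variable names
are made up for illustration.

```r
## Candidate strings from the two examples above.
x <- c("x", "y", "ax1", "abx", "x12", "ax12", "abx1")

## Whole-string Levenshtein distance from each pattern to each candidate
## (unit cost for insertions, deletions and substitutions).
## adist() returns a 1 x 7 matrix here; as.vector() flattens it.
d_x   <- as.vector(adist("x", x))
d_ax1 <- as.vector(adist("ax1", x))

## Indices that whole-string matching with max.distance = 2 would select.
which(d_x <= 2)     # 1 2 3 4 5
which(d_ax1 <= 2)   # 1 3 4 5 6 7
```

Compared with the agrep() output above, the missing indices are exactly the
ones where the extra characters come at the end of the string.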


In any case, I think this is more in line with the documentation.  I'm
very new to hacking on R so please let me know if this isn't the right way
to submit patches...

Daniel


Index: src/library/base/R/grep.R
===================================================================
--- src/library/base/R/grep.R (revision 53625)
+++ src/library/base/R/grep.R (working copy)
@@ -93,6 +93,11 @@

     n <- nchar(pattern, "c")
     if(is.na(n)) stop("invalid multibyte string for 'pattern'")
+
+    ## make pattern match the whole string
+    pattern <- gsub("\\", "\\\\", pattern, fixed=TRUE)
+    pattern <- paste("^", pattern, "$", sep="")
+
     if(!is.list(max.distance)) {
         if(!is.numeric(max.distance) || (max.distance < 0))
             stop("'max.distance' must be non-negative")
Index: src/main/agrep.c
===================================================================
--- src/main/agrep.c (revision 53625)
+++ src/main/agrep.c (working copy)
@@ -42,7 +42,7 @@
     regex_t reg;
     regaparams_t params;
     regamatch_t match;
-    int rc, cflags = REG_NOSUB | REG_LITERAL;
+    int rc, cflags = REG_NOSUB;

     checkArity(op, args);
     pat = CAR(args); args = CDR(args);
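
One caveat with dropping REG_LITERAL: the pattern is then compiled as a
regular expression, and the grep.R hunk above escapes only backslashes, so a
pattern containing characters such as "." or "*" would presumably no longer be
matched literally.  A rough sketch of escaping the usual extended-regex
metacharacters before anchoring follows.  The helper name escape_regex is
hypothetical and not part of the patch, and the list of metacharacters is an
assumption based on ?regex rather than on the TRE sources.

```r
## Hypothetical helper: backslash-escape extended-regex metacharacters so
## that the pattern is matched literally once "^" and "$" are added.
escape_regex <- function(pattern) {
    ## Backslash must come first so the escapes added later are not doubled.
    meta <- c("\\", ".", "|", "(", ")", "[", "]", "{", "}",
              "^", "$", "*", "+", "?")
    for (ch in meta)
        pattern <- gsub(ch, paste0("\\", ch), pattern, fixed = TRUE)
    pattern
}

## Anchor the escaped pattern so it has to match the whole string.
anchored <- paste0("^", escape_regex("a.b*"), "$")
grepl(anchored, c("a.b*", "axb"))
```

With the escaping in place, only the literal string "a.b*" should match the
anchored pattern; without it, "." and "*" would act as regex operators.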



Daniel  Dickison
Research Programmer
ddickison at carnegielearning.com


