[Rd] regex to match word boundaries

Martin Maechler maechler at stat.math.ethz.ch
Thu Dec 2 08:49:02 CET 2004


>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at myway.com>
>>>>>     on Wed,  1 Dec 2004 21:05:59 -0500 (EST) writes:

    Gabor> Can someone verify whether or not this is a bug.

    Gabor> When I substitute all occurrences of "\\B" with "X" R
    Gabor> seems to correctly place an X at all non-word
    Gabor> boundaries (whether or not I specify perl) but "\\b"
    Gabor> does not seem to act on all complement positions:

    >> gsub("\\b", "X", "abc def") # nothing done
    Gabor> [1] "abc def"
    >> gsub("\\B", "X", "abc def") # as expected, I think
    Gabor> [1] "aXbXc dXeXf"
    >> gsub("\\b", "X", "abc def", perl = TRUE) # not as expected
    Gabor> [1] "abc Xdef"
    >> gsub("\\B", "X", "abc def", perl = TRUE) # as expected
    Gabor> [1] "aXbXc dXeXf"
    >> R.version.string # Windows 2000
    Gabor> [1] "R version 2.0.1, 2004-11-27"

I agree this looks "unfortunate".

Just to confirm: 
1) I get the same on a Linux version
2) the real perl does behave differently and as
   you (and I) would have expected:

 $ echo 'abc def'| perl -pe 's/\b/X/g'
 XabcX XdefX
 $ echo 'abc def'| perl -pe 's/\B/X/g'
 aXbXc dXeXf


Also, from what I see, "\b" should behave the same independently
of perl = TRUE or FALSE.
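
To pin down what "expected" means here without relying on any regex
engine's handling of zero-width matches, here is a minimal sketch that
finds the boundaries by comparing neighbouring characters directly.
The function mark_word_boundaries() and its arguments are hypothetical
(not part of R), and it assumes a "word" character is anything in
[[:alnum:]_], which is how I read the definition of \b:

```r
# Hypothetical helper (not part of R): insert `mark` at every word boundary
# (or, with invert = TRUE, at every non-boundary), found by comparing
# adjacent characters rather than by matching "\\b" or "\\B".
mark_word_boundaries <- function(x, mark = "X", invert = FALSE) {
  chars <- strsplit(x, "")[[1]]
  if (length(chars) == 0L) return(x)
  # Assumption: word characters are alphanumerics plus underscore.
  is_word <- grepl("^[[:alnum:]_]$", chars)
  # Pad with FALSE so the string start and end count as non-word neighbours.
  before <- c(FALSE, is_word)
  after  <- c(is_word, FALSE)
  # One entry per gap (n + 1 of them); gap i sits just before character i.
  boundary <- before != after
  pieces <- ifelse(xor(boundary, invert), mark, "")
  # Interleave gap markers with the characters and glue back together.
  paste0(c(rbind(pieces, c(chars, ""))), collapse = "")
}

mark_word_boundaries("abc def")                 # "XabcX XdefX"
mark_word_boundaries("abc def", invert = TRUE)  # "aXbXc dXeXf"
```

Both results agree with the perl output above, so that is what I would
expect gsub() to return for "\\b" and "\\B" respectively.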

--
Martin


