[R] regex -> negate a word

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jan 20 16:14:41 CET 2009


Prof Brian Ripley wrote:
> On Mon, 19 Jan 2009, Rolf Turner wrote:
>
>>
>> On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:
>>
>>> Well, that's why it was only provided when you insisted.  This is
>>> not what regexp's are good at.
>>>
>>> On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland <Rau at demogr.mpg.de> wrote:
>>>> Thanks! (I have to admit, though, that I expected something simple)
>>
>> It may not be what regexp's are good at, but the grep command in
>> unix/linux does what is required *very* simply via the ``-v'' flag.  I
>> conjecture that it would not be difficult to add an argument with
>> similar impact to the grep() function in R.
>
> Indeed.  I have often wondered why grep() returned indices, when a
> logical vector would seem more natural in R (and !grep(...) would have
> been all that was needed).
>
> Looking at the code I see it does in fact compute a logical vector,
> just not return it.  So adding 'invert' (the long-form of -v is
> --invert) is a job of a very few lines and I have done so for 2.9.0.
>

in fact, it's simpler than that.  instead of distributing the fix
redundantly over four different lines in character.c, it's enough to
XOR (^=) each element of the logical vector of match flags with the
invert flag in just one place, on the fly, near the end of the loop over
the vector of input strings.  see the attached patch.
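a minimal C sketch of the single-point inversion, assuming POSIX <regex.h> as a stand-in for R's own regex code; the function name match_flags and its signature are hypothetical, not the actual code in character.c.  each flag is computed once and then XOR-ed with the 0/1 invert flag, so no other line needs to know about inversion.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not R's character.c): set flags[i] to 1 if x[i]
   matches the extended regular expression `pattern`, else 0, then flip
   it when `invert` is set.  Returns 0 on success, -1 if the pattern
   does not compile. */
int match_flags(const char *const *x, size_t n, const char *pattern,
                int invert, int *flags)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    invert = (invert != 0);          /* normalise to 0/1 so ^= flips one bit */
    for (size_t i = 0; i < n; i++) {
        flags[i] = (regexec(&re, x[i], 0, NULL, 0) == 0);
        flags[i] ^= invert;          /* the one place where inversion happens */
    }
    regfree(&re);
    return 0;
}
```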

for consistency, you might want to
- name the internal invert flag 'invert_opt' instead of 'invert';
- apply the same fix to agrep.

it's also trivial to add another argument to grep, say 'logical', which
makes grep return a logical vector of the same length as the vector of
input strings.  see the attached patch.  note: i am a novice to r
internals, and while using the extended grep i get some mysterious
warnings i haven't decoded yet (shown below); otherwise the code compiles
cleanly and grep works as intended, but the cause of the warnings would
need to be fixed.
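since the logical vector already exists inside the loop, returning it for a 'logical' option means skipping one final compaction step.  a sketch of that step in C, with a hypothetical name: it turns 0/1 flags into the 1-based indices that grep() returns by default.

```c
#include <stddef.h>

/* Hypothetical: write the 1-based positions of nonzero flags into idx
   (which must have room for n entries) and return how many were written.
   A 'logical' option would simply return `flags` and skip this step. */
size_t flags_to_indices(const int *flags, size_t n, int *idx)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (flags[i])
            idx[k++] = (int)(i + 1);   /* R indices start at 1 */
    return k;
}
```

with flags {1, 1, 0} this yields the indices 1 2, and with the inverted flags {0, 0, 1} it yields 3, matching the grep() examples below.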

if you want the 'logical' argument, you need to decide how it interacts
with 'value'.  in the patch, setting 'value' to TRUE resets 'logical' to
FALSE, with a warning.
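the precedence rule above can be written as a tiny C function.  this is only an illustration with a hypothetical function name and warning text; the patch may well apply the rule at the R level in grep.R instead.

```c
#include <stdio.h>

/* Hypothetical: when both value and logical are requested, value wins,
   logical is reset to 0, and a warning is printed to stderr.
   Returns the effective (0/1) setting of the logical option. */
int effective_logical(int value, int logical)
{
    if (value && logical) {
        fputs("Warning: 'logical' reset to FALSE because 'value' is TRUE\n",
              stderr);
        return 0;
    }
    return logical != 0;
}
```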

further suggestion:  the arguments 'value' and 'logical' could be
replaced with a single argument, say 'output', taking one of the values
'indices', 'values', or 'logical'.  this might make further extensions
easier to implement and maintain.
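one way a single 'output' argument might look on the C side: the string is mapped once to an enum, and an unknown value is rejected instead of being combined with other flags.  the enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical replacement for separate value/logical flags. */
typedef enum { OUT_INDICES, OUT_VALUES, OUT_LOGICAL, OUT_INVALID } grep_output;

/* Map the 'output' argument to an enum; any other string is OUT_INVALID. */
grep_output parse_output(const char *s)
{
    static const char *const names[] = {"indices", "values", "logical"};
    for (int i = 0; i < 3; i++)
        if (strcmp(s, names[i]) == 0)
            return (grep_output)i;   /* enum order matches names[] */
    return OUT_INVALID;
}
```

with exactly one mode selectable, the question of how 'value' and 'logical' interact no longer arises.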

attached are patches to character.c, names.c, and grep.R; if you tell me
which other files need a patch to get rid of the warnings (see below),
i'll make one.

s = c("abc", "bcd", "cde")

grep("b", s)
# 1 2

grep("b", s, value=TRUE)
# "abc" "bcd"

grep("b", s, logical=TRUE)
# TRUE TRUE FALSE

s[grep("b", s, logical=TRUE)]
# "abc" "bcd"
# Warning: stack imbalance in 'grep', 9 then 10
# Warning: stack imbalance in '.Internal', 8 then 9
# Warning: stack imbalance in '{', 6 then 7

grep("b", s, invert=TRUE)
# 3

grep("b", s, invert=TRUE, value=TRUE)
# "cde"

s[!grep("b", s, logical=TRUE)]
# "cde"
# Warning: stack imbalance in 'grep', 15 then 16
# Warning: stack imbalance in '.Internal', 14 then 15
# Warning: stack imbalance in '{', 12 then 13
# Warning: stack imbalance in '!', 6 then 7
# Warning: stack imbalance in '[', 2 then 3



vQ

