[R] regex -> negate a word

Stavros Macrakis macrakis at alum.mit.edu
Sun Jan 18 23:32:40 CET 2009


On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>> x <- c("abcdef", "defabc", "qwerty")
>> ...[find] all elements where the word 'abc' does not appear (i.e. 3 in this case of 'x').

> x[-grep("abc", x)]
> which unfortunately fails if none of the strings in x matches the pattern, i.e., grep returns integer(0);

Yes.

> arguably, x[integer(0)] should rather return all elements of x

The meaning of x[V] (for an integer subscript vector V) is: ignore 0
entries, and then:

a) if !(all(V>0) | all(V<0)) => ERROR
b) if all(V>0): length(x[V]) == length(V)
c) if all(V<0): length(x[V]) == length(x) - length(unique(V)),
   counting only entries with abs(V) <= length(x); negative entries
   beyond the end of x are ignored

When length(V)==0, the preconditions hold vacuously for both (b) and (c),
which give different answers: 0 and length(x).  The R design has made the
decision that length(x[V]) == 0 in this case.  If you're going to have the
"negative indices mean exclusion" trick, this seems like a reasonable
convention, since -integer(0) is indistinguishable from integer(0).
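
The three cases and the empty-subscript convention can be checked
directly. This is a minimal sketch that reuses the x from the original
question; the variable names (mixed, pos, neg, zero, empty) are only
illustrative.

```r
x <- c("abcdef", "defabc", "qwerty")

# (a) mixing positive and negative subscripts is an error
mixed <- tryCatch(x[c(1, -2)], error = function(e) "error")

# (b) all positive: one result element per subscript, repeats included
pos <- x[c(1, 1, 3)]

# (c) all negative: each distinct in-range index is dropped once
neg <- x[c(-1, -1)]

# zero entries are ignored
zero <- x[c(0, 2)]

# empty subscript: negating integer(0) still gives integer(0), so R
# cannot tell "exclude nothing" from "select nothing"
empty <- x[-integer(0)]

print(list(mixed = mixed, pos = pos, neg = neg, zero = zero, empty = empty))
```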

Of course, that means that you can't in general use x[-V] (where
all(V>0)) to mean "all elements that are not in V", because V may be
empty.  However, there is a workaround if you have an upper bound on
length(x) (here, length(x) < 2^30):

       x[ c(-2^30, -V) ]

This guarantees at least one negative number, and the extra entry
excludes nothing because it lies beyond the end of x.
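
As a rough illustration, the sentinel idiom can be compared with two
alternatives that do not rely on negative subscripts at all: a logical
subscript built with grepl(), and setdiff() on the positions. The
pattern "xyz" and the variable names are assumptions chosen so that
grep() returns integer(0); they are not from the original question.

```r
x <- c("abcdef", "defabc", "qwerty")

V_some <- grep("abc", x)   # some strings match
V_none <- grep("xyz", x)   # no string matches: integer(0)

# Sentinel idiom: -2^30 is out of range for x, so it excludes nothing,
# but it keeps the subscript vector non-empty and negative.
keep_some <- x[c(-2^30, -V_some)]
keep_none <- x[c(-2^30, -V_none)]

# Alternative 1: a logical subscript, which has no empty-vector ambiguity
alt_logical <- x[!grepl("xyz", x)]

# Alternative 2: compute the complement of the positions explicitly
alt_setdiff <- x[setdiff(seq_along(x), V_none)]

print(keep_some)
print(keep_none)
```

The logical form is probably the easier one to read, since it needs no
bound on length(x).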

           -s



