[R] regex -> negate a word

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Mon Jan 19 10:28:23 CET 2009


Stavros Macrakis wrote:
> On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk
> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>   
>
>> x[-grep("abc", x)]
>> which unfortunately fails if none of the strings in x matches the pattern, i.e., grep returns integer(0);
>>     
>
> Yes.
>
>   
>> arguably, x[integer(0)] should rather return all elements of x
>>     
>
> The meaning of x[V] (for an integer subscript vector V) is: 

what about numeric (non-integer) vectors?  r silently truncates such
indices toward zero here:

x[1.1]
# same as x[1]

x[0.3]
# character(0)
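
a slightly fuller sketch of the same behaviour, assuming a small made-up character vector x (the data are mine, just for illustration).  as far as i can tell, numeric subscripts are coerced as by as.integer, i.e., truncated toward zero rather than rounded:

```r
# hypothetical example data
x <- c("a", "b", "c")

x[1.1]    # "a", same as x[1]
x[1.9]    # still "a": truncation, not rounding
x[0.3]    # character(0): 0.3 truncates to 0, and zeros are dropped
x[-1.9]   # "b" "c", same as x[-1]
```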

> ignore 0
> entries, and then:
>   

what if V=NULL? 

> a) if !(all(V>0) | all(V<0) ) => ERROR
>   

there is no error for x[V] with V=0, V=as.numeric(NA), or V=NaN.
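
a quick check of these cases, again on a made-up x (my own example data); the try() call at the end is only there to show, by contrast, that mixing signs is what actually raises the error:

```r
# hypothetical example data
x <- c("a", "b", "c")

x[0]               # character(0)
x[as.numeric(NA)]  # NA (one element)
x[NaN]             # NA (one element)
x[NULL]            # character(0)

# mixing positive and negative subscripts does raise an error
res <- try(x[c(1, -1)], silent = TRUE)
inherits(res, "try-error")  # TRUE
```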

> b) if all (V>0): length(x[V]) == length(V)
>   


unfortunately, false if V contains a non-integer (so it goes beyond your
discussion, but may cause problems in practice):

x[c(1, 0.5)]
# one item (if x is non-empty)

> c) if all (V<0): length(x[V]) == length(x)-length(unique(V))
>   

not true for cases like V=c(-1, -1.5), which again go beyond your
discussion, but may happen in practice.
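
both counterexamples can be checked directly; a minimal sketch, assuming a three-element x of my own choosing (the names Vb and Vc are just labels for the two cases):

```r
# hypothetical example data
x <- c("a", "b", "c")

# (b): all entries positive, but 0.5 truncates to 0 and is dropped
Vb <- c(1, 0.5)
all(Vb > 0)                                      # TRUE
length(x[Vb]) == length(Vb)                      # FALSE: 1 vs 2

# (c): all entries negative, but both truncate to -1
Vc <- c(-1, -1.5)
all(Vc < 0)                                      # TRUE
length(x[Vc]) == length(x) - length(unique(Vc))  # FALSE: 2 vs 1
```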

interestingly, unique(c(NA, NA)) is just NA, rather than c(NA,NA).  i'd
think that if we have two non-available values, we can't be sure they're
in fact equal, but unique apparently treats them as equal.  (to make it
keep both, you have to pass incomparables=NA.)
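
for the record, a two-line sketch of the difference (nothing assumed here beyond the incomparables argument of unique):

```r
unique(c(NA, NA))                      # a single NA
unique(c(NA, NA), incomparables = NA)  # both NAs are kept
```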

> When length(V)==0, the preconditions are true for both (b) and (c), so
>   

interestingly, all(V>0) & all(V<0) is TRUE for V=c().
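
this is just all() being vacuously true on an empty logical vector; a small sketch, nothing assumed beyond base r:

```r
V <- c()                 # c() is NULL
is.null(V)               # TRUE
all(V > 0)               # TRUE: V > 0 is logical(0)
all(V < 0)               # TRUE
all(V > 0) & all(V < 0)  # TRUE
any(V > 0)               # FALSE

V <- integer(0)
all(V > 0) & all(V < 0)  # TRUE as well
```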

> the R design has made the decision that length(x[V]) == 0 in this
> case.  If you're going to have the "negative indices means exclusion"
> trick, this seems like a reasonable convention.
>   

i didn't say this was unreasonable, just that x[integer(0)] should,
arguably, return x.  'empty index' is not a precise enough expression to
make it obvious to everyone that integer(0) is *not* an empty index, and
even less so for NULL.  what is meant, i guess, is 'empty index
expression', i.e., no index at all rather than an empty index vector, and
i'd humbly suggest (at the risk of being charged with boring pedantry)
improving tfm.
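
to make the distinction concrete, a minimal sketch on made-up data (x and the pattern "zzz" are my own illustration).  it also shows two ways around the original -grep pitfall; the second assumes an r version that provides grepl:

```r
# hypothetical example data
x <- c("abc", "def", "ghi")

length(x[])            # 3: no index at all, everything is returned
length(x[integer(0)])  # 0: an index vector of length zero
length(x[NULL])        # 0: NULL behaves like integer(0)

# the original pitfall: no match, so -grep(...) is integer(0)
hits <- grep("zzz", x)
x[-hits]                        # character(0), not x

# two ways to drop matches that survive the no-match case
x[setdiff(seq_along(x), hits)]  # all of x
x[!grepl("zzz", x)]             # all of x
```

the setdiff form keeps the integer-index style of the original; the logical form avoids negative subscripts altogether, which is why it cannot hit the integer(0) case.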


vQ



