[Rd] invert argument in grep

Fri Nov 10 13:04:44 CET 2006

On 11/10/2006 6:28 AM, Prof Brian Ripley wrote:
> On Fri, 10 Nov 2006, Duncan Murdoch wrote:
> 
>> On 11/9/2006 5:14 AM, Romain Francois wrote:
>>> Hello,
>>>
>>> What about an `invert` argument in grep, to return the elements that do
>>> *not* match a regular expression:
>>>
>>> R> grep("pink", colors(), invert = TRUE, value = TRUE)
>>>
>>> would essentially return the same as:
>>>
>>> R> colors() [ - grep("pink", colors()) ]
> 
> Note that grep("pat", x, value = TRUE) is not the same as x[grep("pat", x)],
> as the help page carefully points out.  (I think it would be better 
> if it were.)
> 
>>> I'm attaching the files that I modified (against today's tarball) for
>>> that purpose.
> 
> (BTW, sending whole files makes it difficult to see the changes and even 
> harder to merge them; please use diffs.  From a quick look the changes 
> were very incomplete, as the internal functions were changed and there 
> were no changed C files.)
> 
>> I think a more generally useful change would be to be able to return a
>> logical vector with TRUE for a match and FALSE for a non-match, so a
>> simple !grep(...) does the inversion.  (This is motivated by the recent
>> R-help discussion of the fact that x[-selection] doesn't always invert
>> the selection when it's a vector of indices.)
> 
> I don't think that is pertinent here, as the indices are always a vector 
> of positive integers.  

The issue is that the vector might be empty, in which case 
arithmetically negating it has no effect: x[-integer(0)] selects nothing 
rather than everything.  Negating a vector of integer indices is not a 
good way to invert a selection, while logical negation of a logical 
vector is fine.
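
A minimal sketch of the empty-match case, assuming a small made-up vector 
x and a pattern that matches nothing in it.  The logical vector is built 
from the indices with seq_along() and %in%, which is only one possible 
way to do the conversion:

```r
x <- c("pink", "red", "deeppink")      # made-up example data

# No element matches, so grep() returns an empty index vector.
idx <- grep("blue", x)                 # integer(0)

# Negating an empty vector leaves it empty, so this selects
# nothing instead of everything.
neg <- x[-idx]                         # character(0)

# Converting to a logical vector first makes the inversion safe.
hit <- seq_along(x) %in% idx           # FALSE FALSE FALSE
inv <- x[!hit]                         # all three elements
```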

> 
>> A way to do that without expanding the argument list would be to allow
>>
>> value="logical"
>>
>> as well as value=TRUE and value=FALSE.
>>
>> This would make boolean operations easy, e.g.
>>
>> colors()[grep("dark", colors(), value="logical")
>>       & !grep("blue", colors(), value="logical")]
>>
>> to select the colors that contain "dark" but not "blue". (In this case
>> the RE to select that subset is rather simple because "dark" always
>> precedes "blue", but if that wasn't true, it would be a lot messier.)
> 
> That might be worthwhile, but it is relatively simple to change positive 
> integer indices to logical ones and v.v.
> 
> My personal take is that having 'value=TRUE' was already a complication 
> not worth having, and implementing it at C level was an efficiency tweak 
> not worth the maintenance effort (and also means that '[' methods are not 
> dispatched).

This makes it sound as though it would be worthwhile to redo the 
implementation of value=TRUE as something equivalent to x[grep("pat", 
x)] by putting this case into the R code.  This would simplify the C 
code and make the interface a little less quirky.  (I'm not sure how 
much code this would break because of the loss of coercion to character.)
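
As a rough sketch of what that could look like, assuming a hypothetical 
wrapper named grep_value() (not an existing function), the value=TRUE 
case reduces to ordinary subsetting.  The factor example shows the 
coercion difference mentioned above:

```r
# Hypothetical R-level replacement for grep(..., value = TRUE).
# Subsetting with `[` keeps the class of x and dispatches `[` methods.
grep_value <- function(pattern, x, ...) {
    x[grep(pattern, x, ...)]
}

f <- factor(c("apple", "banana", "cherry"))

old <- grep("an", f, value = TRUE)     # coerced to character
new <- grep_value("an", f)             # stays a factor
```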

The value="logical" implementation could also be done in R, not C.

The advantage of putting it into grep() rather than leaving it for the 
user to change later is that grep() has a copy of x in hand, so a user 
of grep() will not have to save length(x) to use in the conversion to 
logical.
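
A minimal sketch of the R-level version, assuming a hypothetical helper 
named grep_logical() that stands in for the proposed value="logical" 
(neither exists in grep() as discussed here).  The helper uses length(x) 
itself, so the caller does not need to keep it around:

```r
# Hypothetical helper: TRUE where the pattern matches, FALSE elsewhere.
grep_logical <- function(pattern, x, ...) {
    hit <- logical(length(x))            # all FALSE, same length as x
    hit[grep(pattern, x, ...)] <- TRUE   # no-op when nothing matches
    hit
}

cols <- c("darkblue", "darkred", "blue", "pink")   # made-up data

# Elements that contain "dark" but not "blue".
sel <- cols[grep_logical("dark", cols) & !grep_logical("blue", cols)]
```

With the made-up vector above, sel contains only "darkred".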

Duncan Murdoch